Recognition of phishing attacks utilizing anomalies in ... · Sunil Chaudhary 2nd December 2012,...
Recognition of phishing attacks utilizing anomalies in phishing websites
Sunil Chaudhary
University of Tampere Department of Computer Sciences Computer Science/Software Development M.Sc. thesis Supervisor: Eleni Berki November 2012
University of Tampere Department of Computer Sciences Computer Science /Software Development Sunil Chaudhary: Recognition of phishing attacks utilizing anomalies in phishing websites M.Sc. Thesis, 78 pages, 15 index and appendix pages November 2012
The fight against phishing has produced several promising phishing prevention
techniques. However, they address the phishing problem only partially.
A large number of Internet users are still tricked into disclosing their personal
information to fake websites every day. This might be because existing phishing
prevention techniques are either not foolproof or unable to deal with the
emerging changes in phishing.
The main purpose of this thesis is to identify anomalies that can be found in the
Uniform Resource Locators (URLs) and source codes of phishing websites and
determine an efficient way to employ those anomalies for phishing detection. In order to
do that, I performed a meta-analysis of several existing phishing prevention
techniques, specifically heuristic methods. Then, I selected forty-one anomalies which
can be found in the URLs and source codes of phishing websites and which are also
mentioned or utilized by past studies. This was followed by the verification of those
anomalies using an experiment conducted on twenty online phishing websites. The
study revealed that some anomalies, which were once significant for phishing detection,
are no longer included in present day phishing websites, and several anomalies are also
widely present in legitimate websites. Such ambiguous anomalies need further analysis
to determine their significance in phishing detection. Moreover, it was also found that
several heuristic methods use an insufficient set of anomalies which introduces
inaccuracy in their results. Finally, in order to design an efficient heuristic method
employing anomalies that can be found in URLs and source codes of phishing websites,
it is suggested to give due priority to the anomalies that are: difficult for phishers to
bypass, only found in phishing websites, seriously harmful, independent of other
anomalies, and do not consume a lot of time for evaluation.
Key words and terms: phishing, phishing prevention, URL, DOM objects, whitelist, blacklist, heuristics, meta-analysis, software quality.
Acknowledgement
I would like to express my sincere thanks and deep appreciation to my professor and
supervisor Eleni Berki for her guidance and valuable comments. I am equally thankful
to Marko Helenius (Tampere University of Technology) for the constructive feedback. I
would also like to thank Linfeng Li for sharing his experiences on phishing research and
suggesting various useful materials that I used for my thesis.
My sincere thanks also go to my English teachers, Robert Hollingsworth and
Julie Rajala, who helped me to get familiar with the rules of academic writing. I would
also like to thank my professors Jyrki Nummenmaa and Zheying Zhang as well as all
the attendees of the seminar course entitled “Master’s Thesis Seminar in Software
Development” for their suggestions and feedback. Last but not least, I am thankful to my
professor Mikko Ruohonen, who provided me with a summer traineeship and ample freedom to
complete a large part of my thesis during the traineeship period.
Sunil Chaudhary
2nd December 2012, Tampere
Contents
1. Introduction
1.1. The phishing epidemic
1.2. Research questions
1.3. Anomalies in phishing websites are suitable for phishing detection
1.4. Thesis contribution
1.5. Thesis outline
2. Review of phishing prevention methods
2.1. Meaning of phishing prevention methods
2.2. Important factors for effective phishing prevention methods
2.2.1. Phishers’ behavior and phishing techniques
2.2.2. Internet users’ behavior and decision making process
2.3. Objectives of existing phishing prevention methods
2.3.1. Reasons behind internet users’ tendency to fall for phishing
2.3.2. Design techniques to educate and aware about phishing
2.3.3. Design effective UI and warning to alert about phishing
2.3.4. Development of countermeasure to automatically detect phishing
2.3.5. Evaluate the effectiveness of existing phishing prevention methods
2.3.6. The need to invent proactive strategies for phishing prevention
2.4. Classification of phishing prevention techniques
2.5. Phishing prevention applications
3. Analysis of strength and limitations of technical phishing prevention methods
3.1. List based methods
3.1.1. Whitelist method
3.1.2. Blacklist method
3.2. Heuristic methods
3.2.1. Use of visual similarity measures in phishing detection
3.2.2. Use of search engine in phishing detection
3.2.3. Use of anomalies in phishing websites for phishing detection
4. Investigating anomalies in phishing websites
4.1. Anomalies found in the URLs of phishing websites
4.2. Anomalies found in the source codes of phishing websites
4.3. Verification of anomalies using online phishing websites
4.4. Discussion on findings
5. Conclusions
6. Limitations and future development work
References
Appendix
1. Introduction
1.1. The phishing epidemic
Online services are an integral part of modern society. They make information readily
accessible from any place through the Internet. This feature is equally utilized by both
service providers and users. Service providers are able to penetrate and cover large
markets easily at a low operational cost whilst users are able to choose from a wide
range of services and are able to use them regardless of time and location.
Unfortunately, these services have not escaped the attention of cybercriminals. One of
the major drawbacks of using such services is the risk of phishing.
Phishing is a fraudulent activity carried out using electronic communication to
acquire personal information for malicious purposes. This information can include bank
or financial institution authentication credentials, social security numbers, credit card
details, and online shopping account information, with which phishers usually defraud
their victims. Phishers employ a number of techniques, such as social engineering
schemes and technical subterfuge [APWG, 2012], in order to lure potential victims and
make them divulge their account details and other sensitive information.
(i) Social engineering scheme. In general, phishers use emails masquerading as
being from a legitimate and trustworthy source, such as a bank, an auction
site, or an online commerce site [APWG, 2012], and redirect victims to an
authentic looking counterfeit website to deceive recipients into disclosing
sensitive information. Many other media, such as snail mail, phone calls, and
instant messaging, are also used to reach potential victims and lure them into
disclosing their confidential information. However, fake emails and phony
websites are an easy and economically viable means of targeting a large number of
potential victims at a time, which might also be a reason they are so widely used to
conduct phishing.
The fake emails and phony websites used by phishers have evolved to
become technically deceptive and hard to detect by casual inspection.
Phishing emails often create a sense of urgency to motivate Internet users to take
prompt action, for example by asking potential victims to update, validate, or confirm
their account information for different reasons, such as to receive an award,
to help the bank with its procedures, to prevent their account from being
suspended, or to stop the account from misuse. Similarly, phishers are also
found exploiting current events. For example, phishing attacks that emerged
after the Haiti earthquake purported to come from relief organizations or
from the victims themselves and asked for donations, and there have also been
FIFA World Cup-themed phishing attempts. Phishing websites often use the original
website's layout, logo, trademark, and even a similar domain name to make them look
similar to the genuine websites. Furthermore, mirroring original websites to
generate fake websites makes it harder to differentiate them, even for people with
adequate knowledge about phishing. It has also been reported that some
phishing websites claim to sell products, such as software, games, and sex pills,
at a high discount, and then steal the bank information when the Internet user
enters it into their websites to buy the products.
(ii) Technical subterfuge. Phishers plant crimeware onto the Personal Computers (PCs)
of potential victims to steal their credentials directly [APWG, 2012]. Many hackers
have been involved with phishing and use advanced hacking techniques. Some of
the mechanisms used in technical subterfuge are:
Session hijacking is used, often by corrupting the local navigation
infrastructure, to misdirect potential victims to a fake website or to an authentic
website through proxies controlled by the phishers. Techniques such as
pharming, cross-site scripting attacks, cross-site request forgery, domain
name typos, and man-in-the-middle attacks are used to carry out
session hijacking [Milletary, 2006].
Floods of spam emails are sent with malware in an attachment or with a
link that, when clicked, surreptitiously installs
specialized malware on the Internet user's computer. Such malware is
designed to monitor and intercept the victim's keystrokes and mouse clicks.
Sometimes malware is even designed to capture screenshots of the webpages
visited by the victim and ultimately send the captured information to the
phishers. More advanced malware designed to capture network packets and
protocol information, and password harvesters that look for username and
password information on the victim's computer, are also found to be
employed by phishers.
Phishing attacks increased rapidly between the first half of 2008 and
the first half of 2012, as shown in Figure 1.
Figure 1: Phishing websites detected from 2008 to 2012
Many factors are responsible for the growth of phishing. One of the major factors is that
Internet users are unaware that their personal information is actively targeted by
criminals, and as a consequence they neglect to take precautionary measures while
performing online transactions. Likewise, many online service providers lack organizational
policies and procedures for contacting customers [Dhamija et al., 2006], although
presently many big organizations that are prone to phishing do seem to have acted to
improve the situation. Moreover, phishing is a very lucrative cybercrime with a high
return for little risk. The exponential growth in the use of the Internet and
online services has resulted in a rapid increase in potential victims, encouraging many
new criminals and inspiring them to use new and more sophisticated techniques to
deceive Internet users more effectively. Additionally, the technical
resources required for phishing are easily accessible, which has enabled even criminals with
little technical knowledge to conduct phishing successfully. Many do-it-yourself
phishing kits are available online and can be downloaded for free. These kits also
contain spamming software that enables phishers to easily reach large numbers of
potential victims. There are also various websites that offer guidance on
designing and conducting phishing.
Phishing is a leading cause of identity theft on the Internet and causes billions of
dollars of damage worldwide every year. It has an adverse impact on the economy
through direct and indirect losses experienced by businesses and customers.
The direct loss is the money that phishers
withdraw from their victims’ accounts.
The indirect losses are the adverse impact on customers’ confidence in
online commerce and services, the diminished reputation of victimized
organizations, and the resources spent to combat phishing.
Moreover, the convenience of e-commerce seems to be embraced by
cybercriminals and users alike. Financial services are the industry sector most
targeted by phishing, as shown in Figure 2.
Figure 2: Targeted industry sectors by phishing [APWG, 2012]
With the prevalence of phishing attacks and the increasing vulnerability of users’
confidential and personal information, it is increasingly important to provide Internet
users with an effective and reliable phishing prevention method.
There is no silver bullet to eliminate the problem of phishing. The solution depends partially
on well designed technology and equally on the browsing habits of Internet users. Well
designed technology includes techniques that can efficiently counter successful phishing
techniques and a usable design that takes into consideration what humans can and cannot
do well [Dhamija et al., 2006]. Li et al. [2007] emphasize improving the quality of
system design and the need for well-defined security requirements to protect system
users from phishing. Good browsing habits mean that Internet users are familiar with
phishing attacks and are able to detect them. It includes the trust towards anti-phishing software
that Internet users have installed on their systems and their reaction to the warnings
from it. However, an empirical study has shown that
many Internet users ignore warnings from the anti-phishing system [Dhamija et
al., 2006]. Many Internet users do not understand phishing attacks or do not understand
the sophistication of phishing [Wu et al., 2006b].
There are several promising solutions provided by security experts and researchers
against phishing. These solutions build awareness of potential phishing attempts, and
develop and promote suitable technology that helps to protect Internet users
against phishing. They implement prevention, detection, and response measures. They
are available in a variety of forms: integrated with popular anti-virus systems, e.g., the
anti-phishing tool in Norton antivirus software; as an embedded feature of well-known web
browsers, e.g., the Google Safe Browsing toolbar [Google Safe Browsing] used in the Mozilla
Firefox browser; and as separate tools and add-ons that can be used on server and client
machines, e.g., the eBay toolbar [eBay Toolbar’s Account Guard]. They employ different
techniques, such as blacklists, e.g., the Netcraft Anti-phishing toolbar [Netcraft]; whitelists,
e.g., SmartScreen Filter [MSDN IEBlog]; content based detection, e.g., CANTINA
[Zhang et al., 2007a]; analysis of web page source code or URL, e.g.,
CANTINA+ [Xiang et al., 2011]; comparison of the visual similarity of the whole webpage,
layout, or logo, e.g., the online tool called “SiteWatcher Anti-phishing Tech” [Liu et al.,
2006]; analysis of data submitted by users online, e.g., SpoofGuard [Chou et al., 2004];
and use of a reputable search engine, e.g., CANTINA [Zhang et al., 2007a]. There has
been good progress in identifying countermeasures; however, there has also been an
increase in attack diversity and technical sophistication to circumvent both detection
and users’ suspicions. This means that as countermeasures are implemented to thwart
one method of stealing information, criminals search for new vulnerabilities to
exploit. This also means they always have additional opportunities available to them.
1.2. Research questions
The most common and straightforward technique to commit phishing attacks is to
deploy a webpage that mimics the look and feel of a target organization’s website.
There are several heuristic methods which employ anomalies in the URLs and source
codes in order to identify phishing websites. Many anti-phishing tools in use, such as
SpoofGuard [Chou et al., 2004], Netcraft Anti-phishing toolbar [Netcraft], CallingID
toolbar [CallingID], eBay toolbar [eBay Toolbar’s Account Guard], and SmartScreen
Filter [MSDN IEBlog] also implement heuristic methods for phishing detection.
However, there are several anti-phishing tools, such as Cloudmark Anti-fraud toolbar
[Cloudmark] and EarthLink toolbar [EarthLink], that still rely on manual verification
and blacklists for phishing detection [Zhang et al., 2007b]. Ironically, even
anti-phishing tools that use heuristic methods, such as eBay Toolbar and SmartScreen Filter,
do not apply them as the first step of detection [MSDN IEBlog, eBay Toolbar’s Account Guard],
since heuristic methods introduce higher inaccuracy in the results compared to list based
methods. Therefore, further research is required to improve the results of heuristic
methods. In this thesis, I have worked on answering the following two questions:
(i) What are the most common anomalies found in URLs and source code of
phishing websites?
(ii) How could these anomalies be deployed in order to recognize phishing
attempts?
I believe that two factors are crucial for enhancing the accuracy of heuristic methods:
the selection of suitable anomalies and the design of a suitable method to employ
them.
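To make the idea of URL anomalies concrete, the following is a minimal sketch of how a heuristic method might test a URL against a small set of checks. The three checks (an IP address used as the host, an "@" symbol in the authority part, and an unusually long chain of subdomains) and the dot threshold are assumptions chosen for illustration; they are not claimed to be the forty-one anomalies selected in this thesis, and the function names are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit

# Illustrative checks only; the names and the threshold below are assumptions,
# not the anomaly set or calibration used in the thesis.


def host_is_ip_address(url):
    """True when the host part is a raw IP address instead of a domain name."""
    host = urlsplit(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def has_at_symbol(url):
    """True when the authority part contains '@', which can hide the real host."""
    return "@" in urlsplit(url).netloc


def has_many_subdomains(url, limit=3):
    """True when the host name contains more than `limit` dots (assumed threshold)."""
    host = urlsplit(url).hostname or ""
    return host.count(".") > limit


CHECKS = (host_is_ip_address, has_at_symbol, has_many_subdomains)


def url_anomalies(url):
    """Return the names of all checks that fire for the given URL."""
    return [check.__name__ for check in CHECKS if check(url)]


if __name__ == "__main__":
    for candidate in (
        "https://www.example.com/",
        "http://192.168.0.1/paypal/login",
        "http://paypal.com@evil.example.net/",
    ):
        print(candidate, url_anomalies(candidate))
```

A real heuristic method would additionally have to weight the checks, since, as noted in the abstract, some anomalies also occur widely in legitimate websites.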
Some of the related studies that use anomalies in the source codes and URLs of
phishing websites for phishing detection include Pan and Ding [2006], who looked for
anomalies in webpages and cookies; Gastellier-Prevost et al.
[2011], who evaluated URLs and webpage source code; Garera et al. [2007], who
analyzed the features of URLs for discrepancies; and Alkhozae and Batarfi [2011], who
looked for abnormalities in webpages with respect to the W3C standard. However,
Garera et al. [2007] excluded the source code of the website despite the fact that
important anomalies can also be found in the source code of phishing websites, whilst
all the other studies seem to neglect some vital factors during the selection, calibration,
and deployment of the anomalies, as evidenced by the high inaccuracy in their results.
In addition, the studies were performed some years ago, but the trend in phishing is very
dynamic. There is a high chance that anomalies that were important during their studies
are no longer valid. Many other related studies are analyzed in chapter three.
1.3. Anomalies in phishing websites are suitable for phishing detection
Although phishing sites are cheap and easy to build [Pan and Ding, 2006], these cheaply
made websites are often poorly designed and coded, and do not properly meet
recognized standards, for example, the recommendations from the World Wide Web
Consortium (W3C) [Alkhozae and Batarfi, 2011] and the Google guidelines [Garera et
al., 2007]. Their quality scores in the Google crawl database were found to be either very
low or absent [Gastellier-Prevost et al., 2011]. Moreover, phishing websites
have a very short lifetime: on average a phishing website domain remains online for
3 days, 31 minutes and 8 seconds [McGrath and Gupta, 2008]. Given this short lifetime,
phishers naturally prefer not to concentrate on website design and quality
improvement, but rather to work on more beneficial activities, such as pushing more
emails and websites to potential victims, infecting users’ PCs with malicious software to
use them as proxies, and designing a distributed architecture that includes registering
many domains with various registrars in order to direct traffic to one of their domains
when any of their other domains is removed. In addition, phishing websites often imitate
genuine websites and claim false identities, which is not possible without
introducing some anomalies. Therefore, these anomalies can be utilized to detect
phishing. The other benefits of using anomalies found in the URLs and DOM
objects of websites for a phishing detection method are:
(i) It is not dependent on any specific phishing strategy and is equally valid for
all kinds of phishing websites.
(ii) It does not depend on any external factors, such as databases, and
(iii) It does not require any changes in user browsing habits.
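As an illustration of a check that relies only on the page URL and the page source, with no external database, the sketch below looks for forms that submit their data to a host other than the one serving the page. This particular anomaly is an assumption chosen for illustration and is not claimed to belong to the set selected in this thesis; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class FormActionCollector(HTMLParser):
    """Collects the raw `action` attribute of every <form> tag in a page."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # A missing or empty action means the form submits to the page itself.
            self.actions.append(dict(attrs).get("action") or "")


def foreign_form_actions(page_url, html):
    """Return absolute form targets whose host differs from the page's host."""
    collector = FormActionCollector()
    collector.feed(html)
    page_host = urlsplit(page_url).hostname
    foreign = []
    for action in collector.actions:
        target = urljoin(page_url, action)  # resolve relative actions
        if urlsplit(target).hostname != page_host:
            foreign.append(target)
    return foreign


if __name__ == "__main__":
    sample = (
        '<form action="http://collector.example.net/save.php">'
        '<input name="password"></form>'
        '<form action="/search"></form>'
    )
    print(foreign_form_actions("http://bank.example.com/login", sample))
```

Legitimate websites also post to third-party hosts at times (for example, payment gateways), so a check like this would need to be weighed against other anomalies rather than used alone.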
1.4. Thesis contribution
In order to determine anomalies that are found in the URLs and source codes of
phishing websites, I performed a meta-analysis of several past studies related to phishing
prevention, specifically heuristic methods. Then, I selected forty-one anomalies that can
be found in the URLs and source code of phishing websites. After that, I conducted an
experiment on twenty online phishing websites to verify those anomalies and
determine their significance in phishing detection. Finally, I suggest ways in
which anomalies in the URLs and source code of phishing websites can be effectively
utilized during phishing detection. In general, the thesis makes the following
contributions:
(i) A systematic classification of phishing prevention techniques and
applications.
(ii) The meta-analysis of phishing prevention methods.
(iii) A set of forty-one anomalies that can be found in the URLs and source code
of phishing websites.
(iv) Results from an experiment conducted on twenty online phishing websites in
order to verify the significance of anomalies in phishing detection.
(v) Necessary guidelines to help in the deployment of anomalies in phishing
prevention methods.
1.5. Thesis outline
The thesis proceeds as follows: chapter two reviews phishing prevention techniques
and also includes a systematic classification of phishing prevention techniques and
applications. Chapter three includes the meta-analysis of list based methods and
heuristic methods along with various related studies on them with their main contents,
specialities, and limitations. Chapter four lists the anomalies found in the URLs and
source codes of phishing websites which can be employed for phishing detection, explains
the setup of the experiment conducted on twenty online phishing websites, and presents
the results obtained from the experiment and a discussion of the findings. Chapter five
presents conclusions, and the last chapter, chapter six, covers the limitations of this
research and some future research and development work.
2. Review of phishing prevention methods
2.1. Meaning of phishing prevention methods
Phishing utilizes the union of technology and social engineering. Social engineering is
about the exploitation of human vulnerabilities [Odaro and Sanders, 2011]. There are
various limitations which arise from human behaviour and the decision making process
(e.g., greed and fear affect decisions), and from social norms (e.g., ethics, legality), which,
unfortunately, so far do not have an exact technical solution valid for all scenarios. In
order to overcome those limitations, Internet users must rely on their own judgement to
make security critical decisions correctly. However, phishers use social engineering and
technology in a strategic manner to distract their potential victims [Jakobsson, 2005].
Therefore, phishing prevention techniques target both components (i.e., technology and
social engineering) related to phishing. Precisely, a phishing prevention technique is any
technical or non-technical solution designed to either stop sensitive information from
leaking to a counterfeit website or make leaked data useless [Cao et al., 2008].
In order to address the problem of phishing, the American Bankers Association
[2005] recommends developing a comprehensive set of procedures that perform:
(i) Detection. Detection means to keep a vigilant eye on phishing and discover
when any new phishing activity occurs before it can victimize Internet users. It
also includes a solution that extracts information about the phishing website.
(ii) Prevention. Prevention means to help in reducing the frequency of phishing
attempts that Internet users receive or educate Internet users so that they are less
likely to respond to phishing attempts.
(iii) Response. Response means to focus on the precaution and action which
have to be taken after the detection of phishing. It is also related to information
flow about the culprit website and process of removing the phishing websites.
Although this recommendation is aimed at the banking sector, it is valid for curbing all other
kinds of phishing as well. The three procedures are shown in Figure 3.
Figure 3: Phishing prevention procedures [American Bankers Association, 2005]
2.2. Important factors for effective phishing prevention methods
In order to prevent phishing attacks, it is vital to comprehend phishers’ behaviour and
phishing techniques along with Internet users’ behaviour and their decision making
process. An analysis of phishing behaviours and techniques provides knowledge
about the technical and social engineering techniques applied for phishing. Likewise,
Internet users’ behaviour and their decision making process shed light on what
Internet users are good at and where they are vulnerable. Detecting a phishing attempt
is of little use when Internet users either fail to notice or ignore the warnings
from a phishing prevention system. Therefore, the limitations of Internet users’ responses
should be respected and supported with suitable usability.
2.2.1. Phishers’ behavior and phishing techniques
Computer security attacks are of three kinds:
(i) Physical attacks. These target the physical infrastructure and network to cause physical
outages, for example by cutting a power or data transmission cable.
(ii) Syntactic attacks. These target vulnerabilities and loopholes in software, such as
problems in cryptographic algorithms and protocols.
(iii) Semantic attacks. These target people's behaviour and the way they interact with
computers and the web, such as the use of social engineering to manipulate Internet
users and steal their information.
Phishing includes both syntactic and semantic attacks [Downs et al.,
2006]. This implies that a phishing prevention system should protect Internet users
from both syntactic and semantic attacks.
According to Singh [2007], the schemes used by phishers can roughly be classified
into following four kinds:
(i) Dragnet method. It uses spammed emails, websites, or pop-up windows bearing
falsified corporate identification.
(ii) Rod-and-Reel method. It targets specific prospective victims with whom initial
contact has already been made, and sends false information to prompt their
disclosure of personal or financial data.
(iii) Lobsterpot method. It consists of creating a forged website that imitates a
legitimate website so that the victims mistake the spoofed website for a
legitimate one and provide their personal data.
(iv) Gillnet phishing. It uses malicious code which infects the user’s system with a
Trojan horse or changes the settings of the user’s system. Consequently, the Internet
user is directed to a phishing website when trying to visit a legitimate website, or
the malicious code records the keystrokes of the user’s personal information and
transmits those data to phishers.
In all these techniques, the phishing schemes seem to typically rely on three basic
elements:
Phishing solicitations often use familiar corporate trademarks and trade names, as
well as recognized security agency names and logos. This can be seen in
Figure 4, a phishing website for “PayPal” that also uses the “VeriSign” logo.
Figure 4: A phishing website for Paypal
The solicitations routinely contain warnings, or information about an award, a
lottery, or other similar messages, intended to cause the recipients immediate
concern or worry about access to an existing financial account. An example
of a phishing email informing the recipient about a grant can be seen in Figure 5.
Figure 5: A phishing email
The solicitations rely on two facts pertaining to the authentication of e-mails:
1. Online consumers often lack the tools and technical knowledge to
authenticate messages, especially from financial institutions and
e-commerce companies; and
2. Most of the available tools and techniques are inadequate for robust
authentication or can easily be spoofed [Wu et al., 2006b].
In fact, these are the elements against which the existing anti-phishing techniques work,
and against which future research on phishing prevention techniques will also work. There
are several heuristic methods that use logo comparison, look for the misuse of security
agency logos, and use other properties to detect phishing. Heuristic methods are discussed
in later chapters. There are also various spam and phishing email filters in use to protect
against phishing attacks. Some e-commerce organizations have their own toolbars
designed for their customers, e.g., eBay’s toolbar, which can alert its clients about
phishing targeted at eBay [eBay Toolbar’s Account Guard].
2.2.2. Internet users’ behavior and decision making process
Human behaviour makes the decision making process a very complex procedure. The
outcome depends on probability. It is affected by various factors, such as beliefs,
preferences, past experiences, the situation at hand, current states, and others. Further
studies can:
Improve the understanding of the factors that make Internet users fall for phishing,
and
Guide security experts in designing countermeasures which can effectively protect
Internet users from phishing.
There has been little work on Internet users’ behaviour and decision making
process in the context of phishing, although there is work on human behaviour
and the decision making process in other research contexts. Only a few security scientists
have contributed to the study of human behaviour and decision making with respect to
phishing. Dhamija et al.’s [2006] experiment on why people fall for phishing is an
example of such work. This study focused on finding limitations of existing phishing
prevention techniques. Their study revealed that Internet users have their own
preferred characteristics for identifying phishing, and that their decision making
process is affected by various factors, such as their past experiences with phishing and
their situation (e.g., a person desperately looking to buy a FIFA World Cup ticket
will react differently to FIFA World Cup-themed phishing than a person who
has not thought about watching the FIFA World Cup). For instance, in this experiment
participants were asked to identify phishing, which affected their decisions, and
participants were found to be misled by attractive and luring sentences in an email or website.
Moreover, the situation was a key factor: in the experiment there was no penalty
for a wrong decision, which affected participants’ decisions.
A classic example of the impact of belief on decision making during phishing
detection is mentioned in the experimental case study performed on a bank’s employees
by Aburrous et al. [2010]. They found that some Internet users strongly believe that they
are capable of detecting all kinds of phishing attacks and avoid using anti-phishing tools,
which unintentionally exposes them to a higher risk of phishing attacks. One of the
in-depth studies of Internet users’ behaviour while interacting with phishing was
done by Dong et al. [2008]. Their research focused on Internet users’ behaviour during
interaction with phishing websites and on their decision making process. They also
designed a model called the “user-phishing interaction model” after a cognitive
walkthrough of four hundred phishing websites, identifying users’ activities, the
information used, and the assumptions/executions that Internet users make during their
interaction with a phishing webpage. A diagrammatic representation of the information
Internet users may use when encountering phishing attacks is shown in Figure 6.
Figure 6: The overview of User-Phishing Interaction [Dong et al., 2008]
External information. This is the information that users perceive from the user
interface (including phishing emails and other communication), as well as from other
sources (such as expert advice).
Knowledge and context. This is the information that users perceive from their
environment, social networks, past experience, things happening around them,
etc.
Expectation and previous perception. After each action, Internet users have some
expectations. This is the information retrieved from these expectations and from
their understanding of the system.
In their Decision Making Model, Dong et al. [2008] mentioned the following two kinds
of decisions that users make when interacting with phishing activities and reflect in their
content.
Decide on a series of actions to take. This decision is taken consciously. It affects the
decision whether to proceed or not.
Decide whether to proceed or not. This decision is usually taken subconsciously.
Both decision making processes are further divided into the following three steps:
Construction of the perception of the situation
Generation of possible actions to respond
Generation of assessment criteria and choosing an action.
A diagrammatic representation of their Decision Making Model is shown in Figure 7.
Figure 7: Decision Making Model [Dong et al., 2008]
2.3. Objectives of existing phishing prevention methods
There are several phishing prevention methods resulting from different studies
conducted on protection against phishing. These phishing prevention methods are
primarily motivated to look for:
(i) Reasons behind Internet users’ tendency to fall for phishing
(ii) Design techniques to educate and aware about phishing
(iii) Design effective User Interface (UI) and warning to alert about phishing
(iv) Development of countermeasure to automatically detect phishing
(v) Evaluation of the effectiveness of existing phishing prevention methods, and
(vi) Invent proactive strategies for phishing prevention.
Below there are references and examples from all these research studies.
2.3.1. Reasons behind internet users’ tendency to fall for phishing
It is not uncommon for novice Internet users to be victimized by phishing; but
shockingly, it is found that even those with adequate knowledge about phishing are
tricked by phishers [Odaro and Sanders, 2011]. A study conducted by Aburrous et al.
[2010] on a bank’s employees found that even employees from the Information Technology
(IT) department, who are chiefly responsible for remaining alert to phishing, got
tricked. Likewise, in a study by Dhamija et al. [2006], ninety percent of the participants
got tricked by good phishing websites. There are a number of such studies that have
examined the reasons behind Internet users’ tendency to fall for phishing.
Friedman et al.’s [2002] empirical study on users’ conceptions of web security
revealed that many Internet users are unable to differentiate between secure and insecure
website connections. The meaning of security varies from one Internet user to another, and
many look to components in the UI that can be easily copied from the original website as
cues for a secure connection. Likewise, the study by Dhamija et al. [2006] found that many
Internet users are unable to differentiate between legitimate and spoofed websites. Many
Internet users use the content of the website as a cue for authenticity. There are a number
of Internet users who use the padlock icon, animated graphics, pictures, and design touches,
such as logos, favicons, etc., to differentiate between genuine and fake websites.
Most surprisingly, many Internet users do not hesitate to reveal their personal
information to a spoofed website despite warnings from the phishing prevention tools
installed in their system. Dhamija et al. [2006] also identified the ineffectiveness of
existing solutions designed for phishing prevention as a reason behind Internet users
falling for phishing. These solutions are more technical and usually neglect some crucial
non-technical aspects in their design.
Similarly, Downs et al.’s [2006] study of Internet users’ mental models when
reading email and browsing the web, and of their vulnerability to manipulation, revealed that
merely having knowledge and experience about phishing is an ineffectual strategy for
phishing prevention, especially in the case of new phishing methods. One of the reasons
mentioned is that current awareness techniques do not effectively cover possible
vulnerabilities or strategies for identifying phishing emails. Another reason could be
that applying certain knowledge too rigidly can lead users to suspect genuine emails and
web-based actions [Odaro and Sanders, 2011], an approach that is unlikely to work for the
many who conduct business via the web.
Wu et al.’s [2006b] study also found that many Internet users use website
appearance and content to differentiate between fraudulent and legitimate websites.
Moreover, security is rarely the primary goal of Internet users. They also indicate that
sloppy web practices help to confuse Internet users and expose them to risk. For
example, a single web form is used to submit both sensitive and non-sensitive information,
some legitimate websites use Internet Protocol (IP) address URLs, and some legitimate
websites have a login page without Secure Sockets Layer (SSL) or use SSL for so short a
time that Internet users do not notice it. Moreover, Ma [2006] and Wu et al. [2006b]
mention that the lack of an alternative is a factor behind Internet users falling for
phishing. Almost all phishing prevention approaches detect probable phishing, but they
rarely provide an alternative way to proceed, which forces Internet users to take the risk.
Human behaviour also plays some role in making Internet users fall into the phishing trap.
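Two of the practices mentioned above, IP address URLs and login pages served without SSL, can be checked mechanically. The following Python fragment is a minimal sketch under my own assumptions: the function names and the example URLs are hypothetical, and the checks only illustrate the idea; they are not taken from any of the cited studies.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip_address(url: str) -> bool:
    """Return True when the host part of the URL is a literal IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)  # raises ValueError for ordinary domain names
    except ValueError:
        return False
    return True


def uses_https(url: str) -> bool:
    """Return True when the URL asks for an encrypted (https) connection."""
    return urlsplit(url).scheme == "https"
```

Because some legitimate websites show the same properties, as noted above, such checks can only raise suspicion; they cannot decide on their own whether a website is a phishing website.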
2.3.2. Design techniques to educate and aware about phishing
Phishing is largely dependent on the human factor, so educating Internet users and raising
awareness about phishing is one of the potential countermeasures. Not all phishing attempts
are difficult to recognize. The majority of phishing attacks contain visible
distinguishing factors which can help Internet users to identify them; however,
the majority of Internet users are found to be either not aware of them or not clear about
them. Their inability to distinguish legitimate websites from phishing websites is exploited
by phishing attacks. Surveys and studies undertaken by Friedman et al. [2002], Dhamija et
al. [2006], Karakasiliotis et al. [2007], Jagatic et al. [2007], Herzberg and Jbara [2004],
and Odaro and Sanders [2011] have revealed that Internet users lack proper knowledge
about phishing. Their skill in identifying phishing attacks is not adequate and they
often misclassify phishing websites as legitimate websites and vice versa.
Undoubtedly, not all phishing attacks can be detected manually. Yet, performing manual
detection by Internet users can make a big difference in reducing the number of people
falling for phishing. Wu et al. [2006b] found a significant improvement in Internet users’
ability to detect phishing attacks after they had read an emailed tutorial
about phishing. Various kinds of material are available to educate Internet users and
raise their awareness about phishing and about techniques for detecting it manually.
Many online training materials are published by various government and
non-government organizations, businesses, security organizations, universities, etc. Most
of the organizations that work on the prevention of phishing (e.g., APWG, antivirus
companies, universities working on phishing) or are targeted by phishing attacks (e.g.,
banks, e-business companies, finance companies) have included information about
phishing, and instructions to follow when encountering it, on their official
websites. An example of such information included in the website of Nordea Bank,
Finland is shown in Figure 8.
Figure 8: ‘About phishing’ page in Nordea Bank, Finland website
Many other online materials are also available. “Anti-Phish Phil”, an interactive game,
and “PhishGuru”, an interactive training system, were designed by the CyLab Usable Privacy
and Security (CUPS) Laboratory at Carnegie Mellon University and are used to educate
Internet users about phishing websites. Sheng et al.’s [2007] experiment on the role of
games in educating Internet users about phishing showed that a game is more effective than
other means, such as reading text or reading online tutorial material. A screenshot of
the “Anti-Phish Phil” game is shown in Figure 9.
Figure 9: A screenshot of the education game called “Anti-Phish Phil”
Similarly, “Phish or Not Phish”, an online quiz developed by VeriSign, is available for
free. It displays two similar-looking website snapshots and asks users to identify the
snapshot that comes from a phishing website. After each answer, it displays the reasons
why one of the snapshots comes from a phishing website. A screenshot of “Phish or Not
Phish” is shown in Figure 10.
Figure 10: A screenshot of the online quiz called “Phish or Not Phish”
2.3.3. Design effective UI and warning to alert about phishing
Dhamija et al. [2006] have mentioned that phishing cannot be solved solely by a
traditional cryptography-based security framework; rather, it equally needs the inclusion
of usability and user experience. Several studies have indicated that a bad or ineffective
user interface is one of the prominent factors behind the weak performance of anti-phishing
software. Wu et al. [2006a] pointed out the placement of warning indicators in a
peripheral area, found in many phishing prevention solutions, as one example of very
poor design. Further, they mention that such warning indicators send a very weak signal
in comparison to the much larger, centrally displayed spoofed web pages. Zhang et
al.’s [2007b] study also revealed the poor usability of existing phishing
prevention tools. Some examples of poor design in phishing prevention tools are:
Use of red and green colour indicators, which is a poor choice for users with red/green
colour blindness unless some other noticeable cue is included along with them,
Use of pop-up dialog boxes to warn, when popular browsers (e.g., Internet
Explorer (IE), Google Chrome, Mozilla Firefox) have an option to block such boxes
and, besides that, most Internet users dismiss such boxes without reading them. An
option to disable pop-up dialogs in IE 9 is shown in Figure 11.
Some examples of anti-phishing toolbars which use poor ways to notify users of
phishing attacks are eBay’s Account Guard and SpoofGuard. eBay’s Account Guard
shows a green icon to indicate that the webpage belongs to eBay or PayPal, a grey icon for
unidentified websites, and a red icon to indicate a potential phishing website [eBay
Toolbar’s Account Guard]. SpoofGuard displays traffic light colours (Red: above their
threshold value, Yellow: probably hostile, and Green: for low scores and is probably
safe) to indicate a website’s chance of being a phishing website [Chou et al., 2004].
Figure 11: Highlight of pop-up blocker in Internet Explorer 9
Currently, significant improvement can be seen in the usability of some phishing
prevention tools. Popular browsers use an active warning that is displayed on the full
page. Such a warning can hardly go unnoticed by Internet users. Internet Explorer uses both
active and passive warnings; when it confirms that the website is a phishing website,
it uses an active warning, whilst for a suspected webpage a passive warning is used. An
active warning displayed by the Google Chrome browser is shown in Figure 12.
Figure 12: Active warning message displayed in Google Chrome
Similarly, many other researchers have designed user friendly interfaces. Dynamic
Security Skins [Dhamija and Tygar, 2005] uses a random photographic image in the
background of the password window as a cue to differentiate between legitimate and fake
websites. Each Internet user is assigned a unique image and is recommended to enter the
password only after his personal image is loaded. SpoofStick displays the website’s real
domain and exposes websites that obscure their domain name [SpoofStick]. TrustBar
makes SSL more visible by displaying the logos of the website and its certificate
authority [Herzberg and Gbara, 2004]. The Netcraft toolbar enforces the display of browser
navigational controls (toolbar and address bar) in all windows, to defend against pop-up
windows which attempt to hide the navigational controls. In addition, it clearly displays
a site’s hosting location, including the country, which helps in evaluating fraudulent URLs
[Netcraft].
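The idea of showing the real host behind an obscured address can be illustrated with the standard URL parser of Python. This is a minimal sketch under my own assumptions: the function name `real_host` and the example addresses are hypothetical, and the fragment does not reproduce how SpoofStick or any other toolbar is implemented.

```python
from urllib.parse import urlsplit


def real_host(url: str) -> str:
    """Return the host the browser would actually contact for this URL.

    Anything before an '@' in the authority part is user information,
    not the host, so a familiar name placed there is only a decoy.
    """
    return (urlsplit(url).hostname or "").lower()
```

For example, for the hypothetical address `http://www.example-bank.com@192.0.2.10/login` the function returns `192.0.2.10`, and for `http://www.example-bank.com.evil.example/login` it returns the full host name `www.example-bank.com.evil.example`, which makes it visible that the familiar bank name is only a prefix. Reducing a host name to its registered domain would additionally need a list of public suffixes, which this sketch leaves out.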
2.3.4. Development of countermeasures to automatically detect phishing
Human ability to detect phishing is limited and varies among Internet users. Moreover,
manual methods of identifying phishing can be deceived. Therefore, several
software tools have been developed in order to identify phishing websites. These software
tools can be phishing email filters, such as Phishing Identification by Learning on
Features of Email Received (PILFER) [Fette et al., 2006], which uses a machine learning
based approach to examine a set of ten features in a suspected email. PILFER also uses a
Support Vector Machine (SVM) classifier for its reference implementation. Another approach
to spam filtering is greylisting, which blocks spam at the mail server level based on the
behaviour of the sending server, rather than the content of the message. The mail server
that employs greylisting deliberately rejects mail from unknown or suspect sources
with a temporary error until a configured period of time has passed. It relies on the fact
that many spam sources, i.e., the Simple Mail Transfer Protocol (SMTP) software used by
spammers, do not maintain queues for retrying message transmission. When a sender has
proven itself able to properly retry delivery, the sender is added to the whitelist so that
mail from that sender is no longer impeded. However, the problem with phishing email
filters is that they fail to stop phishing attacks that use other mediums, such as IRC,
Messenger, and advertisements [IBM Internet Security Systems, 2007]. Moreover, such
phishing email filters are unable to stop all malicious emails.
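The greylisting decision described above can be sketched in a few lines of Python. The class and method names, the default delay of 300 seconds, and the use of a single sender identifier as the key are assumptions made for illustration; deployed implementations commonly key on more than the sender alone and answer with a temporary (4xx) SMTP reply code, which is represented here by the string "tempfail".

```python
class Greylist:
    """Minimal greylisting sketch; times are plain seconds supplied by the caller."""

    def __init__(self, delay_seconds: int = 300):
        self.delay = delay_seconds    # hypothetical configured waiting period
        self.first_seen = {}          # sender -> time of its first delivery attempt
        self.whitelist = set()        # senders that have proven they retry properly

    def check(self, sender: str, now: float) -> str:
        """Return "accept" or "tempfail" for a delivery attempt at time `now`."""
        if sender in self.whitelist:
            return "accept"
        first = self.first_seen.get(sender)
        if first is None:
            # Unknown sender: remember the attempt and ask it to retry later.
            self.first_seen[sender] = now
            return "tempfail"
        if now - first < self.delay:
            # Retried too early: the configured period has not passed yet.
            return "tempfail"
        # The sender retried after the waiting period, so it is whitelisted.
        self.whitelist.add(sender)
        del self.first_seen[sender]
        return "accept"
```

A sender that never retries stays in `first_seen` and none of its mail is delivered, which is the behaviour the technique relies on for spam sources without a retry queue.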
There are several software tools, mostly in the form of browser toolbars, that
detect phishing attacks using mediums other than, as well as including, emails. Some
anti-phishing tools are integrated in popular anti-virus software, such as
Norton antivirus and AVG antivirus; some are built into popular browsers, such as Internet
Explorer, Mozilla Firefox, and Google Chrome; and some are independent applications or web
browser add-ons, such as FraudEliminator [Fraud Eliminator], Netcraft toolbar
[Netcraft], eBay toolbar [eBay Toolbar’s Account Guard], EarthLink toolbar
[EarthLink], GeoTrust TrustWatch toolbar [Geo Trust], SpoofGuard [Chou et al.,
2004], CallingID Toolbar [CallingID], Cloudmark Anti-Fraud toolbar [Cloudmark],
Google Safe Browsing [Google Safe Browsing], SpoofStick [SpoofStick], and TrustBar
[Herzberg and Gbara, 2004]. These tools employ either heuristic methods or list based
methods or both of them for phishing detection. Heuristic methods check characteristics
of a website and decide whether it is phishing or not, whilst list based methods maintain a
list of either genuine websites (whitelist) or phishing websites (blacklist) and check
whether the website is in the list to decide between phishing and not phishing. Each
technique has its own pros and cons. This thesis, too, is about heuristic methods for
phishing detection. So, in the later chapters, details of the heuristic methods and list
based methods used for automatic detection of phishing are covered.
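How list based and heuristic methods can complement each other is sketched below in Python. The function names, the two example heuristics, the threshold, and the returned labels are hypothetical choices made for illustration; the fragment does not describe the internals of any of the tools named above.

```python
from urllib.parse import urlsplit


def has_userinfo(url: str) -> bool:
    """Example heuristic: an '@' in the authority part can hide the real host."""
    return "@" in urlsplit(url).netloc


def has_many_subdomains(url: str, limit: int = 4) -> bool:
    """Example heuristic: unusually many dots in the host name."""
    host = urlsplit(url).hostname or ""
    return host.count(".") >= limit


def classify(url, whitelist, blacklist, heuristics, threshold=1):
    """Consult the lists first, then fall back to the heuristic checks."""
    host = (urlsplit(url).hostname or "").lower()
    if host in whitelist:
        return "legitimate"
    if host in blacklist:
        return "phishing"
    # Host is on neither list: count how many heuristics raise an alarm.
    hits = sum(1 for check in heuristics if check(url))
    return "suspicious" if hits >= threshold else "unknown"
```

The order of the steps reflects the trade-off mentioned above: a list lookup is exact but knows nothing about a website that has not been listed yet, whereas the heuristics can flag an unlisted website at the price of possible false alarms.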
Then, there is DNSSEC Validator [DNSSEC Validator], an add-on made for the
Mozilla Firefox browser that detects DNS spoofing. The DNSSEC Validator compares only
the DNS records of the domain name used in the page address with the IP address from
which Firefox downloads the page in order to detect DNS spoofing. A screenshot of
DNSSEC Validator is shown in Figure 13.
Figure 13: A screenshot of browser add-on “DNSSEC Validator” [DNSSEC Validator]
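The comparison between the addresses found in the DNS records and the address from which a page was downloaded can be illustrated without any network access. In the following Python sketch the function name and the example addresses are hypothetical, the list of resolved addresses is assumed to have been obtained beforehand, and no DNSSEC signature validation is performed; it is not the code of the add-on.

```python
import ipaddress


def address_matches_dns(resolved_addresses, remote_address) -> bool:
    """Return True when the address the page came from is among the DNS answers.

    Addresses are parsed before comparison so that different spellings of
    the same IPv6 address are treated as equal.
    """
    remote = ipaddress.ip_address(remote_address)
    return any(ipaddress.ip_address(a) == remote for a in resolved_addresses)
```

A mismatch is only a hint of DNS spoofing, since a legitimate website may be served from several addresses that change over time.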
2.3.5. Evaluate the effectiveness of existing phishing prevention methods
Despite wide media coverage of phishing and numerous phishing prevention
techniques, phishing remains effective. This raises a serious concern about the
efficiency of the methods used for phishing prevention. Many studies have been conducted in
order to examine the efficiency of the existing phishing prevention methods. These
studies expose the reliability of phishing prevention methods and at the same time point
out their deficiencies, which can be helpful for improving existing phishing prevention
methods as well as forthcoming methods.
Wu et al.’s [2006b] study on the effectiveness of security toolbars revealed that
existing security toolbars largely fail to mitigate phishing attacks. They pointed out
several factors that can be responsible for the ineffectiveness of phishing prevention
methods: an alert display that is very small in comparison to the content display, is
located at the periphery, and goes unnoticed; security not being the primary goal of
Internet users; and distrust towards such toolbars due to their false positives.
Another similar study, by Zhang et al. [2007b], observed the tool performance,
testing methodology, and user interface of eleven selected phishing prevention tools
(i.e., CallingID toolbar, Cloudmark Anti-Fraud toolbar, EarthLink toolbar, eBay toolbar,
Firefox 2, GeoTrust TrustWatch toolbar, Microsoft Phishing Filter in Windows Internet
Explorer 7, Netcraft anti-phishing toolbar, Netscape browser, and SpoofGuard) and revealed
that these tools are underperforming and that all of them are incapable of protecting
Internet users from phishing attacks that use sophisticated techniques. Their performance
varies with the source of the phishing URLs used by them. Further, many of the tools failed
even for very simple exploits. They suggest that no single phishing prevention
method can ensure high performance; multiple methods supporting each other and used
together in an anti-phishing tool can provide better results.
Some studies evaluated only the effectiveness of the in-built anti-phishing toolbars of
web browsers. For instance, Ludl et al. [2007] analyzed the effectiveness of the blacklists
maintained by Microsoft and Google. The blacklist maintained by Microsoft is used in
Internet Explorer whilst the blacklist maintained by Google is used by Google Chrome
and Mozilla Firefox. Since Google Chrome, Mozilla Firefox, and Internet Explorer are the
most widely used web browsers, their inbuilt anti-phishing toolbars are also the most
widely used. The study focused on three crucial factors: the coverage of the blacklist,
its quality, and the list update time. It indicated that blacklist based phishing
prevention is satisfactorily effective, especially that from Google; however, the inability
of blacklist based phishing prevention to detect new phishing attacks can be handled to a
large extent using heuristic techniques, in the way the IE browser uses a heuristic
technique to complement its list based technique.
Likewise, Bian et al. [2009] evaluated the effectiveness of three external online
resources (Google PageRank system, Yahoo! Inlink data, and Yahoo! Directory
service). Their findings suggested that such online resources can increase the
efficiency of detection when used in conjunction with existing countermeasures.
Similarly, Egelman et al. [2008] studied the effectiveness of Internet browser
warnings and found that most Internet users heed active warnings (79% in their
experiment) whilst a passive warning was no different from not displaying any warning.
They further found that the active warning in Mozilla Firefox was more helpful than the IE
active warning. Sheng et al. [2009] also performed an empirical analysis to observe the
effectiveness of phishing blacklists and found that phishing blacklists are a poor choice
for fighting zero hour phishing websites. Li and Helenius [2007] performed a heuristic
usability evaluation of five selected anti-phishing client-side applications (i.e., Google
toolbar, Netcraft toolbar, SpoofGuard, Phishing Filter in IE, and anti-phishing IEPlug).
They suggested the following three points for an effective usability design of
anti-phishing client-side applications:
(i) The toolbar’s status should be visible to Internet users and anti-phishing
client-side applications should have an intuitive interface.
(ii) Warnings should help Internet users to take the correct decision. The warning
for a suspicious webpage should not be as strong as the warning for a detected
webpage.
(iii) Anti-phishing client-side application should be aided with a suitable help
system.
2.3.6. The need to invent proactive strategies for phishing prevention
Most of the investigations on phishing are directed towards finding a new reactive
technique. A reactive technique is often effective against the types of phishing which
existed when the technique was designed, but it abruptly fails to detect a phishing attack
that employs a new technique. Current trends in research are chiefly targeted towards
defending against attacks from phishers or taking down phishing websites, while scammers
are continuously devising new attacks. In fact, no adequate effort seems to be applied in
order to reach the root of the problem. There is a need for more research that can:
Strengthen the weak points in legitimate systems and make them tedious to
misuse
Develop strategies to retaliate against and circumvent the phishers, and
Track the phishers to bring them under law enforcement
Law enforcement could be difficult in those countries that do not have provisions for
such cases; however, a study by APWG [2012] showed that the countries that host most
of the phishing websites are developed countries, with the USA topping the list with an
average of about fifty percent of the phishing websites hosted there. Other countries
hosting most of the phishing websites for the first quarter of 2012 are shown in Figure
14.
Figure 14: Top countries hosting phishing websites [APWG, 2012]
Most of the countries in the list have strict cybercrime laws, so tracking phishers
and prosecuting them can discourage many would-be phishers, or at least those who
conduct phishing without much technical skill. Similarly, making phishing activity
more difficult to conduct can strongly deter the naïve players in phishing. There
are some proactive strategies that are directed towards reducing phishing.
One way is to use web crawlers, similar to those used by search engines, to
search for phishing websites and to pass this information to the appropriate
Internet Service Provider (ISP) so that the websites can be taken down. However,
this technique has some limitations. Many countries do not have a legal provision
to remove such websites. Moreover, such detection takes time, which can be enough
for scammers to achieve their illegal goals.
Another similar technique is to flood the phishers’ database with false information,
also called poisoning; this is not a Denial of Service (DoS) attack. It can make it
difficult for phishers to differentiate between valid and false data and can
sometimes even make the database useless. This technique, too, has limitations. It
requires identifying the spoofed websites without any false positives. The time
taken to track down the fraudulent website can be sufficient for phishers to
victimize many Internet users. Moreover, any false positive result (i.e., a
legitimate website mistaken for a phishing website and flooded with false data) can
have serious consequences and lead to a lawsuit.
Another proactive technique is to monitor downloads of a corporation’s logo.
Many phishers use an authentic logo to make their fake websites look more
genuine. However, this technique, too, has some limitations. Firstly,
the corporation’s logo is also used by the corporation’s partners and by some other
legitimate websites, and phishers can easily download it from them. Figure 15 shows
a page from a website that displays the logos of various banks; hundreds of such
websites are available online.
Figure 15: Logos of various banks used in a personal blog website
Secondly, copying a legitimate website’s logo is not difficult for a competent
designer. Moreover, how many Internet users can correctly differentiate between a
legitimate logo and its copy remains an open question.
One prominent work on proactive strategies for tracking phishers is that of
McRae and Vaughn [2007], which uses web bugs and honeytokens. In their experiment,
they used a uniquely named HyperText Markup Language (HTML) image tag of one
pixel by one pixel for each phishing e-mail as a honeytoken or web bug. The HTML
and image links were entered as the values of all text-type variables in the
phishing website’s forms, and the forms were submitted. When phishers view the
results in an HTML-enabled environment that does not filter or block third-party
images, the attacker’s viewer retrieves the image from the server. This retrieval
is used to gather information about the individuals or groups who viewed the data
collected by the phishing scheme. However, this technique, too, can be bypassed in
various ways, for instance:
• View the results of the phishing form in a text-only viewer.
• Disable the HTML code and prevent any referral from being made in the web
server log.
• Disable the loading of third-party images in the browser used.
• Use a web proxy (usually some hacked system) to view the results.
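The web bug idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of McRae and Vaughn: the tracking host `tracker.example.org`, the function names, and the token format are all hypothetical.

```python
import uuid

# Hypothetical server that logs every image request (an assumption for this sketch).
TRACKER_BASE = "https://tracker.example.org/img"


def make_web_bug(token: str) -> str:
    """Return a 1x1-pixel HTML image tag whose file name is unique to `token`."""
    return f'<img src="{TRACKER_BASE}/{token}.gif" width="1" height="1">'


def fill_phishing_form(text_fields: list[str]) -> tuple[str, dict[str, str]]:
    """Fill every text field of a phishing form with the same unique web bug.

    Returns the token (to be stored together with the phishing e-mail it
    belongs to) and the field values that would be submitted.
    """
    token = uuid.uuid4().hex  # unique name per phishing e-mail
    bug = make_web_bug(token)
    return token, {field: bug for field in text_fields}


def tokens_seen_in_log(log_lines: list[str], known_tokens: set[str]) -> set[str]:
    """Return the tokens whose image was requested, i.e. whose data was viewed."""
    return {t for t in known_tokens if any(f"/{t}.gif" in line for line in log_lines)}
```

A request for a token's image in the server log indicates that someone rendered the submitted form data as HTML; the bypasses listed above all work by preventing that request or by hiding its true origin.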
Another proactive approach comes from Hacker Factor Solutions [2005], who
proposed using page encoding to encapsulate each web page and thereby stop phishing
websites generated with mirroring techniques. The availability of various mirroring
techniques (the web browser’s “Save as” option is the simplest; other tools include
“wget”, WebWhacker, Templeton, telnet, and netcat) has drastically reduced the
time and effort phishers need to make fake websites. In fact, such techniques act
as a catalyst for the vigorous growth of phishing and encourage many novice
cybercriminals to perform phishing. However, the problem with this approach is that
it uses Javascript code to decode the page content, whilst all popular web browsers
have options to disable Javascript, and at present only a few websites require
Javascript to be enabled. Figure 16 shows the options to disable Javascript in the
Google Chrome browser.
Figure 16: Options to enable and disable Javascript in Google Chrome
Moreover, add-ons such as NoScript for the Mozilla Firefox browser can be used
to allow the execution of Javascript, Java, Flash, and other plugins only on
selected websites. Figure 17 shows the options to disable script execution in NoScript.
Figure 17: Options to enable and disable script execution in NoScript
This approach has several other issues, including:
• Search engines will be unable to index the page since it is encoded.
• It cannot provide protection against phishing malware.
• It requires a routine (maybe weekly) change of the encoding algorithm.
• It needs specialized skills and more time to develop such websites.
Likewise, Li et al. [2007] suggest misuse-oriented prevention, i.e., protecting
against phishing attacks with the misuse case method from a system design
perspective. Security requirements are often not stated during requirement
elicitation and analysis, leaving vulnerabilities in future information systems
which are later exploited by scammers. Such vulnerabilities can be fixed by using a
misuse case approach during requirement gathering: a designer is asked to abuse
each use case, and a countermeasure is then identified and employed. The process
continues iteratively until the design is foolproof. The methodology of misuse
cases is summarized as follows:
a. Design the use cases of the system;
b. Impersonate a misuser who intends to compromise the system;
c. Design the misuse for a specific use case;
d. Find a countermeasure for a misuse case;
e. Judge whether the countermeasure is vulnerable; if yes, go to step c, otherwise go
to the next step;
f. Find whether there is possible vulnerability or misuse; if yes, go to step c,
otherwise security requirement elicitation ends.
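The control flow of steps a to f can be sketched as a loop. This is a minimal illustration of the iteration only, not the procedure of Li et al.: in practice the three callbacks below are human design activities, and their names, signatures, and the `max_rounds` safety bound are assumptions made for the example.

```python
from typing import Callable, Optional


def elicit_security_requirements(
    use_cases: list[str],
    find_misuse: Callable[[str, list[str]], Optional[str]],
    find_countermeasure: Callable[[str], str],
    is_vulnerable: Callable[[str], bool],
    max_rounds: int = 20,
) -> dict[str, list[str]]:
    """Iterate steps c to f for every use case (step a) until no misuse remains.

    `find_misuse` plays the misuser of step b: given a use case and the
    countermeasures adopted so far, it returns a misuse or None.
    """
    requirements: dict[str, list[str]] = {}
    for use_case in use_cases:
        adopted: list[str] = []
        for _ in range(max_rounds):  # bound added so the sketch always terminates
            misuse = find_misuse(use_case, adopted)       # steps c and f
            if misuse is None:
                break                                     # elicitation ends
            countermeasure = find_countermeasure(misuse)  # step d
            if not is_vulnerable(countermeasure):         # step e
                adopted.append(countermeasure)
        requirements[use_case] = adopted
    return requirements
```

Each adopted countermeasure becomes a security requirement attached to its use case.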
Although the technique could be beneficial in cases where websites are hacked
and compromised to conduct phishing, it does not appear able to prevent the majority
of phishing, in which phishers develop independent websites or ask for information
through email. Moreover, no matter how foolproof a system is designed to be, hackers
may find some way to intrude. This can also be seen from the news of an attack on the
Pentagon (the headquarters of the United States Department of Defense) computer
system in which 24,000 files were stolen [NYDailyNews.com, July 14 2011], and the
news that a hacker succeeded in hacking computer systems owned by Oracle, NASA
(National Aeronautics and Space Administration), the U.S. Army, and the U.S.
Department of Defense [IDG News Service, May 10 2012].
2.4. Classification of phishing prevention techniques
Several promising techniques help to prevent phishing attacks.
These techniques have to deal with both technical and non-technical factors. Therefore,
at the first level, phishing prevention techniques can be classified into technical
methods and non-technical methods. The technical methods can be further categorized
into list based methods and heuristic methods [Dunlop et al., 2010]. A classification
hierarchy of phishing prevention techniques is shown in Figure 18.
Figure 18: Classification of phishing prevention techniques
• Technical methods. Technical methods deal with technical vulnerabilities in
information systems; tools for phishing detection, prevention, and response; and
the design of games, online tutorials, and quizzes for Internet awareness. Some
examples are: anti-virus software integrated with phishing prevention; built-in
systems in web browsers; and software tools, such as FraudEliminator, Netcraft
toolbar, eBay toolbar, EarthLink toolbar, Geo Trust Trustwatcher toolbar,
SpoofGuard, CallingID toolbar, Cloudmark Anti-Fraud Toolbar, Google Safe Browsing,
SpoofStick, TrustBar, Anti-Phishi, DOMAntiphish, and PwdHash.
o List based methods. List based methods classify websites as either
phishing or trusted and maintain them in a lookup database in the form of
either a blacklist or a whitelist. These lists can contain IP addresses,
domain names, or URLs. A blacklist is a collection of the IP addresses,
domain names, or URLs of phishing websites, whilst a whitelist is a
collection of the IP addresses, domain names, or URLs of legitimate websites.
List based methods are discussed in detail in section 3.1.
o Heuristic methods. Heuristic methods check one or more
characteristics of a website and decide whether it is a phishing or a legitimate
website. They use properties such as the HTML and script code of the website,
its URL, UI design, and page content to identify phishing websites.
Heuristic methods are discussed in more detail in section 3.4.
• Non-technical methods. Non-technical methods deal with factors related to
the study of Internet users’ behavior, the social engineering principles and
techniques used by phishers, the legality of using any technique, training Internet
users about phishing, information and guidelines for safe browsing, and cyber
laws to punish phishing culprits.
Since the purpose of this thesis is to concentrate on technical methods, i.e., list based
methods and heuristic methods specifically used in browser based applications for
phishing prevention, non-technical methods are not discussed further here.
2.5. Phishing prevention applications
Both list based methods and heuristic methods are implemented in server-side
applications and client-side applications (i.e., browser based applications, since client
side applications are widely used as web browser toolbars) used for phishing
prevention. According to their implementation architecture, client-side applications
are further categorized into two types: client-server structured applications and
independent applications [Li et al., 2007]. A classification hierarchy of phishing
prevention applications is shown in Figure 19.
Figure 19: Classification of phishing prevention applications
(i) Server side applications. Server side applications are deployed on servers
(e.g., organizational server, email server, ISP server) for phishing identification
and remedy. For example, Bayesian filters are installed on the server to detect
phishing emails. Although such filters are an effective technique for phishing
prevention, it should be noted that they cannot be one hundred percent accurate and,
above all, email is not the sole channel (other popular channels are message boards, web
banner advertising, and instant chats, such as Internet Relay Chat (IRC) and instant
messengers) of phishing attacks. Many other applications that use IP address
and URL blacklists, heuristics, and fingerprinting (which compares known samples of
phishing messages against incoming emails) are deployed on ISPs’ servers for
phishing prevention.
(ii) Client side applications or browser based applications. The web browser is the
most common means used by Internet users to access web content.
There are other means too, but they are usually tricky and complex, which
makes them unsuitable for general Internet users. Furthermore, the browser is the
foremost layer with which the Internet user interacts, and tracking the user’s
activity at this level is potentially more effective. Its strategic position makes
it suitable for warning Internet users directly and effectively [Sheng et al., 2009].
A study by Egelman et al. [2008] even found that the phishing warning in Mozilla
Firefox 2 was very effective and stopped all participants in their study from
entering sensitive information into fraudulent websites. In addition, the web
browser market is dominated by a small number of browsers, i.e., Google Chrome,
Internet Explorer, Mozilla Firefox, Safari, and Opera. Altogether, this makes it
convenient to handle phishing at the browser level.
This does not mean that the use of web browsers is free of limitations. Most
browser based techniques act when the webpage is loaded, which is risky from the
perspective of malware and other malicious code used for phishing [Garera et
al., 2007; Ma et al., 2009]. Another factor that has always been challenging for
researchers and security experts in browser based techniques is the mode of
displaying warning messages. Passive warnings used to notify about phishing,
such as a change in colour or a pop-up with textual information displayed at the
corner or periphery of the browser without interrupting the browsing activity, are
either unnoticed or neglected by Internet users [Wu et al., 2006]. The current trend
is to use active warnings, which force Internet users to notice and take action by
interrupting the browsing activity. However, it is debatable how acceptable
such interruptive warnings are, particularly in the case of a false positive. This
might be a reason why IE uses an active warning when it is confirmed that the
website is a phishing website and otherwise uses a passive warning for doubtful
websites. Thus, such warnings should be precise and accurate. Any wrong
warning or alert can raise questions about its reliability, which will ultimately
reduce Internet users’ trust in it.
Despite some limitations in the use of browser based techniques for phishing
prevention, they are widely used. Nowadays, most phishing prevention
applications are concentrated on the most vulnerable client side [Li
et al., 2007], for which browser based applications are highly suitable. Such
applications are either built in or are independent browser toolbars that can
be embedded into the web browser. The current versions of all popular browsers
(Google Chrome, IE, and Mozilla Firefox) come with a built-in phishing
prevention system and some other features (e.g., blocking pop-up windows, enabling
and disabling Javascript or Active script in IE, warning when sites try to install
add-ons in Mozilla Firefox) that contribute to the fight against phishing attacks.
Some examples of independent browser toolbars are: Netcraft Anti-phishing toolbar,
eBay’s Account Guard, SpoofGuard, and Microsoft Anti-phishing toolbar for IE.
• Client-server structured applications. Client-server structured applications
routinely request updates and maintenance from the server. Such
toolbars are usually made by commercial organizations, such as Google,
Microsoft, and Netcraft. Mozilla Firefox uses Google Safe Browsing; it
updates its blacklist for the first time when the feature is enabled and
every thirty minutes thereafter. It communicates with the Google
server on two occasions: during the regular update of the blacklist, and when
a reported phishing website is encountered, so that before blocking the
website it double-checks that the website has not been removed since the
last update. Similarly, Google Chrome contacts the Google servers within
five minutes of start-up, and approximately every half an hour thereafter,
to download updated lists of suspected phishing websites. Likewise, IE
from version 8 uses “SmartScreen filter” for phishing detection, which performs
both local verification and online lookup for phishing website identification.
SmartScreen filter uses both the list based method and the heuristic method.
In the beginning, i.e., in local verification, it
looks for the website’s URL in the whitelist (generated by Microsoft) stored
on the user’s computer. If the website is not found in the list, it uses
the heuristic method to detect probable deception. When the heuristic
method indicates that the website is suspicious, it sends the website address to
the Microsoft online service in order to compare it with the service’s blacklist,
i.e., online lookup. Figure 20 shows the option to enable “SmartScreen Filter” in
IE 9.
Figure 20: SmartScreen Filter in IE9
Similarly, Netcraft Toolbar communicates with the Netcraft web
server’s database to obtain the blacklist of phishing websites [Netcraft]. In
addition, the toolbar also displays other information related to the website,
such as the date it was first surveyed, the country where it is hosted, and its
popularity amongst toolbar users, as can be seen in Figure
21.
Figure 21: Netcraft Toolbar [Netcraft]
• Independent applications. Independent applications use data stored on the
local system to identify a deceptive website. The working mechanism of
such toolbars is as follows: after the webpage is downloaded onto the
local computer, the toolbar compares the characteristics of the website with the
data stored locally. When any anomalies are detected, it warns the Internet
user. An example of such a toolbar is SpoofGuard, a plug-in for IE that
accesses the IE history file along with three additional files stored in the
user profile directory for phishing detection. The three additional files
are: a read-only file of host names of email sites, such as
Hotmail, Yahoo!Mail, and Gmail, used in the referring page check; a file
of hashed password history (domain name, username, and password);
and a file of hashed image history [SpoofGuard].
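The layered check that the text describes for SmartScreen Filter (local whitelist first, then a heuristic, then an online blacklist lookup only for suspicious websites) can be sketched as follows. This illustrates the order of the checks only: the lists, the toy heuristic, and all function names are assumptions, and the real filter's internals are not described in the text.

```python
from urllib.parse import urlsplit

# Hypothetical data; a real client would ship and update these lists.
LOCAL_WHITELIST = {"www.example-bank.com"}
ONLINE_BLACKLIST = {"login.example-bank.com.evil.test"}  # stands in for the online service


def looks_suspicious(url: str) -> bool:
    """Toy heuristic (an assumption): many dots or an '@' in the host part."""
    host = urlsplit(url).netloc
    return host.count(".") >= 4 or "@" in host


def online_lookup(host: str) -> bool:
    """Placeholder for the online blacklist query; no network access here."""
    return host in ONLINE_BLACKLIST


def classify(url: str) -> str:
    host = urlsplit(url).hostname or ""
    if host in LOCAL_WHITELIST:    # 1. local verification against the whitelist
        return "trusted"
    if not looks_suspicious(url):  # 2. local heuristic check
        return "unknown"
    if online_lookup(host):        # 3. online lookup, only for suspicious websites
        return "phishing"
    return "suspicious"
```

Ordering the checks this way keeps frequently visited trusted websites off the slower paths and sends only suspicious addresses to the online service.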
3. Analysis of strength and limitations of technical phishing
prevention methods
The two technical methods (i.e., list based methods and heuristic methods) for phishing
prevention are further decomposed into their constituents depending on strategies used
for phishing detection. Further details about them with their pros and cons, and several
studies related to them are discussed in the following sections:
3.1. List based methods
List based methods are reactive techniques for phishing prevention. They maintain a
lookup database of either trusted websites (whitelist) or malicious websites (blacklist).
Such a list can be maintained either locally or on a central server.
3.1.1. Whitelist method
A whitelist is a list of trusted websites that an Internet user visits on a regular
basis. When the whitelist is exclusive, it allows access only to those websites which
are considered trusted and is thus highly effective against zero hour phishing. It
also does not produce any false negative results unless there is a wrong entry in the
whitelist. However, it is very difficult to determine beforehand all the websites
that users may want to browse and to update the list accordingly on time. Any failure
in updating the whitelist causes a high false positive rate and a severe usability
penalty, which might also be a reason behind the low popularity of whitelists.
SmartScreen Filter [MSDN IEBlog] is a feature in the IE8 and IE9 browsers that uses
a whitelist for phishing prevention; however, it additionally uses the heuristic
method and the blacklist method in order to confirm a phishing
webpage. Anti-Phishing IEPlug [Li and Helenius, 2007] is another toolbar made for
Internet Explorer that uses the whitelist method. It uses a whitelist of domain names
maintained by the Internet user or computer administrator. It checks whether the
webpage that the Internet user wants to visit contains a password input field. When
a password input field is detected, it checks whether the domain contains any of the
domain names in the whitelist. It warns the Internet user when the address to be
visited contains a keyword that is saved in the whitelist, but the actual domain is
different.
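The check described for Anti-Phishing IEPlug can be approximated as below. This is a hedged sketch, not the toolbar's code: the function names are hypothetical, and treating the first label of each whitelisted domain (e.g., "paypal" for "paypal.com") as the keyword is an assumption made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _PasswordFieldFinder(HTMLParser):
    """Sets `found` when the page contains <input type="password">."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "input" and (dict(attrs).get("type") or "").lower() == "password":
            self.found = True


def has_password_field(html: str) -> bool:
    finder = _PasswordFieldFinder()
    finder.feed(html)
    return finder.found


def should_warn(url: str, html: str, whitelist: set[str]) -> bool:
    """Warn when a login page mentions a whitelisted name but lives on another domain."""
    if not has_password_field(html):
        return False
    host = (urlsplit(url).hostname or "").lower()
    for domain in whitelist:
        if host == domain or host.endswith("." + domain):
            return False  # the page really is on a whitelisted domain
    # Assumed keyword rule: first label of each whitelisted domain.
    return any(domain.split(".")[0] in url.lower() for domain in whitelist)
```

With the whitelist `{"paypal.com"}`, a login page at `http://paypal.com.secure-login.test/` triggers a warning, while `https://www.paypal.com/signin` does not.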
Very few studies have focused on improving the whitelist. One such
study is by Cao et al. [2008]. They designed an approach called “Automated
Individual White-List (AIWL)” that stores all familiar websites with a Login User
Interface (LUI). AIWL uses a Naïve Bayesian classifier to identify websites
with a login page. Each time an Internet user submits confidential information to a
website that is not in the list, the user gets an alert message. A new website is
added to the list when the user continues to submit confidential information to the
website several times despite the warning. Although this approach includes a
mechanism for the auto-update of the whitelist that differentiates it from a general
whitelist method, it has several limitations, such as:
• The initial list used by this method is not automated, which means that it will
either initially have zero entries or have to rely on some other mechanism for the
initial list.
• The update mechanism used by this method is highly dependent on Internet
users’ ability to distinguish legitimate websites, whereas studies have shown that
Internet users are not good at identifying phishing websites [Friedman et al.,
2002; Dhamija et al., 2006; Karakasiliotis et al., 2007; Jagatic et al., 2007;
Herzberg and Jbara, 2004; Odaro and Sanders, 2011].
• The reliability of a method that alerts the user even for a legitimate website, and
many times for the same legitimate website, is in itself questionable.
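The auto-update rule and the limitations listed above can be made concrete with a toy model. This is not the AIWL implementation of Cao et al.: the class name is hypothetical, the login-page classifier is omitted, and the threshold of three submissions is an assumption standing in for "several times".

```python
from collections import Counter


class IndividualWhitelist:
    """Toy model of a whitelist that learns the sites a user keeps logging in to."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold  # "several times" in the text; 3 is an assumption
        self.trusted = set()
        self._pending = Counter()

    def submit_credentials(self, domain: str) -> bool:
        """Record a login submission; return True if the user should be warned."""
        if domain in self.trusted:
            return False
        self._pending[domain] += 1
        if self._pending[domain] >= self.threshold:
            self.trusted.add(domain)  # the user insisted despite the warnings
            del self._pending[domain]
        return True
```

Starting from an empty list, every first login to a legitimate website produces repeated warnings, and a phishing website is added to the list just as readily if the user keeps ignoring them.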
In conclusion, the whitelist method can be an effective technique when used to
complement other techniques, such as the blacklist method and the heuristic method.
It can be used for first-level verification, so that the legitimate websites which
Internet users visit very often do not have to go through a time-consuming
verification process and, most importantly, do not get misclassified.
3.1.2. Blacklist method
A blacklist is a list of the IP addresses, Domain Names (DNs), or URLs of malicious
websites. The IP addresses and DNs used by scammers can be blocked; however, phishers
often use hacked DNs and servers [MarkMonitor Inc., 2008], so blocking whole DNs or
IP addresses can unintentionally block many legitimate websites that share the same
IP addresses and DNs. Therefore, URLs are comparatively more appropriate for
blacklisting [Sheng et al., 2009]. Blacklisting is a widely used
technique for phishing prevention. Even the popular web browsers (i.e., Google
Chrome, Mozilla Firefox, and IE) use blacklists for phishing detection. A blacklist
flags only the malicious websites that are included in it, so it has a very low
false positive rate and is favoured over heuristic methods. The low false positive
rate and the simplicity of design and implementation, especially within a browser,
can be the reasons behind the popularity of the blacklist method. The low false
positive rate also reduces the liability risk of
incorrectly labelling a legitimate website as a phish.
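The difference between blocking a whole domain and blocking a single URL can be illustrated with a small sketch. The blacklist entries and function names are hypothetical; the point is only that a domain-level entry for a hacked host also blocks that host's legitimate pages.

```python
from urllib.parse import urlsplit

# Hypothetical entries: one phishing page planted on a hacked, otherwise legitimate site.
URL_BLACKLIST = {"http://smallshop.example/images/.hidden/bank-login.html"}
DOMAIN_BLACKLIST = {"smallshop.example"}


def blocked_by_url(url: str) -> bool:
    """URL-level blacklist: block only the listed page."""
    return url in URL_BLACKLIST


def blocked_by_domain(url: str) -> bool:
    """Domain-level blacklist: block everything served from a listed domain."""
    return (urlsplit(url).hostname or "") in DOMAIN_BLACKLIST
```

Both functions block the planted phishing page, but only `blocked_by_domain` also blocks a legitimate page such as `http://smallshop.example/catalogue`.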
Despite all these benefits and the wide popularity of the blacklist, it faces the
following three main challenges.
(i) Zero hour phishing. It takes time to include a new phishing website in the
blacklist. Thus, the blacklist is ineffective against zero hour phishing, leaving
Internet users vulnerable to a phishing website until it is discovered. An empirical
analysis by Sheng et al. [2009] of tools that use blacklists revealed that most such
tools are able to catch less than 20% of phish at zero hour. Moreover,
the majority of phishing websites are short lived and most of the damage is done
during this short time span. Thus, a delay in the list update reduces the
effectiveness of the blacklist.
(ii) Update mechanisms. Every day, hundreds of new phishing websites are
added to the Internet. Most blacklists, for instance PhishTank, rely on
manual verification of websites because of its high accuracy, despite the fact that
manual verification is a time-inefficient process. Some blacklists, such
as the Google blacklist, use automatic verification that employs heuristics via
machine learning techniques; this is a quick process but introduces
comparatively more inaccuracy into the list. The compilation and maintenance of
a blacklist is in itself a multi-step process; its two steps are:
• Data (phishing URLs) gathering. Data (phishing URLs) need to be gathered
from various sources, such as spam traps, URLs detected by filters,
user reports (APWG list, PhishTank list), and lists compiled by other parties,
such as takedown vendors or financial institutions.
• Verification of websites. After the data gathering, the websites need to be
verified to identify phishing websites. This verification often relies
on human reviewers for reliability. Sometimes verification by multiple
reviewers is needed for a more accurate result. PhishTank’s statistics
showed that the manual review process takes a considerable amount of time
for a single URL, ranging from a median of over ten hours in March 2009 to a
median of over fifty hours in June 2009 [Whittaker et
al., 2010]. Although PhishTank was able to improve this figure significantly,
dropping the median time to identify a phish to 12 hours in January
2010 and to 2.4 hours in January 2011, its verification mechanism still
leaves several suspected URLs unidentified [Liu et al., 2011]. The
verification mechanism prescribed by PhishTank requires 4 votes to
confirm a website as a phish, and those URLs that receive fewer than 4
votes (whose votes are called “wasted votes”) are declared unidentified URLs.
Moreover, it should be noted that the median time of 2.4 hours is measured after
the suspected website is submitted, which means there may be a delay
before the website is submitted to PhishTank, whereas most
victims fall for a phishing scam within eight hours from the start of the attack
[Kumaraguru et al., 2009]. On top of that, phishing websites grow endlessly,
making it difficult to always keep the list up to date. Even human
verification is prone to human error. Moore and Clayton [2008] found a
power-law issue among the participants of PhishTank (i.e., participants who
periodically participate are more prone to making errors in labelling);
at the same time, taking human effort entirely out of the loop is too
risky [Edwards et al., 2007].
(iii) Matching mechanisms. The third difficulty of the blacklist method is the way
of matching the URLs that the Internet user enters against those in the list. Exact
matching of URLs can be easily evaded by URLs that phishers generate automatically
[Prakash et al., 2010]; for example, the Rock-Phish gang uses phishing
toolkits to generate a large number of slightly varied URLs for a single phishing
website [MarkMonitor Inc., 2008]. One way to tackle this problem is to add
the ability to detect such variations in the URLs, but this introduces more
inaccuracy into the blacklist.
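How exact matching is evaded, and how a relaxed match catches simple variations, can be shown with a short sketch. The normalization rule here (lower-case the host, drop the query string and fragment, strip a trailing slash) is an assumption chosen for illustration, not a rule taken from any of the cited tools.

```python
from urllib.parse import urlsplit

BLACKLIST = {"http://phish.example/login/index.php"}  # hypothetical entry


def normalize(url: str) -> str:
    """Reduce a URL to scheme, lower-cased host, and path; drop query and fragment."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/") or "/"
    return f"{parts.scheme.lower()}://{host}{path}"


NORMALIZED_BLACKLIST = {normalize(u) for u in BLACKLIST}


def exact_match(url: str) -> bool:
    return url in BLACKLIST


def relaxed_match(url: str) -> bool:
    return normalize(url) in NORMALIZED_BLACKLIST
```

A variant such as `http://PHISH.example/login/index.php?id=83721` slips past `exact_match` but is caught by `relaxed_match`. The looser the normalization, the more variants are caught, and the greater the chance of matching a URL that was never verified as a phish.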
Therefore, it is clear that the efficiency of a blacklist basically depends on the
following factors:
• list accuracy,
• list update mechanism, and
• URL matching mechanism.
There are several researches that have worked on those factors in order to increase the
efficiency of blacklist technique. One of such study is by Liu et al. [2011] to improve
the list update mechanism maintaining its accuracy .They suggest improving the
wisdom of crowds to maintain extremely low false positive rates and also reducing the
time to verify attacks. They designed an approach called “Aquarium” which is a
crowdsourcing technique that clusters similar phishes together and asks the manually
trained participants to vote for the cluster rather than individual phish. The mechanism
uses websites’ URLs submitted to PhishTank yet to be verified. The URLs are passed
through the whitelist technique to filter some of the legitimate pages, and reduce the
effort by reviewers. After that, the remaining URLs are clustered using Density-Based
Spatial Clustering of Applications with Noise (DBSCAN) and Shingling algorithm,
commonly used in the search engines for duplicate page detection. Finally, the clustered
URLs are submitted as tasks to Amazon’s Mechanical Turk (MTurk) system as Human
Intelligence Tasks (HITs) for verification by the participants. The weighting model for
votes from participants is based on their voting history. Like PhishTank, this
approach requires a minimum of four votes to classify a cluster of URLs as phish.
Although this approach improves the efficiency of reviewers in quantity, their efficiency
in quality is still questionable. Moreover, limitations such as wasted votes, power-law
issues of participation, limitations of MTurk (e.g., each reviewer may have a
different browsing experience or get distracted by the physical environment in which
MTurk tasks are completed) [Kittur et al., 2008], and the inability to correct
participants’ incorrect classifications still persist.
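The clustering idea can be illustrated with a minimal sketch. It assumes word-level
shingles of length three, Jaccard similarity, and a hypothetical similarity threshold,
and it groups pages in a single greedy pass instead of running DBSCAN. It is therefore
a simplification for illustration, not the implementation used in Aquarium.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_similar(pages, threshold=0.6):
    """Greedy grouping: a page joins the first cluster whose representative
    is at least `threshold` similar; otherwise it starts a new cluster.
    `threshold` is a hypothetical value chosen for this example."""
    clusters = []  # list of (representative shingle set, [page ids])
    for page_id, text in pages.items():
        page_shingles = shingles(text)
        for representative, members in clusters:
            if jaccard(representative, page_shingles) >= threshold:
                members.append(page_id)
                break
        else:
            clusters.append((page_shingles, [page_id]))
    return [members for _, members in clusters]


pages = {
    "a": "please verify your paypal account now to avoid suspension",
    "b": "please verify your paypal account now to avoid closure",
    "c": "welcome to our gardening blog about roses and tulips",
}
print(group_similar(pages))
```

Reviewers would then vote once per group rather than once per page, which is where
the saving in effort comes from.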
Similarly, to make the classification mechanism swift and timely, Whittaker et al.
[2010] designed a scalable machine learning classifier that automatically classifies
phishing pages and is used to maintain Google’s blacklist. This classifier examines the
features that human reviewers look for in suspected websites to identify phishes,
e.g., the page’s URL, the page’s HTML content collected by a crawler, and hosting
information. It uses a logistic regression classifier to make the final decision. The
classifier classifies the websites submitted by Internet users and also those collected from
Gmail’s spam filters. Moreover, the blacklist maintained by Google has been found to be more
effective than its contemporaries in phishing prevention [Ludl et al., 2007]. The
problem with the approach of Whittaker et al. [2010] is twofold. First, its efficiency
depends on the efficiency of Gmail’s spam filter, although scammers use various other
channels to reach their potential victims (e.g., Internet Relay Chat (IRC), web banner
advertising, instant messengers, and other email services such as Hotmail, Yahoo! Mail,
and RediffMail) [IBM Internet Security Systems, 2007]. Second, it depends on how actively
Internet users report suspected websites, although several studies have shown that
Internet users are not good at identifying phishing websites [Friedman et al., 2002;
Dhamija et al., 2006; Downs et al., 2006; Wu et al., 2006b].
Likewise, an approach called “PhishNet” by Prakash et al. [2010] attempts to
tackle the URL matching problem. PhishNet uses two components:
(i) A URL prediction component. It works offline and systematically generates new
URLs that are modified forms of the URLs in the existing blacklist by employing
various heuristics, such as changing the Top Level Domain (TLD); IP address
equivalence, i.e., grouping together URLs that have the same IP address;
directory structure similarity, i.e., grouping together URLs with a similar directory
structure; query string substitution; and brand name equivalence, i.e.,
replacing one brand name with another.
(ii) An approximate URL matching component. It performs an approximate
matching of the URLs entered by Internet users with the URLs in the blacklist.
PhishNet exploits the finding that malicious URLs, even after mutation, usually remain
syntactically close to each other or semantically the same, i.e., they use the same IP address.
The verification of generated URLs, to find whether they are indeed malicious,
is done with the help of Domain Name System (DNS) queries and content matching
techniques in an automated fashion, thus ensuring minimal human effort. The matching
is performed using a novel data structure that performs approximate matches with
incoming URLs based on regular expressions and hash maps to catch syntactic and
semantic variations. Although this is a novel technique for generating various
modified forms of URLs, it seems to use very few heuristic features to
check whether a newly generated URL belongs to a phish. This means that a phishing
website may get misclassified, especially since it requires ninety percent similarity to
the webpage of the parent URL in order to declare a page a phish.
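Two of the URL prediction heuristics, TLD substitution and brand name equivalence, can
be sketched as follows. This is only an illustration under stated assumptions: the TLD
list, the brand list, and the function names are hypothetical, and the DNS and content
checks that PhishNet applies to the generated candidates are not reproduced.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical TLD list; the list used by PhishNet is not given in the text.
ALT_TLDS = ("com", "net", "org", "info")


def tld_variants(url, tlds=ALT_TLDS):
    """Generate candidate URLs by swapping the top level domain.
    Note: using `hostname` lowercases the host and drops any port."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    stem, dot, tld = host.rpartition(".")
    if not dot or tld.isdigit():  # no TLD, or an IPv4 address
        return []
    return [
        urlunsplit((parts.scheme, f"{stem}.{alt}", parts.path,
                    parts.query, parts.fragment))
        for alt in tlds if alt != tld
    ]


def brand_variants(url, brands):
    """Generate candidate URLs by replacing one brand name with another."""
    variants = []
    for present in brands:
        if present in url:
            variants.extend(url.replace(present, other)
                            for other in brands if other != present)
    return variants


def predict_urls(blacklist, brands, tlds=ALT_TLDS):
    """Collect candidates that are not already blacklisted. Each candidate
    would still need verification before being added to the blacklist."""
    candidates = set()
    for url in blacklist:
        candidates.update(tld_variants(url, tlds))
        candidates.update(brand_variants(url, brands))
    return candidates - set(blacklist)


print(tld_variants("http://login-example.com/secure/index.php?id=1"))
```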
In conclusion, an effective blacklist must be comprehensive, error free, and
timely. A blacklist that is not comprehensive fails to protect a portion of its users.
Similarly, a blacklist with wrong entries results in unwanted warnings, which gradually
train Internet users to disobey the warnings [Whittaker et al., 2010]. Moreover, untimely
updates can significantly degrade the quality of the list. Therefore, an effective
blacklist can be achieved only when it uses an error-free automatic classifier with broad
sources of suspected websites for verification and possesses a URL matching mechanism that
can detect all derivative URLs of phishing URLs. The study by Sheng et al. [2009] found
that tools that use heuristic methods to complement a blacklist perform better than those
using only a blacklist, especially against zero-hour phishing. Table 1 shows the summary
of list-based methods with their main characteristics, pros, and cons.
3.2. Heuristic methods
Heuristic methods examine one or more characteristics of websites in order to detect
phishing websites. These characteristics are anomalies in the components of phishing
websites. In fact, even the automatic verification of phishing websites used to maintain
blacklists employs heuristic methods. Some of the heuristic methods are analyzed next.
3.2.1. The use of visual similarity measures for phishing detection
Phishing websites often imitate the look and feel of official websites with the same
layouts, styles, key regions, rendering, blocks, and most of the contents. They use
various non-text elements, such as images and flash objects, to display contents. Such
imitations of an authentic website, with only the minimal required changes, are often
difficult for Internet users to distinguish. Moreover, the use of non-text elements to
display web contents makes detection even harder for general content-based anti-phishing
techniques. There are some techniques, for instance, the technique proposed by Pan and
Ding [2006], that use Optical Character Recognition (OCR) to analyze the contents of
images, but they still fail to analyze website elements such as flash objects and
advertisement banners.
However, such cases can be efficiently handled by the use of phishing prevention
techniques that employ visual similarity measures to differentiate between bogus and
original websites. All visual similarity measures use a database to store genuine websites’
data. When a suspicious website is encountered, its data are compared to the data of
genuine websites stored in the database to detect differences. The genuine websites’
data are stored in one of the following forms:
(i) DOM elements of genuine websites. In this case, DOM elements of genuine
websites are compared with those of suspicious websites.
(ii) Captured images of genuine websites. In this case, features in the images of
genuine websites are compared with those of suspicious websites using
various techniques of Image Recognition (IR).
There are several studies that used DOM elements comparison for the visual
similarity measure. One of the approaches is from Wenyin et al. [2005], which consists
of four modules:
(i) Suspicious URL detection module. It is the source for suspicious URLs which
are obtained from transformation of the true URLs and various suspicious URLs
detected in emails.
(ii) Suspicious webpage processing module. It validates whether any real webpage
exists for the URL supplied by the “Suspicious URL detection module” and
generates a representation of the found webpage, i.e., blocks and features of
suspicious webpage.
(iii) True webpage processing module. It obtains a representation of the true
webpage, i.e., blocks, features, and weight of the true webpage.
(iv) Visual similarity assessment module. It compares the true webpage and each
suspicious webpage and finally calculates their visual similarity based on their
intermediate representations.
The approach by Wenyin et al. [2005] uses three similarity metrics, i.e., block level
similarity, layout similarity, and overall style similarity, defined on webpage
segmentation, to calculate the visual similarity between two websites. In the block level
similarity, the similarity of features that represent text blocks and image blocks is measured.
Similarly, in layout similarity, the ratio of the weighted number of matched blocks in
the suspected website to the total number of blocks in the true webpage is calculated.
The overall style similarity focuses on the visual style of webpage, which can be
represented by several format definitions, e.g., font family, background colour, text
alignment, and line spacing. The final verdict is made on the basis of similarity weight
of the suspected webpage which needs to exceed the similarity threshold in order to be
declared a phishing website.
A similar approach by Liu et al. [2006], called “SiteWatcher”, uses
visual similarity comparison and comprises two sequential processes: the first process
runs at the email server, and the second process performs the visual similarity comparison.
It requires true URLs and their associated keywords to be registered with the system. The
process in the email server monitors and analyzes both incoming and outgoing emails to
find messages that contain keywords associated with the genuine website. All
embedded URLs from the messages that contain keywords are sent for visual similarity
assessment. After that, the second process performs visual similarity assessment at the
block, layout, and style levels. The visual similarity assessment includes the extraction of
visual features and the matching of the suspected website against the original website.
This matching is performed first at block level, both visually and semantically, and then on
position constraints among blocks. It calculates the layout similarity (i.e., the weighted
number of matched blocks divided by the total number of blocks in the true page) and the
overall style similarity, which is based on the distribution of feature values and is computed
as the correlation coefficient of the two pages’ histograms. It issues phishing reports to the
respective genuine website’s owner when the visual similarity exceeds the corresponding
threshold values.
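The layout similarity and the overall style similarity described above can be written
down as a short sketch. It assumes that block matching has already produced one weight
per matched block and that each page's style is summarized as a histogram of equal
length. The threshold values and the rule that both similarities must exceed their
thresholds are hypothetical choices made for this example.

```python
from math import sqrt


def layout_similarity(matched_block_weights, total_true_blocks):
    """Weighted number of matched blocks divided by the true page's block count."""
    if total_true_blocks <= 0:
        raise ValueError("the true page must contain at least one block")
    return sum(matched_block_weights) / total_true_blocks


def style_similarity(hist_a, hist_b):
    """Pearson correlation coefficient of two equal-length histograms."""
    if len(hist_a) != len(hist_b) or not hist_a:
        raise ValueError("histograms must be non-empty and of equal length")
    n = len(hist_a)
    mean_a = sum(hist_a) / n
    mean_b = sum(hist_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(hist_a, hist_b))
    var_a = sum((a - mean_a) ** 2 for a in hist_a)
    var_b = sum((b - mean_b) ** 2 for b in hist_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat histogram carries no style information
    return cov / sqrt(var_a * var_b)


def exceeds_thresholds(layout, style, layout_threshold=0.8, style_threshold=0.8):
    """Hypothetical decision rule: report only when both values exceed
    their thresholds."""
    return layout > layout_threshold and style > style_threshold


print(layout_similarity([1.0, 0.5, 0.5], 4))
print(style_similarity([1, 2, 3], [2, 4, 6]))
```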
The problem with both of the above mentioned approaches is that they rely on a feed of
legitimate websites and cannot detect phishing websites that target websites other than
those in the database. The approach by Liu et al. [2006] even needs unique keywords
that can represent the legitimate website, which is an additional burden on Internet
users. Moreover, such approaches that rely completely on code can be easily deceived
by the following tricks: rewriting the HTML code so that it gives the same design but uses
different DOM objects than the legitimate page, as shown in Figure 22; using images that
provide the same look as the spoofed website; and using code obfuscation techniques to alter
the code. In addition, such approaches can produce false negatives when the same theme is
used to generate different websites.
Figure 22: HTML codes and screenshots of the sign-in page in eBay.com [Lam et al.,
2009]
A solution for the case shown in Figure 22, i.e., a page with the
same design that uses different DOM objects than the legitimate website, is proposed by
Lam et al. [2009]. It uses visual similarity-based phishing detection that is effective
even for polymorphic phishing web pages. Polymorphic web pages are visually identical to
authentic web pages but use different source code components than the authentic
webpage. The approach by Lam et al. [2009] performs page layout analysis and layout
block matching to calculate the degree of similarity using image processing techniques.
The authentic webpage is stored in a database. When a suspected webpage is found,
both the authentic and the suspected web pages are treated as images, and Otsu’s
thresholding method is applied to transform them into black and white images. The degree of
similarity is ranked using a classifier trained to handle such cases. However, this approach,
too, cannot detect phishing websites that use code obfuscation techniques to alter the
source code. Moreover, using two processes for phishing website validation does not
come for free; it is usually accompanied by a degradation in time performance. In addition, it
still cannot detect websites that are not in the database.
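The binarization step can be illustrated with a plain implementation of Otsu's
thresholding method, which chooses the grey level that maximizes the between-class
variance. The sketch assumes that a page screenshot has already been converted to a
flat list of integer grey levels in the range 0 to 255; the layout analysis and block
matching of Lam et al. [2009] are not reproduced here.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximizes the between-class variance
    when pixels <= t form one class and pixels > t form the other."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_low = 0  # number of pixels at or below the current level
    sum_low = 0     # sum of grey levels at or below the current level
    for t in range(levels):
        weight_low += hist[t]
        sum_low += t * hist[t]
        if weight_low == 0:
            continue
        weight_high = total - weight_low
        if weight_high == 0:
            break
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        between_var = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels, threshold):
    """Map each pixel to 0 (black) or 1 (white)."""
    return [1 if p > threshold else 0 for p in pixels]


# Toy image: half dark pixels, half bright pixels.
image = [10] * 50 + [200] * 50
t = otsu_threshold(image)
print(t, sum(binarize(image, t)))
```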
The problems in visual similarity measure techniques that arise from their
dependency on source code can be overcome by analyzing the features in captured
images of legitimate and suspicious websites. One such approach, by Fu et al. [2006],
uses Earth Mover’s Distance (EMD) to calculate the visual similarity of web pages.
It first extracts the URLs from emails containing keywords associated with the
protected websites and then converts the web pages associated with those URLs
into normalized images. Next, it obtains the
images’ signatures which comprise colour and coordinates features. Finally, visual
similarity is computed using the linear programming algorithm of EMD. The final
classification is made on the basis of similarity value of the suspected webpage. When
similarity value of suspected webpage exceeds the threshold value of protected
webpage, it is classified as a phishing website. However, the problem with this approach is
that it uses a colour histogram, which is unsuitable for web pages, since websites usually
contain very few colours [Liu et al., 2006]. Moreover, even a minor change in
dynamic components, which often goes unnoticed by Internet users, can significantly alter
the colour histogram. In addition, the use of a colour histogram has a high chance of false
negative results for websites that are designed using a popular theme.
Another approach that uses images, but analyzes many other image features, is
from Cordero and Blain [2006]. Their approach uses differences in the image rendering of
web pages to identify phishing websites. It captures a Tagged Image File
Format (TIFF) image of the entire rendered web page, which is turned into more
manageable feature vectors by calculating a joint histogram with two features, resulting
in 256 features per image. It uses the Cocoa/Safari engine for website rendering and GNU
Octave and ImageMagick for data pre-processing. Although this approach compares
far more features than the approach by Fu et al. [2006], it also possesses various
limitations. It uses the image rendering and layout of a webpage for phishing website
detection despite the fact that both of them are affected by changes in window size.
Even changes in font type and font size alter the appearance of a webpage. In
addition, a website uses several dynamic components, such as advertisement banners
and flash objects, that are cumbersome to compare using this approach, since the image
changes with each scene.
Likewise, Chen et al. [2009] propose an image-based approach that also claims to
handle the use of dynamic objects in a webpage. It considers phishing page
detection as an image matching process. It takes a snapshot of the suspicious webpage and
uses Contrast Context Histogram (CCH) to extract discriminative keypoints from the
suspected webpage, which are matched with those of the authentic webpage often targeted
by phishers. The data of such authentic web pages are stored in a database from a reliable
source. Computer vision and image processing are used to compare the similarity. The
degree of similarity is calculated using the k-means algorithm and, when it exceeds a certain
threshold, the suspected webpage is considered to be a phishing website. Even though this
approach is effective against dynamic objects, such as advertisement banners, flash
objects, and video, it does not mention the degradation in time performance
that can occur due to the processing of dynamic objects.
Different from all of the above mentioned approaches, Wang et al. [2011] proposed an
approach called “Verilogo”, which does not analyze the image of the whole webpage;
rather, it analyzes only the logo used in the webpage. The main assumption of Verilogo
is that a logo is an easy means of recognition and is deeply associated with a given
organization, so it is often included in phishing websites to exhibit false originality. It
stores heavily phished logos and their related information in a database. It matches the
logo used by the suspected webpage against the logos stored in the database using a
computer-vision algorithm, and then validates whether the suspected webpage has a hosting
IP address that is authorized to use that logo. It warns Internet users when they enter
keyboard input into a webpage that is not authorized to use the logo. Even though comparing
a logo is lighter than comparing the whole webpage, it protects only the websites whose
logo information is stored in the database. Moreover, it needs the list of all
organizations that are allowed to use a particular logo, which is another unconventional
requirement.
In all of the above mentioned techniques that use visual similarity measures for
phishing detection, the common limitation is that they need to know the
legitimate websites beforehand, which is impractical. In order to remove this limitation,
Medvet et al. [2008] proposed an approach that uses three features to determine
webpage similarity:
Text pieces, which also include style-related features,
Images embedded in the webpage, and
The overall visual appearance of the webpage as seen by the Internet user (after
the web browser has rendered the webpage).
This approach does not need an initial list of legitimate web pages; instead, it remembers
each pair of credentials (e.g., username, password) and the webpage in which the Internet
user enters them. When the Internet user enters the same credentials into any new webpage,
it performs the similarity comparison. The procedure is to retrieve the suspicious
webpage, transform the webpage into a signature, and compare the signature with the
stored signature of the legitimate webpage. In case of similarity, it raises an alert.
However, this approach neglects the fact that there are several Internet users who use
the same credentials for different websites. Moreover, some banks and organizations
(e.g., Nordea Bank) use one-time passwords, and such cases cannot be protected by this
approach.
To sum up, visual similarity measures are suitable for server-based (e.g., ISP server)
phishing prevention techniques, so that the server administrator can maintain the list of
phishing-prone websites. However, whether that is feasible remains an open question.
3.2.2. Use of search engine in phishing detection
There are several search engines (e.g., Google, Bing, Yahoo!, Baidu) that maintain
a crawl database and perform page ranking to display search results. The PageRank algorithm,
which was formulated by Google founders Larry Page and Sergey Brin, uses factors such
as the number of inbound links, the number of outbound links, and a damping factor.
Moreover, there is a set of recommended guidelines from Google Webmaster to prevent the
removal of websites from the Google search engine index [Google Webmaster
Guidelines]. Together, these suggest that a webpage must follow the Google Webmaster
guidelines and must have many inbound links in order to gain a high page rank. On
the contrary, phishing web pages usually have a very short life span and they are even
found to disobey the recommended guidelines [Garera et al., 2007]. Therefore, phishing
websites are either absent from the search results or possess a very low page rank. In
addition, the search results for phishing websites are usually very few and
mostly consist of other phishing websites and websites that maintain lists of malicious
websites, such as PhishTank. These features of search engines are applied by many
researchers for phishing detection. The two vital components of this approach are the
extraction of search keywords and the selection of a search engine. Some of the proposed
approaches that use search engines for phishing detection are mentioned next.
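As background for the page rank argument above, the following is a minimal sketch of
the textbook power-iteration form of PageRank. It is not the ranking that any search
engine uses in production; the link graph, the damping value of 0.85, and the number
of iterations are assumptions made for illustration. It shows why a page without
inbound links, which is typical for a short-lived phishing page, ends up with the
lowest possible rank.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {target for targets in links.values() for target in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page shares its rank equally among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: nobody links to the phishing page.
graph = {
    "bank": ["news"],
    "news": ["bank"],
    "blog": ["bank"],
    "phish": ["bank"],
}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```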
One approach that uses a search engine for phishing detection is by Ma [2006]. His
approach uses Google search results for phishing detection. His work is a
plug-in for the Mozilla Firefox web browser that extracts unique keywords from the
website to be analyzed and uses the keywords as the query for the Google search engine.
Then the URL of the suspected site is compared with the URLs of the top search results. In
case of a mismatch, it interrupts the Internet user and suggests one of the top ranked
search results. However, the problem with this approach is that it does not specify
the keyword extraction method or the number of search results to be compared.
A similar approach that is clear on both of the points left open in Ma’s
[2006] approach is by Zhang et al. [2007b]. They proposed an approach called “A
Content-Based Approach to Detecting Phishing Website” or simply CANTINA that
examines the content of a webpage to identify phishing. It implements the Term
Frequency–Inverse Document Frequency (TF-IDF) algorithm used in Information
Retrieval (IR) and the Robust Hyperlink algorithm. The TF-IDF algorithm is used to determine
the importance of a word in a document, and the Robust Hyperlink algorithm is used to
determine broken hyperlinks. The two ideas behind this approach are:
Phishers usually copy legitimate websites to generate phishing web pages. In that
case, the Robust Hyperlink algorithm can be used to find the original log-in page.
Phishing websites often contain the original brand name, which is common in
the legitimate webpage but relatively rare on the web. Again in this case, the Robust
Hyperlink algorithm can be applied to determine the actual owner of the
webpage.
The general working mechanism is as follows: first, it calculates the score of each
term on the webpage using TF-IDF and then generates a lexical signature from the top five
terms, which, in concatenation with the domain name (even when the signature already
contains the domain name), is fed to the search engine (in this case Google). Finally, it
classifies the suspected webpage as a phishing webpage if its domain name is not
among the top thirty results of the search engine. Even in the case when the search result
count is zero, the suspected webpage is classified as a phishing webpage. The limitations of
this approach are:
It works only with web pages that have content in the English language.
It takes time because it involves querying Google.
It can be bypassed using techniques such as using image content instead of textual
content, using unrelated text in invisible form (i.e., using a font colour that matches the
webpage background colour), changing enough words in the webpage, and using a
webpage that is already ranked high in search engine results.
This approach uses a linear classifier, which has its own limitations [Xiang et al.,
2011].
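The working mechanism of CANTINA described above can be sketched as follows. The
sketch makes several assumptions: the background corpus is a small hypothetical list
of documents standing in for web-scale statistics, the inverse document frequency is
smoothed, and the search engine query itself is not performed. Instead, the caller
passes in the list of result domains.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(page_text, corpus, k=5):
    """Return the k terms with the highest TF-IDF score on the page.
    `corpus` is a hypothetical list of background documents."""
    term_counts = Counter(tokenize(page_text))
    total_terms = sum(term_counts.values())
    corpus_terms = [set(tokenize(doc)) for doc in corpus]
    scores = {}
    for term, count in term_counts.items():
        doc_freq = sum(1 for terms in corpus_terms if term in terms)
        idf = math.log((1 + len(corpus_terms)) / (1 + doc_freq))  # smoothed IDF
        scores[term] = (count / total_terms) * idf
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def classify(domain, result_domains, top_n=30):
    """Phishing if the domain is not among the top_n results (or no results)."""
    return "legitimate" if domain in result_domains[:top_n] else "phishing"


corpus = ["sign in to your account", "your account settings", "welcome to the site"]
page = "ebay sign in ebay account ebay"
query = lexical_signature(page, corpus) + ["ebay-login.example"]
print(query)
print(classify("ebay-login.example", ["ebay.com", "wikipedia.org"]))
```

In a real deployment, `query` would be sent to the search engine and `result_domains`
would be taken from its answer.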
Likewise, Xiang and Hong [2009] proposed an approach that uses the search engine
technique in association with other techniques for phishing detection. Their approach
uses IR methods to recognize the identity claimed by a webpage and captures phishing
web pages by examining the discrepancies between the claimed identity and the original
identity. It uses Named Entity Recognition (NER) algorithms to reduce false
positives. The identity-oriented component is aided by a keywords-retrieval component
that employs search engines to detect potential phishing web pages by searching for
keywords of significant importance with respect to IR. It includes whitelist methods
and a login-form detector to filter good web pages and control false positive results. Even
though this approach handles false positives better, it still has the
limitations mentioned for CANTINA.
Similarly, Huh and Kim [2011] proposed an approach that is lighter than all
the above mentioned approaches that use search engines for phishing detection. It uses
the full URL string, without parameters, of the suspected webpage as the query for the
search engine, which exempts it from the tedious process of keyword extraction. The total
number of search results and the ranking of the suspected webpage are used to determine
whether it is legitimate or fake. It relies on the fact that legitimate web pages get a
large number of search results and are usually ranked first in the search results, whilst
phishing web pages get only a few results and usually have a low rank or no rank. The
validation of this approach was performed using three different reputable search
engines: Google, Yahoo!, and Bing. However, the problem with this approach is that it
fails to detect phishing web pages that are hosted on compromised popular websites.
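The decision rule of this approach can be sketched in a few lines. The thresholds
`min_results` and `max_rank` are hypothetical, since the text does not state the
values used by Huh and Kim [2011], and the search results are supplied by the caller
rather than fetched from a real search engine.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_parameters(url):
    """Drop the query string and the fragment, keeping scheme, host, and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def looks_legitimate(url, result_urls, total_results, min_results=10, max_rank=3):
    """Treat a page as legitimate only if its URL returns enough results
    and appears among the first `max_rank` of them."""
    if total_results < min_results:
        return False
    query = strip_parameters(url)
    top = [strip_parameters(result) for result in result_urls[:max_rank]]
    return query in top


print(strip_parameters("http://shop.example/login.php?user=1#top"))
print(looks_legitimate(
    "http://shop.example/login.php?user=1",
    ["http://shop.example/login.php", "http://reviews.example/shop"],
    total_results=1200,
))
```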
To sum up, using a search engine is an effective approach for phishing detection.
The results are more accurate due to the high efficiency of the search engines’ webmasters.
Moreover, the approach by Ma [2006] provides the Internet user with an alternative option to
proceed with browsing. One of the reasons that lead Internet users to risk visiting a
suspected website despite the warning from the security system could be the lack of an
alternative. Most phishing prevention systems just warn Internet users and rarely
provide any substitute. Furthermore, the approach by Huh and Kim [2011], which uses the
whole URL for the search, improves the quality of the search keyword. Apart from that, this
approach is independent of other resources, such as a database, and is equally effective
against zero-hour phishing.
However, the use of search engines for phishing detection, too, has several limitations,
and some of them are mentioned next.
It is the webmaster of the search engine who determines whether a website should be
indexed or not. This decision is based on how closely the website
adheres to the webmaster’s recommended guidelines on the design, content,
technical aspects, and quality of the website. These guidelines help to make a website
search engine friendly [Google Webmaster Tools, Bing Webmaster Tools].
The search engine spider crawls a website on the basis of several factors; for
instance, Google looks at factors such as PageRank, links to a page, and
crawling constraints like the number of parameters in a URL [Google Webmaster
Tools]. Moreover, Google PageRank is updated approximately every three
months [Huh and Kim, 2011], and the case is similar with other search engines.
However, the concern is how many new legitimate websites follow the
Webmaster guidelines. There are many legitimate websites designed by novice
designers who are unacquainted with the Webmaster guidelines. The situation
might improve when Content Management System (CMS) tools, such as
Joomla!, Drupal, and WordPress, are used to design web pages.
However, there are still many fresh legitimate websites that rank very low in
search results or do not appear in search engine results at all. Such
websites are misclassified by the phishing prevention approaches that use search
engines. Even legitimate websites whose rank would improve may still suffer
misclassification for three months in the case of Google.
Such phishing prevention approaches can be easily bypassed by abusing a
legitimate website that already has a top ranking in search engine results or by
registering a legitimate website to conduct phishing, even though such processes
are comparatively expensive.
Phishers can manipulate the ranking algorithms to get a good ranking for their
websites in search engine results.
Search results vary with the kind of search engine used. Figure 23 shows snapshots
of search results after a legitimate URL is entered as a query to two popular search
engines, i.e., Google and Bing.
Figure 23: Same URL searched using Google.com and Bing.com
Thus, the popularity of websites is best used to support other heuristic properties for
phishing detection.
3.2.3. Use of anomalies in phishing websites for phishing detection
Phishing websites mimic the look and feel of genuine websites at the interface level, but
they are different at the code level. In fact, they also contain many anomalies in their web
objects, HyperText Transfer Protocol (HTTP) transactions, and claimed identities [Pan
and Ding, 2006]. These anomalies can exist in their URLs, DOM objects, or webpage
contents. Several studies have used various sets of these anomalies
for phishing detection. Some of the prominent studies are mentioned next.
An approach by Chou et al. [2004] is a browser plug-in called “SpoofGuard”, which
is designed for client-side defence against phishing. SpoofGuard examines
properties such as the domain name, URL, links, and images to identify probable spoof
attacks. Further, it also checks the browsing history in order to verify whether the given
domain has been visited before. It also checks whether the webpage was opened by
clicking a link in an email message. Most importantly, it stores the hash values of
post data, i.e., username and password, and the domain name where the credentials are
used. When Internet users enter any credentials, it compares the post data with the
stored credentials and their respective domain names. It warns the user when the
credentials match but their domain names differ. The two major problems in this
approach are:
It neglects the fact that many Internet users use the same credentials for different
domain names, which can produce false positive results, and
It does not protect websites that use one-time passwords, i.e., passwords
valid for only one login session. It will store several credentials for a single
domain name; precisely, one entry for every login.
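The credential check described above can be sketched roughly as follows. This is only an illustration and not SpoofGuard's actual implementation: the class name `CredentialStore`, the choice of SHA-256, and the in-memory dictionary are all assumptions made for the example.

```python
import hashlib


class CredentialStore:
    """Hypothetical store mapping credential hashes to the domains they were used on."""

    def __init__(self):
        self._seen = {}  # credential hash -> set of domains

    @staticmethod
    def _digest(username, password):
        # Only a hash of the post data is kept, never the plain text.
        data = f"{username}\x00{password}".encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    def check_and_record(self, username, password, domain):
        """Return True if a warning is due: known credentials on an unknown domain."""
        key = self._digest(username, password)
        domains = self._seen.setdefault(key, set())
        warn = bool(domains) and domain not in domains
        if not warn:
            domains.add(domain)
        return warn
```

The sketch also makes both listed problems visible: a user who reuses the same credentials on a second genuine domain triggers the warning, and a one-time password creates a new entry at every login.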
Likewise, Pan and Ding [2006] proposed an approach that detects phishing from
anomalies in the DOM objects of phishing websites. It employs two major components:
(i) An Identity Extractor, which uses an IR algorithm and the χ² test to extract the web
identity, and
(ii) An SVM-based Page Classifier, which takes the web identity and a set of structural
features (i.e., web objects or properties relevant to the web identity) as input to
determine whether a webpage is a phishing page or not.
They also suggest using Optical Character Recognition (OCR) to extract contents from
phishing websites that use images in place of textual content. The main limitation
of this approach is that it relies on the assumption that “the distribution of
identity-related words usually deviates from that of other words”, which is not
completely true, as can be observed from the high false positive results produced by
the approach [Xiang et al., 2011].
Similarly, an approach by Alkhozae and Batarfi [2011] looks for violations of
W3C recommendations in webpage source code to identify phishing websites. The
general mechanism is to assign an appropriate weight to each characteristic (W3C
violation) and an initial weight to the suspected website. Each occurrence of a
characteristic in the suspected website subtracts that characteristic’s weight
from the initial weight. The final decision is taken on the basis of the weight
remaining after the examination. The smaller the remaining weight, the higher the
probability that the website is a phishing website. The main problem with this approach
is that it depends on violations of W3C recommendations, when it is unclear how many
web developers really know and follow those recommendations. Moreover, there are other
web standards followed by the web development industry, such as Internet Standards (STD)
documents [IETF]. There is also a chance of bypassing this approach with a phishing
website that follows most of the W3C recommendations.
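The weight-deduction mechanism can be illustrated with a short Python sketch. The characteristic names, the individual weights, the initial weight of 100, and the decision threshold are hypothetical values chosen for the example; the actual values used by Alkhozae and Batarfi are not reproduced here.

```python
# Hypothetical characteristics and weights, for illustration only.
VIOLATION_WEIGHTS = {
    "form_action_foreign_domain": 30,
    "anchor_without_target": 15,
    "javascript_popup": 10,
}


def remaining_weight(found, initial_weight=100, weights=VIOLATION_WEIGHTS):
    """Subtract the weight of every characteristic found from the initial weight."""
    score = initial_weight
    for name in found:
        score -= weights.get(name, 0)
    return max(score, 0)


def is_suspicious(found, threshold=60):
    # The smaller the remaining weight, the more likely the page is phishing.
    return remaining_weight(found) < threshold
```

A page with no violations keeps its full initial weight, whereas a page showing all three hypothetical characteristics drops to 45 and falls below the assumed threshold.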
The problem with the approaches by Chou et al. [2004], Pan and
Ding [2006], and Alkhozae and Batarfi [2011] discussed above is that they load websites
in order to identify whether they are phishing websites, which ultimately exposes
Internet users to phishing conducted using malicious code. To overcome this danger,
Garera et al. [2007] proposed a phishing prevention approach that uses only anomalies
in the URLs of phishing websites to detect them. This approach uses various
distinguishing features of phishing URLs and a logistic regression classifier (trained
with data from Google), which also includes obfuscation-style heuristics and general
heuristics based on Google’s index infrastructure. The main problem of depending solely
on URLs for phishing detection is that such an approach can easily be deceived by
phishing conducted from either registered domains or compromised legitimate websites.
Another similar approach that uses only URL analysis is by Ma et al. [2009].
Their approach uses statistical methods from machine learning to identify phishing
websites. It examines the lexical features (i.e., textual features of URLs) and host-based
features (i.e., IP address properties, WHOIS properties, domain name properties, and
geographical properties) of URLs in order to determine the reputation of websites. The
problems with this approach are:
It can misclassify legitimate websites that use URLs containing benign tokens
stated in the approach.
It can misclassify legitimate websites that use free hosting services.
It cannot detect phishing websites that use compromised legitimate websites.
It can misclassify legitimate websites that use redirection services.
It can misclassify legitimate websites hosted in reputable geographical regions,
such as USA, despite the fact that more than fifty percent of phishing websites
are hosted in USA [APWG, 2012].
It can misclassify websites that possess international TLDs but are hosted in USA.
Even though URL analysis protects Internet users from malicious software, it
lacks the accuracy that could be gained by analysing DOM objects and webpage
contents. A more robust approach called “CANTINA+”, designed by Xiang et
al. [2011], uses resources including URLs, HTML DOMs, third-party services,
and search engines to detect phishing websites. It uses five features from CANTINA
(discussed in section 3.2.2) and ten additional discriminative features for identifying
phishing websites. It employs two filters:
(i) Hash-based filter. It uses the SHA1 hash algorithm for duplicate page
detection.
(ii) Login form detection. It looks for three main characteristics of a login form, i.e.,
FORM tags, INPUT tags, and login keywords (it searches for 42 different login
keywords).
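The two filters can be sketched with Python's standard library as shown below. This is a simplified illustration rather than the CANTINA+ implementation: hashing the raw page text without any normalisation is an assumption, and `LOGIN_KEYWORDS` is a hypothetical four-word subset, since the 42 keywords are not listed here.

```python
import hashlib
from html.parser import HTMLParser

# Hypothetical subset of login keywords, for illustration only.
LOGIN_KEYWORDS = ("login", "log in", "sign in", "password")


def page_fingerprint(html):
    """SHA-1 digest of a page; equal digests indicate exact duplicate pages."""
    return hashlib.sha1(html.encode("utf-8")).hexdigest()


class LoginFormDetector(HTMLParser):
    """Records whether FORM tags, INPUT tags, and login keywords occur in a page."""

    def __init__(self):
        super().__init__()
        self.has_form = False
        self.has_input = False
        self.has_keyword = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.has_form = True
        elif tag == "input":
            self.has_input = True

    def handle_data(self, data):
        text = data.lower()
        if any(word in text for word in LOGIN_KEYWORDS):
            self.has_keyword = True


def has_login_form(html):
    detector = LoginFormDetector()
    detector.feed(html)
    return detector.has_form and detector.has_input and detector.has_keyword
```

A known phishing page can then be recognised by comparing `page_fingerprint` values, and only pages for which `has_login_form` is true would need to be passed on to the classifier.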
Finally, it employs a machine learning detection model, based on the discriminative
features and extensively trained, as the classifier. Even though CANTINA+ is more robust
than CANTINA, it still has some limitations:
It is unable to detect cross-site scripting attacks.
It cannot detect phishing that is conducted using compromised legitimate
websites.
It cannot detect phishing websites that use images instead of textual content.
The above-mentioned approaches detect phishing, but they do not report what kind of
attack it is. Choi et al. [2011] proposed a machine learning approach to detect malicious
URLs of all kinds, including phishing, spamming, and malware infection. Along with
detection, it also identifies the attack type. It uses various discriminative features (e.g.,
lexical, link popularity, webpage content, DNS fluxiness, and network) for detection.
The methods used are SVM for the detection of malicious URLs, and RAkEL and
ML-kNN for identifying the attack types of malicious URLs. The main problem with a
machine learning approach is that its effectiveness depends on the type of data used
for training. Moreover, phishing schemes are dynamic, so such a classifier has to be
updated regularly.
To sum up, anomalies in the URLs and source code of phishing websites can be a
promising way to differentiate between phishing and legitimate websites. An approach
called “Phishark”, designed by Gastellier-Prevost et al. [2011] in order to study the
effectiveness of URL and page content analysis for phishing detection, also showed
that anomalies can be an effective means to distinguish between legitimate and phishing
websites. The major challenge in using anomalies for phishing prevention is posed by
legitimate websites developed by novice web developers or, more precisely, by web
developers who are unaware of Internet security and the various web
development standards. Such web developers unintentionally introduce several anomalies
in their work, and their websites usually get misclassified.
Table 1 is the summary of technical phishing prevention methods with their main
characteristics, pros, and cons.
| Methods | Characteristics | Pros | Cons |
|---|---|---|---|
| Whitelist method | It uses a list of trusted websites and checks whether a given website is present in the list or not. | (i) It is effective against zero hour phishing. (ii) It produces almost no false positive results. (iii) It is simple in design. | (i) It has difficult update mechanism. |
| Blacklist method | It uses a list of treacherous websites and checks whether a given website is present in the list or not. | (i) It has low false positive results. (ii) It is simple in design. | (i) It is ineffective against zero hour phishing. (ii) It has difficult update mechanism. (iii) It has difficult URLs’ matching mechanism. |
| Visual similarity measures | It stores the information of the DOM elements or captured images of the legitimate websites and compares the information from its database with that of the suspicious websites. | It is effective against phishing attacks targeting websites whose information is stored in its database. | (i) It needs to store data about the legitimate websites which has to be protected from phishing. (ii) It cannot detect phishing attacks which target the websites not in its database. |
| Use of search engine | It extracts search keywords from the given websites and searches the keywords using a search engine. Then, it compares whether the given URL is in the top search results. | (i) It is simple in design. (ii) It is very much suitable for anti-phishing tools that can suggest alternative links to Internet user. | (i) It can misclassify many legitimate websites. (ii) Its accuracy depends on selected search engine. |
| Use of anomalies in phishing websites | It looks for the characteristics in DOM objects or URLs of the websites. | (i) It is not dependent on any specific phishing strategy and is equally valid for all kinds of phishing websites. (ii) It does not depend on any external factors, such as databases. (iii) It does not require any changes in user browsing habits. | (i) It is complex in design. (ii) Its accuracy varies with the list of used phishing-characteristics. |
Table 1: Summary of technical phishing prevention methods
4. Investigating anomalies in phishing websites
One of the main objectives of this thesis is to identify the important anomalies found in
the URLs and source code of phishing websites. Therefore, I compiled as many
distinctive anomalies as possible. I identified two possible ways to gather anomalies.
One way is to analyze phishing websites and the corresponding
legitimate websites together to discover their differences, but this is a time-consuming
process. Therefore, I selected the second way and chose past studies as the sources of
the anomalies, since those anomalies are already confirmed to occur in
phishing websites. I collected several past studies, for example, studies by Chou et al.
[2004], Fette et al. [2006], Pan and Ding [2006], Garera et al. [2007], McGrath and
Gupta [2008], Ma et al. [2009], Bian et al. [2009], Alkhozae and Batarfi [2011], Xiang
et al. [2011], Choi et al. [2011], and Gastellier-Prevost et al. [2011], and picked all
non-redundant anomalies. The anomalies that I have listed are described next.
4.1. Anomalies found in the URLs of phishing websites
Use of IP address in URLs. Some phishing websites use an IP address in their
URLs, either to replace the host name or as a substring of the URL, in order to
confuse Internet users. APWG [2012] reported that 1.19%, 1.4%, and 2.09% of
the phishing websites used URLs containing an IP address during the first
quarter of 2012. An example of such a URL is:
http://184.173.179.200/~agarwal/rbc/
However, some genuine web applications, typically those used in intranets, can also
contain an IP address in the URL.
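A check for this anomaly can be sketched with Python's standard library. The function names are illustrative, and the regular expression only matches IPv4-looking substrings without validating the numeric ranges, which is a simplification assumed for the example.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Four dot-separated groups of one to three digits, not embedded in a longer number.
_IPV4_PATTERN = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def host_is_ip(url):
    """True if the host part of the URL is an IP address instead of a name."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def contains_ip(url):
    """True if an IPv4-looking substring occurs anywhere in the URL."""
    return _IPV4_PATTERN.search(url) is not None
```

For the example above, `host_is_ip("http://184.173.179.200/~agarwal/rbc/")` is true. Because intranet applications legitimately use such URLs, the result is better treated as one feature among several than as a verdict.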
URLs contain brand, or domain, or host name. In this form of phishing
URL, the targeted company’s brand, domain, or host name is included in the
path segment of the URL. McGrath and Gupta [2008] found that 50%-75% of
phishing websites’ URLs contain the targeted brand, domain, or host name.
According to the APWG report for the first quarter of 2012,
49.53%, 45.39%, and 55.42% of the phishing websites used URLs containing the
targeted company’s brand, domain, or host name. An example
of such a URL is: http://fatloss4babyboomers.com/paypal.html
However, a brand, domain, or host name is also used by most genuine
websites in their URLs.
URLs use http in place of https, i.e., abnormal SSL certificate. Most
phishing websites use an unsecured connection to transfer sensitive information.
Valid Secure Socket Layer (SSL) certificates are issued by authorized
organizations, which verify websites before issuing a certificate. Acquiring such a
certificate therefore makes a phishing website susceptible to detection techniques and
can even be dangerous for the respective phisher, who may be traced. In addition,
Internet users are not good at differentiating between secure and unsecure connections
[Gastellier-Prevost et al., 2011]. Some phishing websites were reported to use SSL
certificates that are either invalid or inconsistent with the claimed identity, but this
is now rare in practice since all recent versions of popular web browsers, such as Google
Chrome, Mozilla Firefox, and IE, have detection systems for them. An example
of a phishing website that uses http is: http://coachbronek.com/muz4/index.php.
However, there are some authentic websites, such as Facebook and Viadeo, which
use SSL only for a very short time to validate the users’ credentials [Gastellier-Prevost
et al., 2011].
URLs contain misspelled or derived domain name. Phishers use various tricks
to derive a domain name that looks similar to the genuine domain name
but disobeys the URL naming conventions. Often such a derived domain
name is a registered domain name. Some of the techniques used to generate a
derived domain name for phishing websites are:
o Replace characters of the real domain name with similar-looking
elements (can be hexadecimal or integer). An example of such a URL is:
http://paypa1.com, where the character ‘l’ is replaced by the number one.
o Introduce a hyphen (-) in the domain name. An example of such a URL is:
http://www.adm-ahtuba.astranet.ru/semite.html
o Shift the characters of the domain name. An example of such a URL is:
http://www.paypla.com, where the positions of the characters ‘a’ and ‘l’ are
interchanged.
However, several genuine websites have URLs that contain meaningless words,
and this can complicate the detection of phishing websites’ URLs.
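One possible way to flag such derived names is to measure the edit distance between a domain label and a list of protected names. The sketch below uses the optimal string alignment distance, which counts a swap of adjacent characters as a single edit; the list `PROTECTED` and the threshold of one edit are assumptions made for illustration and do not come from the studies cited above.

```python
def osa_distance(a, b):
    """Edit distance counting insertions, deletions, substitutions, and adjacent swaps."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Hypothetical list of protected brand labels.
PROTECTED = ("paypal", "ebay")


def looks_derived(domain_label, max_distance=1):
    """True if the label is close to, but not identical with, a protected name."""
    label = domain_label.lower()
    return any(0 < osa_distance(label, name) <= max_distance for name in PROTECTED)
```

With these assumptions, the labels "paypa1", "paypla", and "pay-pal" are all one edit away from "paypal" and are flagged, whereas "paypal" itself is not.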
URLs using long host name. Phishing websites’ URLs are usually longer than
normal URLs. McGrath and Gupta [2008] found that URL lengths peak
at 22 characters for legitimate websites in DMOZ, whilst they peak at 67
characters for the URLs in PhishTank and 107 for the URLs in MarkMonitor.
They further found that only a few URLs in DMOZ were longer than
75 characters, whereas the longest URLs in PhishTank and MarkMonitor had
lengths of more than 150 characters. In addition, they found that phishing domains
(without TLD) are shorter than legitimate domains: domain length
(without TLD) peaks at 10 characters for the URLs in DMOZ, whereas it peaks at 7
characters for the URLs in PhishTank and MarkMonitor. An example of such a
URL is:
http://fodamat.com/templates/fodamat/webscr/PayPal.com/webscr.php?cmd=_login-run&dispatch=5885d80a13c0db1f998ca054efbdf2c29878a435fe324eec2511727fbf3e9efe4eb694d5cae9e96bf5176d35f4070ec44eb694d5cae9e96bf5176d35f4070ec4
Use short URLs. Some phishing websites use URL shortening services, such as
TinyURL [McGrath and Gupta, 2008; Gastellier-Prevost et al., 2011], to shorten
their URLs, which ultimately redirect to long URLs. An example of such a URL is
http://prophor.com.ar/prophor/wells/alerts.php, which redirected to the URL
http://specialneedssvg.org/wp/wp-admin/import/wellsfargo/wellsfargo/wellsfargo2011/index.php
Use “//” character in URLs’ path. When a URL’s path contains the “//” characters, it is
suspicious and there is a greater chance that it will redirect [Gastellier-Prevost et
al., 2011]. An example of such a URL is:
http://bganketa.com/libraries/eBaiISAPI.dll.htm?https://signin.ebay.co.uk/ws/eBayISAPI.dll?SignIn
However, there are some genuine websites that satisfy this condition. An
example is the login page URL for Gmail:
https://accounts.google.com/ServiceLogin?service=mail&passive=true&rm=false&continue=https://mail.google.com/mail/&ss=1&scc=1&ltmpl=default&ltmplcache=2
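A minimal Python sketch of this check is given below. In both examples above the “//” occurs in the query string rather than in the path proper, so the sketch inspects both components; that choice, like the function name, is an assumption made for illustration.

```python
from urllib.parse import urlsplit


def has_embedded_double_slash(url):
    """True if '//' occurs after the scheme separator, in the path or the query."""
    parts = urlsplit(url)
    return "//" in parts.path or "//" in parts.query
```

The function returns true for the phishing example and also for the Gmail login URL, which illustrates why this anomaly alone cannot separate phishing from legitimate websites.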
URLs use unknown or unrelated domain name. Sometimes phishers use a domain
name that is either completely unknown or unrelated to the target. An example of such
a URL targeting PayPal is:
http://www.traitembal.com/backoffice/images-backoffice/dossier/
However, it is legal to have a unique domain name.
URLs use multiple Top Level Domains (TLD) within domain name. Some
phishing websites’ URLs use multiple TLDs within the domain name. Such URLs
can be detected from the number of dots (.) used in them. It is found that
genuine URLs contain on average fewer than five dots (.) [Zhang et al., 2007a]. An
example of a phishing URL with more than five dots is:
http://paypal.com.bin.webscr.skin.a5s4d6a5sdas56d6554y65564y65564y4a56s4d56as4d65sad4.shoppingcarblumenau.com.br/
However, there are some legitimate websites whose URLs contain more than five dots.
An example of such a URL is the login page of “Hotmail.com”:
https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=11&ct=1351023508&rver=6.1.6206.0&wp=MBI&wreply=http:%2F%2Fmail.live.com%2Fdefault.aspx&lc=1033&id=64855&mkt=en-us&cbcxt=mai&snsc=1
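Counting dots is straightforward to sketch in Python. The `host_only` option is a possible refinement assumed for this example and not taken from the cited work: it counts only the dots in the host name, so that dots in query parameters do not inflate the count. The two URLs in the usage note are shortened, made-up variants of the examples above.

```python
from urllib.parse import urlsplit


def count_dots(url, host_only=False):
    """Count the dots in the whole URL, or only in its host name."""
    if host_only:
        target = urlsplit(url).hostname or ""
    else:
        target = url
    return target.count(".")
```

For "http://paypal.com.bin.webscr.skin.example.com.br/" both variants give 7, whereas for "https://login.live.com/login.srf?rver=6.1.6206.0&wreply=mail.live.com" the whole-URL count is 8 but the host-only count is 2.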
Use encoded URLs. Use of obfuscated text, i.e., the ASCII, hexadecimal, or octal
equivalent of readable text, in a URL is another technique used to hide the identity of
a phish. Sometimes an encoded IP address is used in the URL. Such text is less likely
to be readable and can easily deceive Internet users. An example of such a URL is:
http://www.absolutewealthsystem.com/www.paypal.it_service-security_confermation/it/Processing1.php?cmd=_Processing&dispatch=5885d80a13c0db1fb6947b0aeae66fdbfb2119927117e3a6f876e0fd34af4365dcbd1864c8b4dcf443a6f60fef107b96dcbd1864c8b4dcf443a6f60fef107b96
Use of special character ‘@’ in URLs. The special character ‘@’ is used in a URL to
redirect the user to a website different from the one that appears in the address bar.
When a URL contains the ‘@’ symbol, the string on the left side of the symbol is
disregarded and the actual URL is the string on the right side of the symbol [Zhang et
al., 2007a]. An example of such a URL is:
http://www.amazon.com:[email protected]
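The effect of the ‘@’ symbol can be demonstrated with Python's `urllib.parse`, which separates the part before the ‘@’ from the real host. The function name is illustrative, and the URL in the usage note is a made-up example built on the reserved `.example` domain.

```python
from urllib.parse import urlsplit


def userinfo_trick(url):
    """Return (decoy, real_host) if the authority part contains '@', else None."""
    parts = urlsplit(url)
    if "@" not in parts.netloc:
        return None
    decoy, _, _ = parts.netloc.rpartition("@")
    return decoy, parts.hostname
```

For "http://www.amazon.com:signin@phish.example/login" the function returns the decoy string "www.amazon.com:signin" together with the real host "phish.example".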
URLs use different port number. Some phishing websites use a port other than port
80 [Gastellier-Prevost et al., 2011]. It is found that 1.19%, 0.68%, and 0.26% of
the phishing websites did not use port 80 in January, February, and March of
2012, respectively [APWG, 2012].
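A port check can be sketched as follows. Treating port 443 as the normal port for https URLs, and treating a malformed port as suspicious, are assumptions added for the example; the text above only refers to port 80.

```python
from urllib.parse import urlsplit

# Assumed default ports per scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def uses_nonstandard_port(url):
    """True if the URL names an explicit port that differs from the scheme default."""
    parts = urlsplit(url)
    try:
        port = parts.port
    except ValueError:
        return True  # a malformed port is treated as suspicious
    if port is None:
        return False
    return port != DEFAULT_PORTS.get(parts.scheme)
```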
URLs with abnormal DNS record. Legitimate websites usually have a
DNS record; however, phishing websites usually do not. If
they do, most of the information remains empty. Figure 24 shows the DNS
lookup result using the My-Addr.com tool for the phishing URL:
http://188.138.124.133/www.paypal.com/session_id/87544455623222414898896521458754/index.htm#
Figure 24: DNS record for a phish URL tested using My-Addr.com tool
However, a legitimate website can also have an incomplete DNS record, whilst a
fake website can have a complete one.
Life of domain. In general, the life of phishing sites is not long. Even when they
have a registered domain, it is usually a recently registered one. Phishing websites
become active immediately after registration [McGrath and Gupta, 2008; Zhang
et al., 2007a]. However, many recently registered legitimate websites
are added to the Internet every day.
Number of sensitive words in URLs. Several suggestive word tokens are used in
phishing websites’ URLs [Garera et al., 2007]. The eight word tokens used by
Garera et al. [2007] in their classifier are: webscr, secure, banking, ebayisapi,
account, confirm, login, and signin. An example of such a URL is:
http://paypal.com.cgi.bin.webscr.cmd.login.submit.dispatch.8f9j89u54iu5l5469t6d6sd4.boquetequalityproperties.net/pay/
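Counting these tokens can be sketched in a few lines of Python. The eight tokens are the ones listed above; counting plain substring occurrences in the lower-cased URL is a simplification assumed for the example.

```python
SENSITIVE_TOKENS = ("webscr", "secure", "banking", "ebayisapi",
                    "account", "confirm", "login", "signin")


def count_sensitive_tokens(url):
    """Number of occurrences of the eight word tokens in the lower-cased URL."""
    lowered = url.lower()
    return sum(lowered.count(token) for token in SENSITIVE_TOKENS)
```

For the example URL above the count is 2, because it contains both "webscr" and "login".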
Use of free web hosting. Free web hosting services are widely misused by
phishers to host their phishing websites [McGrath and Gupta, 2008]. Most
phishing websites use a domain that is specifically registered for hosting phishing
sites, or they use web hosting services that are available for free [Prakash et
al., 2010]. An example of such a URL is:
http://arnodits.net/ysCntrlde/webscr_prim.php?YXJub2RpdHMubmV0NTAxNmNmYTVjMzY4NQ==MTM0MzY3MjIyOQ
However, many legitimate websites also use free web hosting services.
URLs popularity. Page rank depicts the relative importance of a website within a
set of websites. A higher page rank indicates that the website is more important,
and mostly only a legitimate website can achieve it [Garera et al., 2007; Choi et al.,
2011]. Techniques by Ma [2006], Zhang et al. [2007a], Xiang and Hong [2009],
and Huh and Kim [2011] use search engine ranking for detecting phishing
websites. A screenshot of the results returned by Google for a phishing URL is
shown in Figure 25.
Figure 25: Google search results for a phish URL
However, phishing websites can use compromised URLs that are already
popular, whilst newly designed websites can have very low popularity.
Moreover, the ranking varies with the search engine used, as shown in
Figure 23.
No credible in-neighbor search results [Bian et al., 2009]. A legitimate website’s
domain usually has inlinks from various credible websites, while phishing
websites mostly do not have inlinks from legitimate websites. In fact, most of
the time phishing websites do not have any inlinks at all. This does not mean that
all legitimate websites have inlinks; several legitimate websites may
have none at all. Some of the methods that can be used to get the inlinks
are: “link:[no space]DomainToSearch” in Google, “link:[space]DomainToSearch”
in Yahoo! and Bing, the Bing webmaster tool, and the
Google webmaster tool.
URLs absence in relevant web category [Bian et al., 2009]. When the keywords of
a legitimate website are entered in Yahoo! Directory, it lists the websites that
are relevant to the provided keywords, which also include the legitimate website.
However, a phishing website either does not produce any results or is absent from the
results. This again does not guarantee that all legitimate websites will have non-zero
result counts. Several legitimate websites were found to have
zero result counts.
Number of “Bag of words” in URLs. The frequency of strings delimited by
‘/’, ‘?’, ‘.’, ‘=’, ‘-’, and ‘_’ can be used for phishing detection [Ma et al., 2009]. In
general, phishing websites’ URLs contain these symbols more frequently
than normal websites’ URLs. An example of such a URL is:
http://artesax.com/~citcompa/paypal_priv8_us_2012/index.htm?cmd=_login-run&dispatch=063c19f9f888ffe32e5abeba112f5b33063c19f9f888ffe32e5abeba112f5b33
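The tokenisation behind this feature can be sketched in Python. The six delimiters are the ones listed above; dropping the scheme before splitting and the function names are assumptions made for the example.

```python
import re
from collections import Counter

# The six delimiters named in the text: / ? . = - _
_DELIMITERS = re.compile(r"[/?.=\-_]")


def url_tokens(url):
    """Split the URL (without its scheme) into the strings between delimiters."""
    rest = url.split("://", 1)[-1]
    return [token for token in _DELIMITERS.split(rest) if token]


def bag_of_words(url):
    """Token frequencies, usable as a feature vector for a classifier."""
    return Counter(url_tokens(url))


def delimiter_count(url):
    """Number of delimiter symbols in the URL (without its scheme)."""
    rest = url.split("://", 1)[-1]
    return len(_DELIMITERS.findall(rest))
```

For instance, "http://a.com/x_y?k=v-1" yields the tokens a, com, x, y, k, v, and 1, with a delimiter count of 6.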
Domain name character composition. McGrath and Gupta [2008] found that
domain names from DMOZ resemble the relative letter frequencies
of the English language, whilst domain names from PhishTank and MarkMonitor
have a less pronounced peak at each of the vowels. Likewise, they also found that
the relative popularity of letters of the English language differs between legitimate
and phishing domain names. The letters ‘a’, ‘c’, and ‘e’ have significantly different
probabilities of appearing in English language documents or DMOZ domain
names, but they have very similar probabilities of occurrence in phishing
domain names.
URLs hosted by geographical location. The majority of phishing websites are
hosted in USA [APWG, 2012]. This might be because USA hosts the highest
number of other websites as well.
TLD triplets used in URLs. It is found that the triplet of TLDs
most often used by spammers is .us, .cn, and .com [Gastellier-Prevost et al.,
2011]. However, they are also widely used TLDs for genuine websites.
4.2. Anomalies found in the source codes of phishing websites
Abnormal anchor URLs. Genuine websites use anchors to provide
navigational guidance. The URLs used in the anchors are usually from their own
domain and sometimes from a different domain. However, in phishing sites such
anchor URLs are mostly from a different domain. It has also been found that
sometimes the anchors in phishing websites do not link to any page; for
example, the anchor URL (AURL) can be “file:///E/” or “#”.
Abnormal Server Form Handler (SFH). Security is one of the prime concerns for
organizations that do online transactions. Such organizations require credentials
for login, which are generally a username and password. Thus, their websites
include an SFH. Legitimate websites always take action upon the submission of a
form; however, the SFH of a phishing website can contain either “about:blank” or “#”.
Moreover, a legitimate site’s SFH is handled by a server of the same domain. So
whenever the form is handled by a foreign domain server, it makes the
website suspicious.
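The SFH check can be sketched with Python's standard HTML parser. The labels "void", "foreign", and "same-domain" are names chosen for this example, and comparing exact host names instead of registered domains is a simplifying assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class FormActionCollector(HTMLParser):
    """Collects the action attribute of every form on a page."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.actions.append(dict(attrs).get("action") or "")


def classify_sfh(page_url, html):
    """Label each form handler as 'void', 'foreign', or 'same-domain'."""
    collector = FormActionCollector()
    collector.feed(html)
    page_host = urlsplit(page_url).hostname
    labels = []
    for action in collector.actions:
        if action.strip().lower() in ("#", "about:blank"):
            labels.append("void")
        elif urlsplit(urljoin(page_url, action)).hostname != page_host:
            labels.append("foreign")
        else:
            labels.append("same-domain")
    return labels
```

A page that yields only "void" or "foreign" labels for its login form would count as showing this anomaly.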
Abnormal request URLs. Request URLs (RURLs) are the links to external objects (images,
external scripts, CSS), also called resources. W3C recommends that websites use
resources from the page’s own domain, and this is widely followed by genuine websites.
However, spoof websites often use these resources from the victim websites to
make the phishing websites look and feel similar to the legitimate websites. This means
that the request URLs used by phishing websites are often from a different domain. Some
genuine websites also use resources from domains other than their own;
however, they do so for very few resources, whilst phishing websites
use a different domain for the RURLs of most of their resources.
Abnormal cookie. A cookie is used to identify users and their previous activity on
a website. Cookies are an important part of portals and online shopping websites.
A cookie is always bound to the website’s server domain. However, in phishing
websites, it either points to the phishing website’s own domain, which is inconsistent
with the claimed identity, or points to the genuine website’s domain, which differs from
the phishing domain.
Mismatch hyperlink. A mismatched hyperlink is used to mislead Internet users.
Although the links that appear to Internet users are those of the original websites,
when the links are clicked, they direct to the phishing websites. For instance,
<a href="http://www.profusenet.net/checksession.php">
https://secure.regionset.com/Ebamking/logon/</a>
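A mismatch of this kind can be detected by comparing the host in the visible link text with the host in the href attribute. The Python sketch below is an illustration only; the class and function names are made up, and it only considers link texts that themselves look like URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkTextCollector(HTMLParser):
    """Collects (href, visible text) pairs for every anchor with an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html):
    """Return the links whose visible text names a different host than the href."""
    collector = LinkTextCollector()
    collector.feed(html)
    result = []
    for href, text in collector.links:
        shown = urlsplit(text).hostname if "://" in text else None
        actual = urlsplit(href).hostname
        if shown and actual and shown != actual:
            result.append((href, text))
    return result
```

Applied to the anchor shown above, the function reports the link because the text names secure.regionset.com whereas the href points to www.profusenet.net.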
Use of illegal pop-up windows. A phisher uses a pop-up and asks Internet users to
fill in their information. It could be a borderless window above the real website
that looks very much like a part of the genuine website. There are two ways to
create pop-up windows. One way is using HTML, which is in practice, for instance,
<div onClick="window.open('mona.html')">
The other way is using Javascript, which is illegal:
onClick="javascript:popup('mona.html')"
All popular web browsers have features to block pop-up windows [Alkhozae
and Batarfi, 2011].
Harmful forms. Phishing websites usually use a form asking users to fill in other details
along with the username and password [Ludl et al., 2007]. The number of input fields,
text fields, password fields, hidden fields, and other fields, such as radio buttons
and check boxes, can be used for phishing detection [Ludl et al., 2007]. More precisely,
<input> tags that accept text and are accompanied by words such as “credit card” can
indicate phishing [Zhang et al., 2007a]. Such a form usually contains a submit
button. Figure 26 shows a form in a phishing website.
Figure 26: A form in a phishing website
Use of onMouseOver to hide the link. Some phishing websites include an
onMouseOver function to hide their abnormal link. An example of a code snippet
that uses onMouseOver is below:
<a href="http://www.abc.com" onMouseOver="window.status='Click here to go to ABC'; return true">ABC</a>
Number of Script tag. In general, phishing websites are found to use a larger number
of Javascript tags and plain text pages than legitimate websites [Ludl et al.,
2007]. Thus, an unusually high number of Javascript tags in a website makes it suspicious.
Presence of Javascript functions. There are some native Javascript functions, such
as escape(), eval(), link(), unescape(), exec(), and search(), which occur
predominantly in phishing websites containing cross-site scripting and web-based
malware [Choi et al., 2011]. A high count of these functions
in a website makes it suspicious.
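Both counts can be approximated by simple textual matching on the page source. The following Python sketch assumes the function list given above; the helper names count_script_tags and count_function_calls are hypothetical, and a regular expression is only a rough substitute for a real Javascript parser.

```python
import re

# Function names taken from the list above [Choi et al., 2011].
SUSPICIOUS_FUNCTIONS = ("escape", "eval", "link", "unescape", "exec", "search")


def count_script_tags(html):
    """Count opening <script> tags, ignoring case."""
    return len(re.findall(r"<script\b", html, flags=re.IGNORECASE))


def count_function_calls(html, names=SUSPICIOUS_FUNCTIONS):
    """Count textual call sites such as eval( for each function name."""
    counts = {}
    for name in names:
        # The lookbehind keeps "escape(" from also matching inside "unescape(".
        pattern = r"(?<!\w)" + re.escape(name) + r"\s*\("
        counts[name] = len(re.findall(pattern, html))
    return counts


sample = """
<script>var a = unescape("%3Cb%3E"); eval(a);</script>
<SCRIPT src="x.js"></SCRIPT>
"""
print(count_script_tags(sample))
print(count_function_calls(sample))
```

Since legitimate websites also use these functions, the counts only become meaningful once a threshold has been calibrated.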
IFrame redirection. An IFrame is used to embed another webpage within the current
webpage. It creates a frame or window on a webpage so that another page can be displayed
inside this frame. Some phishing websites are found to use a borderless IFrame, which
is hard for Internet users to detect manually.
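A borderless IFrame leaves a recognisable trace in the markup. The sketch below, in Python, looks for a frameborder attribute of 0 or an inline style that removes the border; these two signals and the names IframeFinder and borderless_iframes are assumptions of this example, and borders removed through external style sheets would be missed.

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Records the src of every <iframe> that is declared without a border."""

    def __init__(self):
        super().__init__()
        self.borderless = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        attributes = {name: (value or "") for name, value in attrs}
        style = attributes.get("style", "").replace(" ", "").lower()
        no_border = (
            attributes.get("frameborder") == "0"
            or "border:0" in style
            or "border:none" in style
        )
        if no_border:
            self.borderless.append(attributes.get("src", ""))


def borderless_iframes(html):
    """Return the src values of borderless iframes found in the page source."""
    finder = IframeFinder()
    finder.feed(html)
    return finder.borderless


sample = ('<iframe src="http://example.test/login" frameborder="0"></iframe>'
          '<iframe src="/ad" style="border: 1px solid"></iframe>')
print(borderless_iframes(sample))
```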
Mismatch in form fields and domain name. Phishing websites use their own
domain name but put the text of legitimate websites in the <title> tag, which results in
a complete mismatch [Gastellier-Prevost et al., 2011]. This mismatch can be used for
phishing detection.
Disabled right click. Some phishing websites disable the right mouse
click. A simple Javascript function can be used to disable it. A code snippet that
can disable right click is given below:
function disableclick(e) {
  // A button value of 2 identifies the right mouse button
  if (e.button == 2) {
    return false;
  }
}
Use authentic logo. Almost all phishing websites use the logo of the legitimate
website to imitate its appearance [Zhang et al., 2007a]. Verifying this anomaly needs
a record of the logos of all legitimate websites that are highly targeted by phishers,
which creates a dependency.
Integrate security logo. Most phishing websites use a security logo, such as
VeriSign [Gastellier-Prevost et al., 2011], to provide a look of genuineness. Detecting it
needs prior knowledge of all existing security logos. Figure 27 shows a
phishing website that uses the “VeriSign” logo.
Figure 27: Phishing website with a company’s logo and VeriSign’s logo
Keyword/Description. These objects and properties provide information about the
website, such as its copyright, ownership, and content. Although
mirroring a website is quite a simple process (the “Save as” option of popular browsers
is one of the simplest methods for website mirroring), this
information can still be helpful for phishing detection. In fact, there are already some
phishing prevention techniques that use it for phishing detection, such as the
Bayesian Filter.
Sloppiness or lack of familiarity with English. Some phishing websites contain careless
spelling mistakes, grammatical errors, and inconsistencies in the web content.
Sometimes this is done deliberately in order to bypass anti-phishing tools that use
content-based filtering techniques, e.g., the Bayesian Filter. However, designing a
tool to check language mistakes is in itself a challenge. Moreover, there
are many phishing websites in languages other than English.
Email function. Some phishing websites include a function that sends email
to the phisher. When a victim enters the information, the function sends an email with all
the information to the phisher. An example of Javascript code that sends email
is:
function sendMail() {
  // Build a mailto link; "?" starts the header fields and "&" separates them
  var link = "mailto:[email protected]" +
    "?subject=" + escape("This is my subject") +
    "&body=" + escape(document.getElementById('myText').value);
  window.location.href = link;
}
Such code can also be written in a server-side programming language, in which case it
is not visible on the client side.
4.3. Verification of the anomalies using online phishing websites
An experiment was conducted to verify the anomalies listed in the aforementioned
subchapters (i.e., 4.1. and 4.2.). For the experiment, I used twenty online phishing
websites that had already been validated as phishing websites by PhishTank. I selected, in order, the
top twenty phishing websites that were verified as phishing on 9th of August 2012. The
list of the phishing websites’ URLs used for the experiment is included in the Appendix. I
verified most of the anomalies, but a few of them could not be verified due to
technical complications. These include the anomalies that are related to grammatical
mistakes in the web content. I mainly used the login page of the phishing websites for the
experiment, since it is the entry point and phishing has to be detected at this point. I
mostly used existing tools and environments for the experiment. The
benefit of using existing tools is that they are online, stable, and their results can
be trusted. The tools and environments used are:
Google search engine was used to obtain the popularity of the phishing URLs. The
complete URL of each phishing website was used as the search keyword.
Google, Yahoo!, and Bing search engines were used to find credible
in-neighbor search results for the phishing websites’ URLs.
Yahoo! Directory was used to obtain the relevant web category. The spoofed organization's
name was used as the keyword.
The DNS and WHOIS tool in My-Addr.com was used to get the DNS record of each
phishing website’s URL.
The Check/Search Port tool in My-Addr.com was employed to get the port used by the
phishing websites.
Notepad++ was used as a source code viewer, and its ‘find’ feature was used
to search for DOM objects’ tags.
Utility applications written in the C Sharp programming language (.Net platform)
were used to extract properties from URLs and DOM objects.
In order to verify the anomalies, I chose one phishing website at a time and looked for
all the anomalies in that website. I always started with the anomalies which require the
website to be online, e.g., URLs hosted geographical location, URLs popularity, no
credible in-neighbour search results, URLs with abnormal DNS record, URLs use
different port number, use of free web hosting, and life of domain. One of the major
challenges was that phishing websites do not remain online for a long time. Therefore, I
had to make sure I got the required information before somebody took the website
down. Then, I downloaded the phishing webpage for source code analysis, and after that I
analyzed its URL for anomalies. I analyzed the source code of the phishing website
last.
During the analysis, I first checked whether each anomaly was present in the selected
phishing website. Then, I obtained the count of occurrences for those anomalies
whose count is necessary to differentiate between a legitimate website and a phishing
website, such as number of “Bag of words” in URLs, URLs use
multiple Top Level Domains (TLD) within domain name, and number of script tags. I
also calculated the mean or median value of the counts of occurrences. The mean value was
calculated when the data set (i.e., the set of values formed from the count of occurrences of an
anomaly in each phishing website) was evenly distributed; otherwise, the median value was
calculated. The results from the experiment are listed in Table 2 and Table 3. Table 2
contains the anomaly types and the number of phishing websites containing those anomalies in
their URLs.
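The choice between mean and median described above can be sketched as follows. The thesis does not state how "evenly distributed" was judged, so the rule used here is purely an assumption for illustration: the data count as evenly distributed when the mean and the median differ by at most a fraction (the hypothetical parameter tolerance) of the standard deviation. The sample values are invented.

```python
from statistics import mean, median, pstdev


def central_value(counts, tolerance=0.2):
    """Return ("mean", value) for roughly symmetric data, else ("median", value).

    Assumption: the data are treated as evenly distributed when the mean and
    median differ by at most `tolerance` standard deviations.
    """
    average, middle = mean(counts), median(counts)
    spread = pstdev(counts)
    if spread == 0 or abs(average - middle) <= tolerance * spread:
        return ("mean", average)
    return ("median", middle)


# Invented counts: the first set is symmetric, the second has one large outlier.
print(central_value([2, 3, 3, 4, 3]))
print(central_value([0, 1, 1, 2, 3, 450]))
```

A single extreme value, such as one URL with hundreds of search results, pulls the mean far from the typical case, which is why the median is the safer summary for skewed counts.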
Properties | Results (Occurrence/Total)
Use IP address in URLs | 2/20
URLs contain brand or domain or host name | 12/20
URLs use http in place of https, i.e., abnormal SSL certificate | 20/20
URLs contain misspelled or derived domain name | 0/20
URLs use large host name | 9/20 URLs with length equal to or greater than 75 characters; Mean=96.9
Use short URLs | 2/20
Use “//” characters in URLs path | 1/20
URLs use unknown or unrelated domain name | 8/20
URLs use multiple Top Level Domains (TLD) within domain name | 20/20; Mean=3
Use encoded URLs | 4/20
Uses special character ‘@’ in URLs | 0/20
URLs use different port number | 0/20
URLs with abnormal DNS record | Complete=11; Incomplete=8; Not Found=1
Number of sensitive words in URLs | 9/20
Number of “Bag of words” in URLs | 20/20; Mean=9
URLs popularity | 18/20; Median Results Count=3
No credible in-neighbour search results | 20/20
URLs absence in relevant web category | 20/20
Life of domain | Unknown, cannot obtain the life of domain
Use of free web hosting | Unknown, cannot obtain information about the web hosting servers
Domain name character composition | Unable to classify
URLs hosted geographical location | 10/20 United States; 3/20 Spain; 1/20 each for France, Italy, Switzerland, Hong Kong, Vietnam, Turkey; 1/20 Unknown
TLD triplets used in URL | 11/20 use .com
Table 2: Number of phishing websites containing anomalies in their URLs
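Several of the URL properties in Table 2 can be extracted offline from the URL string alone. The following Python sketch illustrates this for a handful of them; it is not the C Sharp utility used in the experiment, the function name url_anomalies and the dictionary keys are hypothetical, and the sample URLs use documentation addresses rather than real phishing sites.

```python
import ipaddress
from urllib.parse import urlsplit


def url_anomalies(url):
    """Check a URL for some of the string-level anomalies listed in Table 2."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)
        uses_ip = True
    except ValueError:
        # The host is a name (or empty), not a literal IP address.
        uses_ip = False
    return {
        "uses_ip_address": uses_ip,
        "uses_http": parts.scheme == "http",
        "has_at_sign": "@" in url,
        "double_slash_in_path": "//" in parts.path,
        "non_default_port": parts.port not in (None, 80, 443),
        "percent_encoded": "%" in url,
    }


print(url_anomalies("http://192.0.2.10:8080//secure/login.php?user=a%40b"))
print(url_anomalies("https://www.example.com/login"))
```

Properties such as URL popularity, DNS records, or life of domain cannot be derived this way because they need online lookups.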
Similarly, Table 3 contains the anomaly types and the number of phishing websites
containing those anomalies in their source codes.
Properties | Results (Occurrence/Total)
Abnormal anchor URLs | 18/20
Abnormal Server Form Handler (SFH) | 20/20
Abnormal request URLs | 18/20
Abnormal cookie | 3/20
Mismatch hyperlink | 0/20
Use of illegal pop-up windows | 0/20
Harmful forms | 20/20
Use of onMouseOver to hide the link | 0/20
Number of script tags | 20/20; Mean=28
Presence of Javascript functions | 10/20
IFrame redirection | 0/20
Email functions | 0/20
Mismatch in form fields and domain name | 19/20
Disable right click | 0/20
Use authentic logo | 20/20
Integrate security logo | 11/20
Keyword/Description | Unknown, phishes used various languages.
Sloppiness or lack of familiarity with English | Unknown, phishes used various languages.
Table 3: Number of phishing websites containing anomalies in their source codes
4.4. Discussion on findings
The anomalies present in source codes are clearer than those found in URLs. Most of
the anomalies in source code can be analyzed locally, which means that they do not need an
Internet connection and are almost independent of the Internet speed once the web
pages are loaded. Likewise, the majority of anomalies in source codes require only textual
matching, except for a few anomalies which need image matching and English grammar
rules. One of the major problems in analyzing anomalies in source codes is that the
web pages need to be loaded, which exposes Internet users to malicious
code, keyloggers, and botnets. Although the risk from malicious code, keyloggers, and
botnets can be reduced by using a sandboxed browser to load the webpage for analysis, a sandbox
cannot guarantee complete protection from malware and malicious code [Sabanal
and Yason, 2012].
On the other hand, the analysis of anomalies in URLs does not need to load the web pages,
which means Internet users can be safe from phishing conducted using malicious
software. However, some of the anomalies found in URLs need an Internet connection, and
checking them is time consuming.
The experiment revealed that not all anomalies are equally important. Some
important results from the experiment are:
A promising set of anomalies, which occurred with high frequency and were strong
indicators of phishing, is listed in Table 4.
Anomaly types
Abnormal Server Form Handler (SFH)
Harmful forms
URLs use http in place of https or abnormal SSL certificate
URLs contain brand or domain or host name
Abnormal anchor URLs
Abnormal request URLs
Mismatch in form fields and domain name
Table 4: Promising anomalies
Some anomalies occur frequently and are also important for phishing
detection; however, they need prior information about the owners of the
legitimate websites and the owners of the security logos. These anomalies are listed in
Table 5.
Anomaly types
Authentic logo used
Security logo integrated
Table 5: Anomalies dependent on external factors
It was also found that some of the anomalies, which are easy to avoid, are either
rarely present (Table 6) or are absent (Table 7) in phishing websites.
Anomaly types
Use IP address in URLs
Use encoded URLs
Use ‘//’ characters in URLs path
Abnormal Cookie
Use short URLs
Table 6: Important anomalies that occur rarely
Anomaly types
Uses special character ‘@’ in URLs
Mismatch hyperlink
Use of illegal pop-up windows
Use of onMouseOver to hide the link
IFrame redirection
Email functions
Disable right click
URLs contain misspelled or derived domain name
URLs use unknown or unrelated domain name
Table 7: Important anomalies absent from phishing websites
Some of the anomalies can have a higher time overhead, which can make
them unsuitable in certain circumstances, for example, when the
Internet speed is slow. These anomalies are listed in Table 8.
Anomaly types
URLs with abnormal DNS record
No credible in-neighbor search results
URLs absence in relevant web category
Life of domain
Use of free web hosting
URLs hosted geographical location
URLs Popularity
URLs use different port number
Table 8: Anomalies with higher time overhead
There are some anomalies which are not clear in the sense that the same
anomalies also exist in legitimate websites. Therefore, such anomalies need
further analysis to clarify exactly when their presence can declare a website as a
phishing website. The list of such anomalies is in Table 9.
Anomaly types
URLs use multiple TLD within domain name
TLD triplets used in URL
Number of sensitive words in URLs
Number of Script tag
Number of ‘Bag of words’ in URLs
URLs use large host name
Presence of Javascript functions
Table 9: Vague anomalies (need further analysis)
Zhang et al. [2007a] stated that a genuine website contains fewer
than five dots (‘.’) in its URL, i.e., the anomaly “URLs use multiple TLD within domain
name”. However, only three phishing websites that satisfy the condition were found
during the experiment, whilst there are legitimate websites whose login pages
have more than five dots, e.g.,
https://login.live.com/login.srf?wa=wsignin1.0&rpsnv=11&ct=1350861003&rver=6.1.6620.0&wp=MBI&wreply=http:%2F%2Fmail.live.com%2Fdefault.aspx&lc=1033&id=64648&mkt=en-us&cbcxt=mai&snsc=1
is the login page URL for “Hotmail.com” that has seven dots.
Similarly, McGrath and Gupta mentioned that a long genuine URL has
a maximum length of seventy-five characters and that a genuine URL is in general
twenty-two characters long. But some of the phishing websites used for the experiment have a URL
length of less than twenty-two characters, whilst there are genuine websites whose
login page URLs are longer than seventy-five characters, e.g.,
https://accounts.google.com/ServiceLogin?service=mail&passive=true&rm=false&conticon=https://mail.google.com/mail/&ss=1&scc=1&ltmpl=default&ltmplcache=2
is the login page URL of “Gmail.com”.
Similarly, the TLDs covered by the anomaly called “TLD triplets used in URL” are
the most common TLDs, and millions of legitimate websites use them.
Likewise, for the anomalies “Number of sensitive words in URL”, “Number of
Script tag”, and “Number of ‘Bag of words’ in URL”, several phishing websites
contain them, but what number should indicate phishing is unclear, and
they are also very common in legitimate websites.
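The weakness of fixed thresholds discussed above can be made concrete with a short Python sketch. It applies the two limits quoted from the literature (five dots and seventy-five characters) to a URL; the function name threshold_flags and both sample URLs are hypothetical and only mimic the shape of a typical login URL.

```python
def threshold_flags(url, max_dots=5, max_length=75):
    """Apply the dot-count and length thresholds quoted above to one URL."""
    dots = url.count(".")
    return {
        "dots": dots,
        "length": len(url),
        "too_many_dots": dots >= max_dots,
        "too_long": len(url) >= max_length,
    }


# Hypothetical login URL of a legitimate service with a long query string.
legitimate_looking = ("https://login.example.com/signin.srf?ver=6.1.6620.0"
                      "&reply=mail.example.com/default.aspx")
# Hypothetical short URL of the kind a phishing page could use.
short_url = "http://example.test/a"

print(threshold_flags(legitimate_looking))
print(threshold_flags(short_url))
```

Under these thresholds the long but harmless login URL is flagged on both counts while the short URL passes, which is the ambiguity that makes these anomalies vague.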
Some anomalies are tied to the English language, whereas several phishing
websites were found to be non-English. Such anomalies are listed in Table 10.
Anomaly types
Sloppiness or lack of familiarity with English
Domain name character composition
Keyword/Description
Table 10: Anomalies dependent on English language
Anomalies in phishing websites can be an effective way to detect phishing, but
proper methods are needed for the selection, calibration, and deployment of those
anomalies. During the examination of suspected websites, there is a need to look for an
anomaly or a group of anomalies that is hard for phishers to manipulate and is
unexpected in legitimate websites. Some important points that can be utilized during the deployment
of anomalies for heuristic methods are:
(i) Priority should be given to the anomalies which phishers cannot easily
avoid.
Elimination of these anomalies costs phishers time, effort, and money.
Further, it makes such phishing websites easier to detect, and sometimes
it puts phishers at risk of being traced. An example is
URLs use http in place of https or abnormal SSL certificate.
Anomalies which are crucial for usability and social engineering. The
removal of such anomalies can easily be noticed by Internet users, so
phishers are forced to include them. An example is the authentic logo used
in phishing websites.
Anomalies that are a vital part of phishing and for which phishers usually do not
have a good alternative. An example is the use of an abnormal
Server Form Handler (SFH).
(ii) Priority should be given depending on the harmfulness of anomalies.
The more harmful an anomaly is when it is included in a
website, the more important the anomaly is. An example is the use of an
abnormal Server Form Handler (SFH).
(iii) Priority should be given to anomalies on the basis of the time taken for analysis
versus the importance of the anomalies.
It is important to weigh the time required to analyze an anomaly against the
impact it has on the phishing detection procedure. There should not be an excessive
time overhead. An example is checking URLs popularity, which can have a
time overhead when the Internet is slow.
(iv) Priority should be given to independent anomalies.
Priority should be given to independent anomalies over dependent
anomalies. Some anomalies need other anomalies to make sense in
phishing detection. Examples of such anomalies are: "Harmful form" and
"URLs use http in place of https, i.e., abnormal SSL certificate".
(v) An anomaly may also occur in legitimate websites other
than that of the domain owner.
Among anomalies that can occur in legitimate websites, priority should be given
to those that are against recognized standards or
practices over those that are not
objected to by recognized standards. An example of an anomaly which is
against the recognized standard is “Use of illegal pop-up windows”.
Similarly, an anomaly which is not against the recognized standard is
“Presence of Javascript functions”.
It is recommended to employ anomalies that are strong indicators of phishing in
heuristic methods; however, the irony is that most phishers try to get rid of those
anomalies. Therefore, heuristic methods also have to rely on anomalies that are
not strong indicators and can easily be found in legitimate websites. In addition, many
web developers either lack information on standards related to best practices in web
development, such as those from W3C, ISO, Ecma International, and Google Guidelines, or
deliberately do not follow these standards. Such developers unintentionally include
in their websites several anomalies which are also characteristic of phishing
websites, because of which their websites get misclassified.
One of the prime reasons for such misclassification is that current heuristic
methods that look for anomalies in the URLs and source codes of a suspected website
usually look for each anomaly separately and assign a particular score to each of them.
The problem with this approach is that it penalizes all websites equally whenever
an anomaly is present. As a result, several unimportant anomalies, which also occur in
legitimate websites and improperly designed websites, can accumulate enough score to
declare a legitimate website a phishing website. Moreover, this is not the way the human
decision making process works. A human looks at other
circumstances before making the final verdict, and the verdict is justifiable. Such decision
making should be applied to phishing detection too. A technique similar to that of Ludl et al.
[2007], who employed the J48 algorithm to extract a decision tree to classify phishing and
legitimate websites, can be more effective in such cases. It can provide intuitive insight
into which features are important in classifying a data set.
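The difference between the two decision styles can be illustrated with a small Python sketch. All feature names, weights, the threshold, and the rules below are hypothetical and hand-written for this example; in particular, tree_verdict is not a tree learned by J48, it only stands in for the kind of conditional structure such a tree would provide.

```python
# Hypothetical per-anomaly scores for the additive approach.
WEIGHTS = {
    "many_script_tags": 1,
    "long_url": 1,
    "js_functions": 1,
    "abnormal_sfh": 2,
    "harmful_form": 1,
}


def additive_verdict(features, threshold=3):
    """Sum a fixed score for every anomaly present and compare to a threshold."""
    score = sum(WEIGHTS[name] for name, present in features.items() if present)
    return "phishing" if score >= threshold else "legitimate"


def tree_verdict(features):
    """Hand-written decision rules that check anomalies in context."""
    if not features["harmful_form"]:
        # Without a form collecting sensitive data, weak anomalies are ignored.
        return "legitimate"
    if features["abnormal_sfh"]:
        return "phishing"
    return "legitimate"


# A poorly designed but legitimate page: only weak anomalies are present.
sloppy_site = {"many_script_tags": True, "long_url": True, "js_functions": True,
               "abnormal_sfh": False, "harmful_form": False}
# A phishing page: a harmful form submitted to an abnormal handler.
phishing_site = {"many_script_tags": False, "long_url": False, "js_functions": False,
                 "abnormal_sfh": True, "harmful_form": True}

for site in (sloppy_site, phishing_site):
    print(additive_verdict(site), tree_verdict(site))
```

With these assumed weights the additive method misclassifies the sloppy legitimate page, while the conditional rules do not; a real system would learn such rules from labelled phishing and legitimate websites rather than write them by hand.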
5. Conclusions
Phishing is a concept about a decade and a half old that emerged in the mid-90s. It is also one of
the most highly publicised cyber crimes, since it is related to money and adversely impacts
business and the general public interest. Moreover, the majority of phishing uses a technically
simple method, i.e., creating authentic-looking forged websites and reaching potential victims
through spam. Indeed, some phishing employs complex techniques, such
as cross-site request forgery, cross-site scripting, dynamic pharming, botnets, malicious
code, and key logger software. However, there is no countermeasure that can
protect against every kind of phishing. A number of studies
have worked on technical and non-technical aspects with the objective of
determining remedies for phishing. They claim to be more effective than their
contemporaries, but unfortunately most of them perform well only for a certain kind of
phishing and usually fail to counter various tricky phishing strategies. This might
be because phishing does not just exploit technical vulnerabilities; it equally
exploits human vulnerabilities. There can be exact solutions for technical
vulnerabilities, but the exploitation of human behaviour and decision making does not
have any precise remedy. Additionally, the methods adopted by phishers are constantly
changing. When security experts succeed in designing a countermeasure for one method, phishers
discover new routes to make successful attacks. One of the common mistakes that
most phishing prevention techniques make is that they treat users’ purpose
for web browsing and the significance of security as two separate components. They inform users
that something is wrong and prohibit them from proceeding; however, they do not provide suitable
alternatives [Ma, 2006; Wu et al., 2006b]. They usually neglect the fact that security is
not the prime concern of Internet users, which leads Internet users to take risks
despite warnings. Further, the design of phishing prevention techniques is complicated by
several issues. Most phishing prevention techniques fail to overcome one or more
of these issues. Some of these issues are:
Accuracy in results. The results from any phishing prevention system should be
accurate, i.e., with no false positive and no false negative results. Any error in the results
diminishes the credibility of the phishing prevention system and ultimately
discourages Internet users from using it or encourages them to take
risks and fall for phishing. At the same time, it poses a challenge for a phishing
prevention system when a website is doubtful but the system cannot confirm whether it is
a phishing website or not.
Effective warning. It is very important to have an effective method to warn Internet
users and stop them from revealing their credentials to phishing websites. It is
one of the major challenges for anti-phishing tools. Several past studies have
shown that passive alert signals or messages are either unnoticed or ignored by
Internet users [Dhamija et al., 2006; Wu et al., 2006a; Zhang et al., 2007b]. An
active warning, i.e., refusing to connect, must be absolutely certain; otherwise it is
unacceptable. Moreover, in the case of passive warnings, the frequency of alert
messages should be such that no phish is missed and at the same time
Internet users remain comfortable. Bombarding Internet users with alert messages can
push them to switch off anti-phishing tools. It was also found that too
frequent alert messages desensitize Internet users, who are then more likely to
reveal their personal details to phishing [ITNOW, 2012].
Execution time matters. Time is an important factor in all kinds of software. It
matters even more for client-side phishing prevention toolbars. Client-side
phishing prevention toolbars verify a webpage before loading
it. Therefore, a slow system can strongly demotivate Internet users from using it.
However, this constraint forces such tools to rely on anomalies that are quick to
analyse, even though they might not be very effective in practice at detecting
phishing.
Address security and Internet users’ intentions together. Security and Internet
users’ intentions cannot be dealt with separately. The majority of phishing prevention
tools make the mistake of separating them. They attempt to solve the security
problem and disregard the Internet user's specific intention. They inform the user that
something is wrong, but never tell the user specific ways to continue. It is
recommended to integrate the security concerns into the critical path of the Internet users' task
[Wu et al., 2006b] and to provide them with suitable alternatives
when phishing is detected. However, determining alternatives needs an extra process,
which affects execution time.
Scale problem. Phishing is very dynamic, and phishers constantly look for ways to
bypass phishing prevention techniques. It also means that the more popular
a phishing prevention technique is, the more effort phishers will apply to
evade it. Therefore, phishing prevention techniques have to be constantly updated
to cover emerging trends in phishing.
Usability and Internet users’ behaviour under controlled conditions. Almost all
studies of usability and Internet users’ behaviour are performed under
controlled conditions due to ethical and legal issues. Such studies are unable to
capture all the factors that can influence the results. However, such studies are not
allowed to be conducted in uncontrolled conditions due to privacy, ethical, and legal
issues.
Therefore, there is a need for more studies and research to develop robust technical
approaches. This equally needs some flexibility from the social and legal side so that
such studies can be freely conducted.
Current phishing prevention techniques are mostly reactive. Therefore,
there is a need for proactive strategies for phishing prevention. The web development
industry needs technology and practices which make it difficult for phishers to
conduct phishing. One of the major factors that encourages scammers to conduct
phishing is its low cost and high benefit. When the benefits are
reduced, fewer and fewer people will be interested in conducting phishing.
Awareness of security and standards among web developers is another necessary factor.
For instance, web developers should properly fill in all the different fields of the source
code with information related to their domain name, clearly identifying every
HTML tag [Gastellier-Prevost et al., 2011]. In addition, web developers should not use
features that are disallowed by recognized standards, such as recommendations from
W3C and standards published by ISO. They should develop code in a way that
facilitates phishing prevention methods. Similarly, companies should follow standards
and guidelines to make their websites easier to distinguish from phony websites. There is also a
need for work on technology that can trace phishers and help law enforcement
authorities to punish them. This does not mean that phishing can be eliminated; however, it
can be significantly reduced.
Last but not least, non-technical methods can be a vital player in the war against
phishing. However, many of the organizations prone to phishing still do not provide
information or counselling to their new customers about the dangers of phishing unless
they are victimized. This might be because counselling needs resources
and there is also a chance that customers wrongly interpret it as a weakness of the
organization. Many organizations do include static information about phishing on their
websites, but it is dull for many customers and they hardly read it. Therefore, there is a
need for improvement in the presentation of such information. For instance, techniques
such as puzzles and games can be a motivating and effective way to teach customers
about phishing.
6. Limitations and future development work
In this thesis, the experiment was conducted only on phishing websites, so I believe the
results could be more accurate if the same study were conducted on legitimate websites
as well. More importantly, the results obtained are solely based on a meta-analysis
of past studies followed by an experiment on phishing websites. In order to obtain a
clear picture of the results, it is necessary to apply them in real-time anti-phishing software.
Therefore, designing such software is the main future development work arising from this
thesis.
References
[APWG, 2012] Phishing activity trends report: 1st half 2012. Report January-March 2012. Available as: http://www.antiphishing.org/reports/apwg_trends_report_q1_2012.pdf (retrieved on 5th May 2012)
[American Bankers Association, 2005] ABA works on fraud: phishing prevention and resolution. Available as: http://www.angelinabank.com/phishing063005.pdf (retrieved on 15th October 2012)
[Bing Webmaster Tools] How to submit a sitemap. Available as: http://onlinehelp.microsoft.com/en-US/bing/hh204487.aspx (retrieved on 7th July 2012)
[CallingID] CallingID toolbar. Available as: http://www.callingid.com/Default.aspx (retrieved on 17th November 2012)
[Cloudmark] Cloudmark Anti-Fraud toolbar. Available as: http://www.cloudmark.com/en/products/cloudmark-desktopone/index (retrieved on 17th November 2012)
[DNSSEC Validator] DNSSEC Validator 1.1.5. Available as: https://addons.mozilla.org/en-us/firefox/addon/dnssec-validator/ (retrieved on 18th November 2012)
[IDG News Service, May 10 2012] NASA and pentagon hacker TinKode receives two years suspended jail sentence. Available as: http://news.idg.no/cw/art.cfm?id=F21FFE88-01F3-6A5A-F13AD8F4C45D72FC (retrieved on 16th November 2012)
[EarthLink] EarthLink toolbar. Available as: http://www.earthlink.net/software/domore.faces?tab=toolbar (retrieved on 17th November 2012)
[eBay Toolbar’s Account Guard] Using eBay toolbar’s account guard. Available as: http://pages.ebay.com.au/help/account/toolbar-account-guard.html (retrieved on 28th July 2012)
[Fraud Eliminator] Fraud Eliminator toolbar. Available as: http://www.topsecretsoftware.com/fraud-eliminator.html (retrieved on 17th November 2012)
[Geo Trust] Geo Trust Trustwatcher toolbar. Available as: http://dnstree.com/com/trustwatch/ (retrieved on 17th November 2012)
[Google Safe Browsing] Google Safe Browsing API. Available as: https://developers.google.com/safe-browsing/ (retrieved on 17th November 2012)
[Google Support] Phishing and malware detection. Available as: https://support.google.com/chrome/bin/answer.py?hl=en&answer=99020&p=cpn_safe_browsing (retrieved on 31st July 2012)
[Google Webmaster Guidelines] Best practices to help google find, crawl, and index your site. Available as: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=35769 (retrieved on 7th July 2012)
[Google Webmaster Tools] How often does Google crawl the web? Available as: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=34439 (retrieved on 7th July 2012)
[Hacker Factor Solutions, 2005] Anti-Phishing: page encoding. Available as: http://www.hackerfactor.com/papers/ap-page_encoding.pdf (retrieved on 2nd March 2012)
[IBM Internet Security Systems, 2007] The phishing guide: understanding & preventing phishing attacks. Available as: http://www-935.ibm.com/services/us/iss/pdf/phishing-guide-wp.pdf (retrieved on 2nd March 2012)
[ITNOW, 2012] Overload information, ITNOW- The Chartered Institute for IT, autumn 2012.
[MarkMonitor Inc., 2008] Whitepaper- Rock phishing: the threat and recommended countermeasures. Available as: https://www.markmonitor.com/download/wp/wp-rock-phish.pdf (retrieved on 2nd March 2012)
[MSDN IEBlog] IE8 security part III: SmartScreen filter. Available as: http://blogs.msdn.com/b/ie/archive/2008/07/02/ie8-security-part-iii-smartscreen-filter.aspx (retrieved on 22nd July 2012)
[Netcraft] Why use the Netcraft toolbar? Available as: http://toolbar.netcraft.com/ (retrieved on 23rd July 2012)
[NYDailyNews.com, July 14 2011] Pentagon hacked, 24,000 files stolen by ‘foreign intruders’ in cyber attack. Available as: http://articles.nydailynews.com/2011-07-14/news/29792364_1_cyber-attack-terrorist-group-pentagon-computer-system (retrieved on 28th July 2012)
[PhishTank] Online valid phishes. Available as: http://www.phishtank.com/phish_search.php?valid=y&active=All&Search=Search (retrieved on 9th of August 2012)
[SpoofStick] SpoofStick 1.02. Available as: https://whatapp.org/spoofstick/ (retrieved on 28th July 2012)
[SpoofGuard] SpoofGuard. Available as: http://crypto.stanford.edu/SpoofGuard/ (retrieved on 9th October 2012)
[Aburrous et al., 2010] Maher Aburrous, M.A. Hossain, Keshav Dahal, and Fadi Thabtah, Experimental case studies for investigating e-banking phishing techniques and attack strategies. Springer Science+Business Media, LLC 2010.
[Alkhozae and Batarfi, 2011] Mona Ghotaish Alkhozae and Omar Abdullah Batarfi, Phishing websites detection based on phishing characteristics in the webpage source code. IJICT, Volume 1 No.6, October 2011, ISSN-2223-4985.
[Bian et al., 2009] Kaigui Bian, Jung-Min "Jerry" Park, Michael S. Hsiao, France Belanger, and Janine Hiller, Evaluation of online resources in assisting phishing detection. In: Proc. of 2009 Ninth Annual International Symposium on Applications and the Internet, pp. 30-36.
[Cao et al., 2008] Ye Cao, Weili Han, and Yueran Le, Anti-phishing based on automated individual white list. ACM 978-1-60558-294-8/08/10.
[Chen et al., 2009] Kuan-Ta Chen, Chun-Rong Huang, Chu-Song Chen, and Jau-Yuan Chen, Fighting phishing with discriminative keypoint features. IEEE Internet Computing, 1089-7801/09.
[Choi et al., 2011] Hyunsang Choi, Bin B. Zhu, and Heejo Lee, Detecting malicious web links and identifying their attack types. In: Proc. of 2nd USENIX Conference on Web Application Development 2011.
[Chou et al., 2004] Neil Chou, Robert Ledesma, Yuka Teraguchi, and John C. Mitchell, Client-side defence against web-based identity theft. In: Proc. of 11th Annual Network and Distributed System Security Symposium, 2004.
[Cordero and Blain, 2006] Arel Cordero and Tamara Blain, Catching phish: Detecting phishing attacks from rendered website images. University of California, Berkeley, CA, 94720, 12th December, 2012. Also available as: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.9084&rep=rep1&type=pdf (retrieved on 27th July 2012).
[Dhamija et al., 2006] Rachna Dhamija, J.D.Tygar, and Marti Hearst, Why phishing works. ACM 1-59593-178-3/06/0004.
[Dhamija and Tygar, 2005] Rachna Dhamija and J.D. Tygar, The battle against phishing: Dynamic security skins. In: Proc. Symposium On Usable Privacy and Security (SOUPS) 2005, July 6-8, 2005, Pittsburgh, PA, USA.
[Dong et al., 2008] Xun Dong, John A. Clark, and Jeremy Jacob, Modeling user-interaction. IEEE 2008, 1-4244-1543-8/08.
[Downs et al., 2006] Julie S. Downs, Mandy B. Holbrook, and Lorrie Faith Cranor, Decision strategies and susceptibility to phishing. In: Proc. of Symposium On Usable Privacy and Security (SOUPS), July 12-14, 2006, Pittsburgh, PA, USA.
[Dunlop et al., 2010] Matthew Dunlop, Stephen Groat, and David Shelly, GoldPhish: using images for content-based phishing analysis. In: Proc. of Fifth International Conference on Internet Monitoring and Protection, 2010, ICIMP, pp.123-128.
[Edwards et al., 2007] W. Keith Edwards, Erika Shehan Poole, and Jennifer Stoll, Security automation considered harmful? ACM 978-1-60558-080-7/07/09.
[Egelman et al., 2008] Serge Egelman, Lorrie Faith Cranor, and Jason Hong, You’ve been warned: An empirical study of the effectiveness of web browser phishing warnings. In: Proc. of CHI 2008, April 5-10, 2008, Florence, Italy. ACM 1-59593-178-3/07/0004.
[Fette et al., 2006] Ian Fette, Norman Sadeh, and Anthony Tomasic, Learning to detect phishing emails. Carnegie Mellon University, School of Computer Science, Technical Report CMU-CyLab-06-012. Available as: http://www.cs.cmu.edu/~tomasic/doc/2007/FetteSadehTomasicWWW2007.pdf (retrieved on 2nd May 2012).
[Florêncio and Herley, 2006] Dinei Florêncio and Cormac Herley, Analysis and improvement of anti-phishing schemes. Security and Privacy in Dynamic Environments IFIP International Federation for Information Processing Volume 201, 2006, pp 148-157.
[Friedman et al., 2002] Batya Friedman, Helen Nissenbaum, David Hurley, Daniel C. Howe, and Edward Felten, Users’ conceptions of risks and harms on the web: A comparative study. ACM 1-58113-454-1/02/0004.
[Fu et al., 2006] Anthony Y. Fu, Liu Wenyin, and Xiaotie Deng, Detecting phishing web pages with visual similarity assessment based on Earth Mover’s Distance (EMD). In: IEEE Transactions on Dependable and Secure Computing, Vol. 3, No. 4, October-December 2006.
[Garera et al., 2007] Sujata Garera, Niels Provos, Monica Chew, and Aviel D. Rubin, A framework for detection and measurement of phishing attacks. ACM 978-1-59593-886-2/07/0011.
[Gastellier-Prevost et al., 2011] Sophie Gastellier-Prevost, Gustavo Gonzalez Granadillo, and Maryline Laurent, Decisive heuristics to differentiate legitimate from phishing sites. In: Proc. of Network and Information System Security (SAR-SSI), 2011 Conference. ACM 978-1-4577-0735-3.
[Herzberg and Gbara, 2004] Amir Herzberg and Ahmad Gbara, TrustBar: protecting (even naïve) web users from spoofing and phishing attacks. Bar Ilan University, Dept. of Computer Science. Available as: http://u.cs.biu.ac.il/~herzbea/Papers/ecommerce/spoofing.htm (retrieved on 23rd July 2012).
[Huh and Kim, 2011] Jun Ho Huh and Hyoungshick Kim, Phishing detection with popular search engines: Simple and effective. In: Proc. of Springer-Verlag Berlin Heidelberg 2011, FPS 2011, LNCS 6888, pp.194-207, 2011.
[Jagatic et al., 2007] Tom Jagatic, Nathaniel Johnson, Markus Jakobsson, and Filippo Menczer, Social phishing. ACM, Volume 50 Issue 10, October 2007, Pages 94-100.
[Jakobsson, 2005] Markus Jakobsson, Modeling and preventing phishing attacks. In: Proc. the 9th International Conference on Financial Cryptography and Data Security, Pages 89-89.
[Karakasiliotis et al., 2007] Athanasios Karakasiliotis, Steven Furnell, and Maria Papadaki, An assessment of end-user vulnerability of phishing attacks. Journal of Information Warfare, 6 (1), 2007, pp. 17-28.
[Kittur et al., 2008] Aniket Kittur, Ed H. Chi, and Bongwon Suh, Crowdsourcing user studies with Mechanical Turk. In: Proc. CHI 2008, April 5–10, 2008, Florence, Italy. ACM 978-1-60558-011-1/08/04
[Kumaraguru et al., 2009] Ponnurangam Kumaraguru, Justin Cranshaw, Alessandro Acquisti, Lorrie Cranor, Jason Hong, Mary Ann Blair, and Theodore Pham, School of phish: A real-world evaluation of anti-phishing training. In: Proc. of 5th Symposium on Usable Privacy and Security (SOUPS ’09).
[Lam et al., 2009] Ieng-Fat Lam, Wei-Cheng Xiao, Szu-Chi Wang, and Kuan-Ta Chen, Counteracting phishing page polymorphism: An image layout analysis approach. In: Proc. of ISA 2009.
[Li et al., 2007] Linfeng Li, Marko Helenius, and Eleni Berki, Phishing-resistant systems: security handling with misuse cases design. In: Proc. of SQM07, 389-404, 2007.
[Li and Helenius, 2007] Linfeng Li and Marko Helenius, Usability evaluation of anti-phishing toolbars. Journal in Computer Virology, volume 3, 163-184, DOI 10.1007/s11416-007-0050-4.
[Liu et al., 2006] Wenyin Liu, Xiaotie Deng, Guanglin Huang, and Anthony Y. Fu, An anti-phishing strategy based on visual similarity assessment. In: Proc. of IEEE Internet Computing, ACM 1089-7891/06.
[Liu et al., 2011] Gang Liu, Guang Xiang, Bryan A. Pendleton, Jason I. Hong, and Wenyin Liu, Smartening the crowds: computational techniques for improving human verification to fight phishing scams. In: Proc. Symposium On Usable Privacy and Security (SOUPS) 2011, July 20-22, 2011, Pittsburgh, PA, USA.
[Ludl et al., 2007] Christian Ludl, Sean Mcallister, Engin Kirda, and Christopher Kruegel, On the effectiveness of techniques to detect phishing sites. In: Proc. of DIMVA’07 Proceedings of the 4th International Conference on Detection of Intrusions and Malware, and Vulnerability. Springer-Verlag Berlin, Heidelberg 2007, ISBN: 978-3-540-73613-4 doi.
[Ma, 2006] Robert Ma, Phishing attack detection by using a reputable search engine. University of Toronto, Dept. of Electrical and Computer Engineering. Available as: http://www.eecg.toronto.edu/~lie/Courses/ECE1776-2006/Projects/Phishing2a-proposal.pdf (retrieved on 7th July 2012).
[Ma et al., 2009] Justin Ma, Lawrence K. Saul, Stefan Savage, and Geoffrey M. Voelker, Beyond blacklists: Learning to detect malicious web sites from suspicious urls. In: Proc. of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1245-1254, June-July 2009.
[Martino and Perramon, 2010] Antonio San Martino and Xavier Perramon, Phishing secrets: history, effects, and countermeasures. International Journal of Network Security, Vol.11, No.3, PP.163-171, November 2010.
[McGrath and Gupta, 2008] D. Kevin McGrath and Minaxi Gupta, Behind phishing: An examination of phisher modi operandi. In: Proc. of 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats. San Francisco, California, USA: USENIX Association Berkeley, CA, USA, 2008, p. Article No.4.
[McRae and Vaughan, 2007] Craig M. McRae and Rayford B. Vaughn, Phighting the phisher: Using web bugs and honeytokens to investigate the source of phishing attacks. In: Proc. of 40th Annual Hawaii International Conference on System Sciences (HICSS ‘07) 0-7695-2755-8/07.
[Medvet et al., 2008] Eric Medvet, Engin Kirda, and Christopher Kruegel, Visual similarity-based phishing detection. ACM ISBN #978-1-60558-241-2.
[Milletary, 2006] Jason Milletary, Technical trends in phishing attacks. United States Computer Emergency Readiness Team (US-CERT), 2006. Available as http://www.us-cert.gov/reading_room/phishing_trends0511.pdf (retrieved on 2nd May 2012).
[Moore and Clayton, 2008] Tyler Moore and Richard Clayton, Evaluating the wisdom of crowds in assessing phishing websites. In: Proc. of Financial Cryptography and Data Security (FC) 2008, LNCS 5143, pp. 16-30.
[Pan and Ding, 2006] Ying Pan and Xuhua Ding, Anomaly based web phishing page detection. In: Proc. of 22nd Annual Computer Security Applications Conference (ACSAC’06), Computer Society, 2006.
[Prakash et al., 2010] Pawan Prakash, Manish Kumar, Rao Kompella, and Minaxi Gupta, PhishNet: Predictive blacklisting to detect phishing attacks. In: Proc. of IEEE INFOCOM on Computer Communication 2010.
[Odaro and Sanders, 2011] Ugiomo S. Odaro and Benjamin G. Sanders, Social engineering: phishing for a solution. In: Proc. of IT Security for the Next Generation-European Cup 2011, Kaspersky Lab.
[Rasmussen and Aaron, 2011] Rod Rasmussen and Greg Aaron, Global phishing survey: trends and domain name use in 1H2011. APWG Report January-June 2011. Available as: http://www.antiphishing.org/reports/APWG_GlobalPhishingSurvey_1H2011.pdf (retrieved on 3rd May 2012).
[Sabanal and Yason, 2012] Paul Sabanal and Mark Vincent Yason, Digging deep into the flash sandboxes. IBM Security Systems. Available as: http://media.blackhat.com/bh-us-12/Briefings/Sabanal/BH_US_12_Sabanal_Digging_Deep_WP.pdf (retrieved on 17th November 2012).
[Sheng et al., 2007] Steve Sheng, Bryant Magnien, Ponnurangam Kumaraguru, Alessandro Acquisti, Lorrie Faith Cranor, Jason Hong, and Elizabeth Nunge, Anti-Phishing Phil: The design and evaluation of a game that teaches people not to fall for phish. In: Proc. of Symposium on Usable Privacy and Security (SOUPS) 2007, July 18-20, 2007, Pittsburgh, PA, USA.
[Singh, 2007] N.P. Singh, Online frauds in banks with phishing. Journal of Internet Banking and Commerce, August 2007, vol.12, no.2.
[Wang et al., 2011] Ge Wang, He Liu, Sebastian Becerra, Kai Wang, Serge Belongie, Hovav Shacham, and Stefan Savage, Verilogo: Proactive phishing detection via logo recognition. University of California, San Diego, Dept. of Computer Science and Engineering. Technical Report CS211-0969, UC San Diego, August 2011. Available as: http://cseweb.ucsd.edu/~hovav/dist/verilogo.pdf (checked on August 2nd, 2012).
[Wenyin et al., 2005] Liu Wenyin, Guanglin Huang, Lui Xiaoyue, Zhang Min, and Xiaotie Deng, Detection of phishing webpages based on visual similarity. ACM 1-59593-051-5/05/0005.
[Whittaker et al., 2010] Colin Whittaker, Brian Ryner, and Marria Nazif, Large-scale automatic classification of phishing pages. Google Inc., Research at Google: Research Areas & Publications. Available as: http://research.google.com/pubs/pub35580.html. (retrieved on 26th July, 2012).
[Wu et al., 2006a] Min Wu, Robert C. Miller, Greg Little, Web Wallet: Preventing phishing attacks by revealing user intentions. In: Proc. of The Second Symposium on Usable Privacy and Security (SOUPS 2006). pp. 102-113 2006.
[Wu et al., 2006b] Min Wu, Robert C. Miller, and Simson L. Garfinkel, Do security toolbars actually prevent phishing attacks? ACM 1-59593-178-3/06/0004.
[Xiang and Hong, 2009] Guang Xiang and Jason I. Hong, A hybrid phish detection approach by identity discovery and keywords retrieval. ACM 978-1-60558-487-4/09/04.
[Xiang et al., 2011] Guang Xiang, Jason Hong, Carolyn P. Rose, and Lorrie Cranor, CANTINA+: A feature-rich machine learning framework for detecting phishing websites. ACM Transactions on Information and System Security (TISSEC) Volume 14 Issue 2, September 2011, Article No. 21.
[Zhang et al., 2007a] Yue Zhang, Jason Hong and Lorrie Cranor, CANTINA: A Content-Based Approach to Detecting Phishing Web Sites. ACM 978-1-59593-654-7/07/0005.
[Zhang et al., 2007b] Yue Zhang, Serge Egelman, Lorrie Cranor, and Jason Hong, Phinding phish: Evaluating anti-phishing tools. In: Proc. of the 14th Annual Network and Distributed System Security Symposium (NDSS 2007).
Appendix Important terminology and definitions
The Anti-Phishing Working Group (APWG)
An international consortium formed to fight against phishing and on-line fraud.
Active warning
Warning that forces Internet users to notice it by interrupting their activity.
Code obfuscation
An act of converting code into a form that is difficult to understand, mainly
performed to protect the code from reverse engineering.
Crimeware
Software designed for conducting cybercrime.
Cross-site request forgery
A malicious exploitation of a website in which the legitimate user is forced to
execute unauthorized commands.
Cross-site scripting
An attack in which malicious code is injected into the client side of a legitimate
webpage.
DNS spoofing
An attack that causes a DNS server to return wrong IP addresses, diverting
traffic to another computer.
Domain name typos
An act of generating a list of misspelled and mistyped variants of an entered domain name.
Denial of Service (DoS)
An attack on a network by flooding it with useless traffic.
DOM (Document Object Model) objects
Document Object Model is a platform- and language-neutral interface that will
allow programs and scripts to dynamically access and update the content,
structure and style of documents. [W3C]
DMOZ
A web directory.
False negative
A phishing website is misclassified as a legitimate website.
False positive
A legitimate website gets misclassified as a phishing website.
Heuristic methods
A technique in which various characteristics of a website are checked to determine whether it is a phishing website or not.
Malware
Malicious software used to disrupt computer operation and also used to conduct phishing.
Malicious code
Any code or script in a software system that is intended to cause an undesired effect, security breach, or damage to the system. [Wikipedia]
Man in the middle attacks
An intrusion into an existing connection to intercept the exchanged data and inject false information.
MarkMonitor
A company that develops Internet brand protection software and services.
Mirroring of website
Act of creating an exact copy of another website.
Passive warning
Warning that just displays the message without interrupting Internet user activity.
Password harvester
Malicious software that looks for username and password information on the victim’s computer.
Pharming
An attack intended to redirect a website’s traffic to a bogus website.
PhishTank
An anti-phishing website.
Sandbox
A security mechanism for programs from untrusted sources.
Session hijacking
An exploitation of a computer session in order to get unauthorised access to information or services in a computer. [Wikipedia]
Secure Socket Layer
A cryptographic protocol used for secure communication over the Internet.
Spam
Unsolicited bulk messages, usually used for advertisement.
Trojan horse
A kind of malware.
List of URLs for the valid phishing websites used for the experiment (Source: PhishTank)
| S.N | URLs | Brands |
| --- | --- | --- |
| 1. | http://agenciasck.goldenbiz.com.br/ | SCK Imperial |
| 2. | http://credit10.webobo.biz/download.php?id_menu=3441921/ | Haboo |
| 3. | http://deutchland-konto.ntdll.net/img/glyph/webscr.php?cmd=_login-run&dispatch=5885d80a13c0db1f1ff80d546411d7f84f1036d8f209d3d19ebb6f4eeec8bd0eaf4a55ab8d6b037be0813c1fa7ae828caf4a55ab8d6b037be0813c1fa7ae828c | Paypal |
| 4. | http://lehoapaper.com/Paypal_Virefication/1596578fae650778e27f8ffbd70c4502/ | Paypal |
| 5. | http://masterstudio.es/wp-includes/js/crop/ | Paypal |
| 6. | http://ilhanpolat.com/account/id/78550375/paypal/pp/update/webscr/6998GSQ64976W84f356Gi6Bn432/profile/webscr/pp/us/www.Paypal.com/webscr.php?cmd=_login-run&dispatch=5885d80a13c0db1f1ff80d546411d7f8a8350c132bc41e0934cfc023d4e8f9e5fb78214886cead8bcd4c1677f8e7572cfb78214886cead8bcd4c1677f8e7572c | Paypal |
| 7. | http://188.138.124.133/www.paypal.com/session_id/87544455623222414898896521454598/index.htm# | Paypal |
| 8. | http://pornographicrecordings.com/img/icons/tabs/webscr.php?cmd=_login-run&dispatch=5885d80a13c0db1f1ff80d546411d7f84f1036d8f209d3d19ebb6f4eeec8bd0eb8fde1c0e2ec85dcf4341e5b995664adb8fde1c0e2ec85dcf4341e5b995664ad | Paypal |
| 9. | http://sreeramsolutions.com/ayyalu/images/login.php | CAPITEC Bank |
| 10. | http://sreeramsolutions.com/ayyalu/images/capitec.htm | CAPITEC Bank |
| 11. | http://prophor.com.ar/prophor/wells/alerts.php | WELLS FARGO |
| | http://specialneedssvg.org/wp/wp-admin/import/wellsfargo/wellsfargo/wellsfargo2011/index.php | |
| 12. | http://rrnow.findhere.org/ | Time Warner Cable |
| 13. | http://paypal.com.login.secure.md5.id.0645654032132165461321.fabianpulido.com/b22668f2a2c3063efb7749ac67fef65a/ | Paypal |
| 14. | http://net77-43-56-76.mclink.it/.ss/ | Sparkasse |
| | http://78.188.234.21/.ss3/?https://bankingportal.kreissparkasse-heinsberg.de/portal/portal/StartenIPSTANDARD | |
| 15. | http://godknwswhy.x90x.net/ | Yahoo! Mail |
| 16. | http://zulumarket.com/negocio/index.html | CHASE |
| 17. | http://abnerindonesia.com/billingcenter/aol/XKklowI9292O02/DBMECX8QgQ1BHaQQv4pYZFzemQbF/verify/Accounts/Secure_Area/aol/update.php | AOL Mail |
| 18. | http://abnerindonesia.com/billingcenter/aol/XKklowI9292O02/DBMECX8QgQ1BHaQQv4pYZFzemQbF/verify/Accounts/Secure_Area/aol/ | AOL Mail |
| 19. | http://alex.24openstore.de/PayPal/webscr.php?cmd=_login-run&dispatch=5885d80a13c0db1f1ff80d546411d7f8a8350c132bc41e0934cfc023d4e8f9e5eb7cfbb17ec87b191acc343bb447f8e9eb7cfbb17ec87b191acc343bb447f8e9 | Paypal |
| 20. | http://us.battlle.net.htm.isnyeo.info/battle_net_account.html?ref=https%3A%2F%2Fus.battle.net%2Faccount%2Fmanagement%2Findex.xml&app=bam&t=1 | BATTLENET |