
    WHOIS Misuse Study

    Draft report for public comment

    26 NOVEMBER 2013

    Nektarios Leontiadis

    Nicolas Christin

    Carnegie Mellon University

    Table of Contents

    1. Introduction
    2. Background and overview of the study
        2.1. Descriptive study
        2.2. Experimental study
    3. Study Samples
        3.1. Selecting a survey panel
        3.2. Creating a microcosm sample of the world’s registered gTLD domain names
            A proportional probability microcosm
            Registrant sample
            Registrar/Registry sample
    4. Law Enforcement & Researchers survey
        4.1. Survey methodology and design details
        4.2. Analysis of responses
            Demographics
            Level of expertise
            Attack experiences
            Specific WHOIS misuse incidents
        4.3. Discussion
    5. WHOIS misuse reported by Registrants
        5.1. Survey methodology and design details
            Methodology
            Survey translations
            Types of questions
        5.2. Response and error rates
        5.3. Analysis of responses
            Characteristics of the participants
            Reported WHOIS misuse
            Adverse effects
            Countermeasures
        5.4. Discussion
    6. Assessing Registrar/Registry anti-harvesting
        6.1. Survey methodology and design
        6.2. Analysis of responses
            Demographics
            Employed anti-harvesting techniques
            Incidents of WHOIS misuse
            Incidents of WHOIS harvesting and their effect in deploying new countermeasures
        6.3. Testing of WHOIS query rate limiting techniques
        6.4. Discussion
    7. Experimental Study
        7.1. Registrars
        7.2. Domain names
        7.3. Registrants associated with domains
            Names of Registrants
            Email addresses
            Physical addresses
            Phone numbers
        7.4. Registering domains
        7.5. Duration of the experiment
        7.6. Breakdown of the collected instances of misuse
            Postal address misuse
            Email address misuse
            Attempted malware delivery
            Phone number misuse
            Other types of misuse
        7.7. Overall experiment incidents of WHOIS misuse
        7.8. Discussion
    8. Comparative result analysis
        8.1. Correlation between measured and reported incidence of misuse
        8.2. Domain characteristics affecting email address misuse
        8.3. Domain characteristics affecting phone number misuse
        8.1. Domain characteristics affecting postal address misuse
    9. Discussion
    10. Appendix A – Law Enforcement/Researcher survey
        10.1. Invitation to participate
        10.2. Consent form
        10.3. Survey questions
    11. Appendix B – Registrant survey
        11.1. Invitation to participate
        11.2. Consent
        11.3. Survey questions
        11.4. Terms
    12. Appendix C – Registrar and Registry Survey
        12.1. Invitation to Participate
        12.2. Consent form
        12.3. Survey questions
    13. Bibliography

    Executive summary

    Does public access to WHOIS-published data lead to a measurable degree of misuse [1]? This

    study, sponsored by the Internet Corporation for Assigned Names and Numbers (ICANN) and

    initiated by ICANN’s Generic Names Supporting Organization (GNSO, 2010), attempts to

    answer this question, with a focus on the five most populous generic Top Level Domains

    (gTLDs). To do so, we first surveyed experts, law enforcement agents, Registrants, Registrars,

    and Registries, and collected their input on the prevalence of WHOIS misuse, thereby obtaining

    a descriptive data set. We then complemented this descriptive portion of the study with a set of

    experimental measurements of WHOIS misuse, which we obtained by registering 400 domains

    in the top five gTLDs across 16 Registrars, associating unique, synthetic WHOIS contact

    information with these domains, and monitoring incidents of misuse for a period of 6 months.

    The main finding of the descriptive study is that there is a statistically significant occurrence of

    WHOIS misuse affecting Registrants’ email addresses, postal addresses, and phone numbers,

    published in WHOIS when registering domains in these gTLDs. Overall, we find that 44% of

    Registrants experience one or more of these types of WHOIS misuse. Other types of WHOIS misuse are reported, but at a smaller, non-significant rate. Among those, a handful of reported

    cases appear to be highly elaborate attempts to achieve high attack impact.

    As a caveat, most findings of the descriptive study are affected by low response rates from the

    parties we surveyed. Most importantly, we are unable to draw meaningful conclusions about the

    geographical aspects of WHOIS misuse. Indeed, the great majority of survey responses

    originated from the US, even though we used a much more geographically diverse Registrant

    population sample, and tried to survey Registrants in their native language.

    The experimental study corroborates the findings of the descriptive study. In particular, it offers

    quantitative insights regarding both the extent of WHOIS misuse, and the parameters affecting

    WHOIS misuse. A limitation of the experimental study is that the impact of geographical location
    on postal address misuse could not be measured, due to the prohibitive cost of
    setting up postal boxes in countries without having an actual residence there.

    [1] In this study, WHOIS misuse refers to harmful acts that exploit contact information obtained from WHOIS. Harmful acts may include generation of spam, abuse of personal data, intellectual property theft, loss of reputation or identity theft, loss of data, phishing and other cybercrime related exploits, harassment, stalking, or other activity with negative personal or economic consequences.

    Among the measurable factors analyzed by this experiment, we identify the gTLD as the sole

    statistically-significant characteristic that affects the occurrence of the associated misuse of

    phone numbers published in WHOIS. For example, the rates of WHOIS phone number misuse

    are negatively correlated with .ORG domains (less misuse), but positively with .BIZ and .INFO

    (more misuse).

    Similarly, we find that the domain price is negatively correlated with the possibility of misuse of

    email addresses published in WHOIS (i.e., experimental domains purchased at greater cost had

    less email address misuse). We also discover that .COM, .NET, and .ORG domains are

    associated with less email address misuse, while .BIZ domains are associated with more

    misuse.

    We also studied whether the composition of domain names themselves impacts the probability

    of WHOIS misuse. We find that experimental domain names representing natural person names

    appear to foster less email misuse, while for other experimental domain name categories (e.g.,

    professional, randomly-generated, etc.), WHOIS misuse probability seems independent of the

    domain name composition.

    We find that the use of WHOIS anti-harvesting techniques, applied at both the Registry and Registrar levels,
    is statistically significant in reducing the possibility of WHOIS email address misuse. Overall, we
    find that experimental WHOIS data registered with Registries/Registrars with no observable
    anti-harvesting countermeasures was twice as likely to result in unwanted emails compared
    to cases where a countermeasure was deployed. We do not offer, however, a comparative

    analysis of the effectiveness of specific anti-harvesting techniques against WHOIS misuse, as

    any differences we could observe were not statistically significant.

    Finally, we do not find other statistically significant correlations between specific Registrars used

    to register experimental domains and measured rates of WHOIS misuse.

    1. Introduction

    WHOIS is an essential information service that primarily allows anyone to map domain names

    to Registrants and their contact information. There is increasing anecdotal evidence of misuse

    of the data made publicly available through the WHOIS service. For instance, some Registrants [2]
    have reported that their publicly available WHOIS data was used by a third party to register a

    domain name similar to the Registrant’s, while listing contact information identical to that

    provided by the Registrant. The domain name registered with the fraudulently acquired

    Registrant information was subsequently used to impersonate the owner of the original,

    legitimate domain, for nefarious purposes. Other studies have concluded that WHOIS data

    could be used for phishing attempts (SAC028, 2008), or even for sending spam email (SAC023,

    2007).

    The purpose of this WHOIS Misuse study is to provide a quantitative and qualitative

    assessment of the types of WHOIS data misuse experienced by gTLD domain name

    Registrants, the magnitude of these misuse cases and characteristics such as anti-harvesting

    measures that may impact misuse.

    The study offers the following contributions:

    - We test and validate the hypothesis that public access to WHOIS data leads to a
      measurable degree of misuse of certain kinds of gTLD domain name Registrant identity
      and contact information, via a combination of a descriptive study (surveys), and of an
      experimental study.
    - We examine gTLD domain names, associated Registry and Registrar anti-harvesting
      characteristics, and their effect on WHOIS misuse.
    - We describe the major types of misuse stemming from public WHOIS access to
      Registrant identity and contact data.
    - We assess the effectiveness of anti-harvesting defenses against WHOIS misuse.
    - We design and describe a large-scale experiment to empirically measure the type and
      extent of misuse of WHOIS information. This empirical work provides a framework for
      the design of similar future studies.

    [2] See http://www.eweek.com/c/a/Security/Whois-Abuse-Still-Out-of-Control/.

    The rest of this report is organized as follows. Section 2 provides the background of the study

    and its objectives, and section 3 characterizes the population samples we utilized for the

    different components of this study. The following sections (4, 5, and 6) discuss the descriptive

    part of the study; each section separately describes each of three surveys we conducted with

    law enforcement and researchers, with Registrants, and with Registrars and Registries,

    respectively. Section 7 discusses our experimental study, and includes a detailed presentation

    of the experimental design and parameters. Section 8 provides an empirical analysis of data

    collected from both the descriptive and the empirical part of the study. Section 9 concludes with

    an overall discussion of the study outcomes.

    2. Background and overview of the study

    Based on their operational agreement with ICANN (ICANN, 2013), all gTLD Registrars are

    required to collect Registrant identification and contact information that is subsequently

    published in each Registrar’s WHOIS directory. While the original purpose of WHOIS was to

    provide the necessary information to get in contact with a Registrant for legitimate purposes

    (e.g., abuse notifications or other operational reasons), uncontrolled public access to WHOIS

    also allows the collection of the same information for nefarious purposes such as unsolicited

    email or phone calls (i.e., spam). The Generic Names Supporting Organization (GNSO), which

    is responsible for the development of gTLD domain name policies, including those pertaining to

    WHOIS data, identified in a Task Force Report on WHOIS Services (GNSO, 2007) the

    possibility of misuse of WHOIS data for phishing and identity theft, among others.

    A later study by the ICANN Security and Stability Advisory Committee (SAC023, 2007) looked

    into the potential of misuse of email information posted exclusively in WHOIS. During a three-

    month measurement study, they registered an arbitrary number of randomly chosen domain

    names, with and without the use of privacy and proxy services, and monitored the mailboxes for

    spam email. The study found evidence that the public availability of WHOIS data contributes to

    the frequency of spam email; and that protective services applied either to all WHOIS data (e.g.

    rate limiting) or to WHOIS data associated with a single domain name (e.g. privacy and proxy

    services), can deter WHOIS misuse.

    This WHOIS misuse study builds on this previous work by providing updated results, and a

    more comprehensive set of experiments. This study heavily draws upon the Terms of Reference

    for WHOIS Misuse Studies (ICANN, 2009). This work was designed and conducted in response

    to the GNSO’s decision to pursue WHOIS studies (GNSO, 2010); the goal of this study is to

    provide empirical data to help ICANN determine if there is substantial WHOIS misuse which

    warrants further action. Therefore, this study is designed to address the following
    objectives:

    - Validate or invalidate the hypothesis that public access to gTLD WHOIS data leads to a measurable degree of misuse.
    - If the hypothesis is validated, identify major types of misuses stemming from public access to gTLD WHOIS data.
    - Determine which anti-harvesting measures appear to be most effective against gTLD WHOIS misuse.


    We adopted a two-pronged approach – that is, we conducted both a descriptive study and a

    complementary, experimental study. The descriptive study aims at collecting past instances of

    misuse cases, through interviews and surveys of potential victims and Registrars/Registries. We

    also surveyed law enforcement and cybercrime researchers and agencies that deal with

    incidents of misuse, to better determine the nature and overall magnitude of WHOIS misuse.

    We complemented the descriptive study with an experimental study. The goal was to acquire
    controlled data on misuse events by setting up a representative environment attractive to those
    who could be tempted to misuse WHOIS, and to measure the impact of anti-harvesting measures
    that could affect the degree of misuse observed.

    2.1. Descriptive study

    Pursuant to the Terms of Reference, the descriptive study consists of a set of four surveys: a)

    Registrant survey, b) Registrar/Registry survey, c) Cybercrime Researchers survey, and d)

    Consumer Protection, Regulatory and Law Enforcement organizations survey. Because they

    relied on identical questionnaires, we will subsequently consider surveys c) and d) as a single

    survey.

    The goals of each of the surveys are as follows.

    A) Registrant survey. Gathered a representative sample of domain names registered in the
       top five gTLDs, and surveyed experiences of specific harmful acts attributed to WHOIS
       misuse.

    B) Registrar and Registry surveys. Surveyed Registries and Registrars associated with the
       registration of the domain name sample from survey (A), to identify WHOIS
       anti-harvesting mechanisms employed, and collect aggregate information about known
       WHOIS harvesting attacks.

    C/D) Cybercrime researchers and law enforcement surveys. These surveys intend to further
       broaden the study’s perspective of WHOIS misuse by contacting a representative set of
       researchers and consumer protection, regulatory, and law enforcement organizations, to
       gather examples and statistics on harmful acts in general, and more specifically those
       attributed to WHOIS misuse.

    Our goal for survey A was to obtain a representative sample by randomly selecting domain
    names from the top five gTLDs while maintaining the population proportions, and to generate study
    results with a 95% confidence interval. Owing to the much smaller populations involved, surveys
    B and C/D, on the other hand, are intended to provide qualitative insights rather than
    quantitative measurements.

    2.2. Experimental study

    The second facet of this work is an experimental study, which attempts to complement the

    observations gained from the descriptive study by gathering a controlled set of network

    measurements. The platform that is used for the measurements is a set of domain names,

    registered as part of the study across the top five gTLDs through a representative sample of

    Registrars, and associated with artificial Registrant identities. The goal is to measure the extent

    of illegal or harmful Internet activity experienced by domain name Registrants that can be

    exclusively attributed to WHOIS misuse, given that the experimental design eliminates any

    extraneous variables that may correlate (positively or negatively) with the observed misuse.

    In the surveys collected from the descriptive study, it is hard to completely eliminate external

    plausible causes for illegal or harmful Internet activity to draw conclusions on WHOIS misuse

    with certainty. For example, a Registrant might experience misuse of the personal phone
    number used to register a domain name. However, if that same number is also
    listed in the Registrant’s Facebook profile, and that profile has poor privacy controls,
    then the misuse cannot be attributed to WHOIS with certainty. On the other hand, in the

    experimental study, Registrant identities (a term defined in Section 7.3) are artificially

    constructed and solely used for the purpose of this experiment.

    The experimental study lasted six months, during which we collected emails, voicemails, and

    postal mail received by the Registrants associated with the experimental domains. We

    registered 400 domains with a geographically diverse set of 16 Registrars, distributed

    proportionally across the top 5 gTLDs, with domain names that are classified in four categories

    of interest plus one control category. Our analysis provides insights into the different degrees of

    correlation between WHOIS misuse and gTLDs, types of misuse, types of domains, cost of

    domains, and anti-harvesting techniques deployed. However, the experimental design did not
    allow us to gain major insights on how regions and countries are affected by WHOIS misuse; in
    particular, we were not able to set up postal boxes outside the United States, because mail
    regulations in most countries require proof of residency, and “virtual office” solutions were
    prohibitively expensive at the scale at which we needed to run the experiment.

    3. Study Samples

    In this section we discuss how we created domain name samples and selected invitees for the

    different parts of the study. We first describe how we chose the invitees for the researcher and

    law enforcement survey, before presenting the sampling process of the domain names and

    resulting invitees of the Registrar/Registry and Registrant surveys.

    3.1. Selecting a survey panel

    As part of the Law Enforcement and Researchers survey, we assembled a geographically

    diverse group of experts in the fields of security and privacy affiliated with research institutes,

    academia, law enforcement agencies, Internet Service Providers (ISPs), and national data

    protection commissions. The goal was to survey experts to whom WHOIS misuse incidents are

    reported, to ultimately obtain a qualitative global overview of WHOIS misuse, rather than a mere

    collection of individual misuse incidents.

    Geographical region    Type of expertise
    North America          Agencies to which security incidents are reported
    South America          Large commercial vendor research labs
    Europe                 Large Internet service providers
    Africa                 Academic cybercrime research organizations
    Asia / Pacific         Law enforcement agencies
                           Commercial cybercrime investigators
                           National Data Protection Commissioners

    Table 1 Recruiting requirements in terms of geographical region and type of expertise

    Our approach for recruiting participants was to build upon contacts established at Carnegie

    Mellon University (CMU) with additional input from ICANN to fill coverage gaps. Once this

    invitee list was completed, we identified remaining gaps and omissions in terms of the type of

    expertise we were looking for and geographic coverage, and we successfully managed to

    amend these deficiencies by researching online for additional invitees that would match our

    requirements. Table 1 lists the coverage goals for this survey’s participants.

    Toward the end of the time interval over which the survey was initially conducted, and despite

    the high response rate (email-based invitation, 25% response rate, corresponding to 29

    responses out of 114 invitations at the time), an initial analysis of the responses informed us


    that we had collected a small number of individual misuse incidents and that we were lacking

    coverage for South America. We therefore extended the duration of the survey and invited a

    broader population of law enforcement experts attending the Costa Rica ICANN meeting to

    participate. The required level of expertise of the additional participants was verified by survey

    questions specifically structured for that purpose. Ultimately, the survey was run between

    September 2011 and April 2012, with answers provided by every eligible [3] participant who

    completed the study being included in survey results.

    3.2. Creating a microcosm sample of the world’s registered gTLD domain names

    Domain name registrations in the top 5 generic Top Level Domains (gTLDs) in the summer of

    2011 exceeded 127 million (Table 2). As we aspired to draw conclusions on characteristics of

    the gTLD population as a whole, we decided to take a representative sample of those domains

    – a microcosm – and employ statistical inference techniques on that microcosm. A similar

    technique was employed by the NORC study of the Accuracy of WHOIS Registrant Contact Information (NORC, 2010), with the exception that this WHOIS misuse study did not attempt to geographically stratify the sample. The microcosm was selected randomly in an unbiased

    proportional way from the population of 127 million.

    gTLD     Domains        Proportion
    COM       95,185,529    74.54%
    NET       14,078,829    11.03%
    ORG        9,021,350     7.06%
    INFO       7,486,088     5.86%
    BIZ        2,127,857     1.67%
    TOTAL    127,694,306    100%

    Table 2 Number of domain registrations in the top 5 gTLDs in August 2011

    We selected such a microcosm to investigate WHOIS misuse from a number of perspectives. At

    the most basic level, we surveyed Registrants to learn about their experience of misuse of

    personal or corporate information listed in WHOIS. We then surveyed the top 5 gTLD Registries
    and the Registrars associated with the sampled domains to understand how they are protecting
    the Registrants’ information from WHOIS misuse. Finally, using a subset of the aforementioned
    Registrars, we registered 400 test domains using artificial Registrant information, and we
    monitored instances of WHOIS misuse experienced by those domains for six months. This
    experiment enabled us to correlate domain name and directly-associated or observable
    Registrar/Registry characteristics with WHOIS misuse (e.g., gTLD, cost, anti-harvesting).

    [3] Eligibility depended on the participant being at least 18 years old, and on their explicit consent to participate. These criteria are defined by CMU’s Institutional Review Board (IRB).

    A proportional probability microcosm

    In November of 2011 we received from ICANN, at our request, a sample of 6,000 domains,
    selected randomly from gTLD zone files with equal probability of selection [4]. Of those 6,000

    domains, 83 were not within the top 5 gTLDs to be studied and so were discarded. Additionally,

    we were provided with the WHOIS records associated with 5,921 of the domains, obtained over

    a period of 18 hours on the day following domain sample generation. We used a WHOIS record

    parser internally developed at CMU to convert the loosely formatted WHOIS records into

    structured information that allowed further automated processing.
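    To illustrate what converting a loosely formatted WHOIS record into structured information can involve, the following Python sketch collects "Key: Value" lines into a dictionary. It is not the parser developed at CMU, whose design is not described in this report; the sample record, the comment prefixes, and the function name parse_whois are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical record for illustration only; real WHOIS output varies by Registrar.
SAMPLE_RECORD = """\
Domain Name: EXAMPLE.ORG
Registrar: Example Registrar, Inc.
Registrant Name: Jane Doe
Registrant Email: jane@example.org
Name Server: NS1.EXAMPLE.ORG
Name Server: NS2.EXAMPLE.ORG
% Terms of use: this line is a comment and is skipped
"""

# A field is everything before the first colon; the value is the rest of the line.
FIELD = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")

# Assumed comment/notice prefixes; not taken from the report.
COMMENT_PREFIXES = ("%", "#", ">>>")


def parse_whois(text):
    """Collect 'Key: Value' lines into a dict of lists keyed by lowercased field name."""
    fields = defaultdict(list)
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(COMMENT_PREFIXES):
            continue
        match = FIELD.match(line)
        if match:
            key, value = match.groups()
            # Repeated fields (e.g. several name servers) are kept in order.
            fields[key.lower()].append(value)
    return dict(fields)


record = parse_whois(SAMPLE_RECORD)
print(record["registrant email"])
print(record["name server"])
```

    Storing each field as a list keeps repeated entries such as name servers. Because WHOIS output is not uniformly formatted across Registrars, a parser used at scale would likely need additional per-Registrar rules beyond this single pattern.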

    With this set of structured WHOIS information, we created a proportional probability microcosm

    of the 127 million domains, using the proportions in Table 2. In deciding the size of the

    microcosm we used as a baseline the size of the microcosm in (NORC, 2010). In 2009 NORC

    assembled a proportional probability sample of 2,400 domains. Taking into account the growth

    in the population of domain names under the 5 gTLDs from 2009 to 2011, we created a

    proportional probability microcosm of 2,905 domain names, which we used to draw a sample of

    domain names for data collection.
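    A proportional allocation of the 2,905-domain microcosm across the five gTLDs could be sketched as follows. The report does not state how fractional counts were rounded, so the largest-remainder rounding used here is an assumption, and `proportional_allocation` is a hypothetical helper. Shares are recomputed from the domain counts listed in Table 2, so they may differ slightly from the rounded percentages printed there.

```python
import math

# Domain counts per gTLD as listed in Table 2 (August 2011).
GTLD_COUNTS = {
    "COM": 95_185_529,
    "NET": 14_078_829,
    "ORG": 9_021_350,
    "INFO": 7_486_088,
    "BIZ": 2_127_857,
}


def proportional_allocation(counts, sample_size):
    """Split sample_size across strata in proportion to counts.

    Uses largest-remainder rounding (an assumption, not necessarily the
    authors' procedure) so the allocations sum exactly to sample_size.
    """
    total = sum(counts.values())
    exact = {k: sample_size * v / total for k, v in counts.items()}
    alloc = {k: math.floor(x) for k, x in exact.items()}
    leftover = sample_size - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractions.
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


microcosm = proportional_allocation(GTLD_COUNTS, 2905)
print(microcosm)
```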

    Registrant sample

    For the purpose of surveying domain Registrants, we needed a representative sample of the

    microcosm of domain names, to identify their Registrants and invite them to participate. Our

    sample design parameters are listed in Table 3. As an equal probability sample, every domain

    in the microcosm has an equal probability of being selected. As with similar studies, we adopted

    a confidence interval (CI) of 95% and margin of error (ME) of 5%. With the microcosm of 2,905

    domains we estimated that a sample size of 340 Registrants5 would provide the necessary

    4 At one point, we considered duration of registration as a sample parameter, but eventually decided not to use it, due to the relative difficulty to properly assess this parameter.

    5 $Sample \geq \frac{n}{1 + \frac{n - 1}{N}}$, where $N = 2905$, $n = \frac{Z^2 \times SD^2}{ME^2}$, $SD = \sqrt{p(1 - p)}$, and $p = 0.5$.


    insights for the given CI and ME. Additionally, given that survey participants would be invited

    by email, we projected a 15%-25% response rate. We consequently drew a

    sample of 1,619 domains from the microcosm, which, with a 21% response rate, would yield the

    desired 340 Registrants. This sample did not explicitly exclude or include Proxy-registered

    domain names.

    Method of selection       Simple Random Sampling
    Confidence interval       95%
    Margin of error           5%
    Expected response rate    15%-25%

    Table 3 Sample design parameters
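    The sample-size arithmetic above can be checked with a short sketch. It assumes the standard sample-size formula for a proportion with a finite population correction, and a z-value of 1.96 for the 95% level. The report does not spell out either, so both are assumptions, and `required_sample_size` is a hypothetical helper. Under those assumptions the result matches the 340 Registrants quoted in the text.

```python
import math


def required_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size for estimating a proportion in a finite population.

    z=1.96 (95% level) and p=0.5 (worst-case variance) are assumptions
    consistent with the parameters in Table 3 and footnote 5.
    """
    # Sample size for an infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction for a population of the given size.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


sample = required_sample_size(2905)

# Expected number of responses from 1,619 invitations at a 21% response rate.
expected_responses = 1619 * 0.21

print(sample, round(expected_responses))
```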

    Registrar/Registry sample

    Before we provide the details about this sample, we need to clearly define the distinction

    between Registrars and Registries. Registrars are entities that process individual domain name

    registration requests. Each Registrar operates under agreement with at least one Registry – that

    is, an organization responsible for maintaining an authoritative list of all domain names

    registered in a given gTLD. For example, VeriSign is the Registry for all domain names

    registered in the .COM gTLD; individual Registrars such as GoDaddy and Network Solutions

    register .COM domain names under an agreement with VeriSign.

    ICANN-accredited gTLD Registrars are responsible for collecting WHOIS information during

    domain name registration, but WHOIS data storage and access varies across Registries. Thick WHOIS Registries maintain a central database of all WHOIS information associated with registered domain names; they can respond directly to WHOIS queries with all available WHOIS

    information. Thin WHOIS Registries maintain only basic WHOIS information centrally; they rely on the Registrar for each domain name to store and supply all other available WHOIS

    information.

    In this study, we were concerned with the .BIZ, .INFO, and .ORG gTLD thick WHOIS Registries

    and the .COM and .NET thin WHOIS Registries. Per the GNSO’s request for this study, we did

    not attempt to study domain names registered under other smaller gTLDs or under ccTLDs.

    The sample of Registrars and Registries that we surveyed as part of the Registrar and Registry

    (R/R) survey is directly associated with the previously described sample of Registrants. We

    built a sample of 111 Registrars and Registries by simply looking up the Registrars who

    maintain the registration information of the 1,619 sampled domains, and the associated

    Registries.

    In the case of Registrar affiliates operating as resellers, the association between a domain

    name and the Registrar that actually performed its registration cannot be identified in a

    straightforward way. That is because WHOIS does not hold information about the Registrar-

    Reseller relationship. So, for domains associated with known resellers, we used information in

    WHOIS on domains’ name servers to identify some of the Registrars. This approach is based

    on the assumption that in many cases domains use the DNS services of the Registrars with

    which they are registered. We acknowledge that the method we described is problematic in

    cases when (a) a domain has been registered with Registrar A, but the associated DNS server

    is hosted by Registrar B, and (b) the Registrant delegates its domain name’s DNS services to a

    company C that is not evidently associated with Registrar A. Nevertheless, we believe our

    design choice provides a systematic and reproducible method of acquiring the required

    information.
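    The name-server heuristic described above can be sketched as a simple suffix lookup. This is an illustration only: the function `guess_registrar`, the mapping `KNOWN_DNS_SUFFIXES`, and the `.example` host names are hypothetical, and the report does not describe how the matching was actually implemented.

```python
def guess_registrar(name_servers, suffix_to_registrar):
    """Return the Registrar whose known DNS suffix matches a name server.

    Returns None when no name server matches, as in case (b) of the text.
    In case (a) the domain would be attributed to Registrar B, which is
    the misattribution the text acknowledges.
    """
    for ns in name_servers:
        host = ns.strip().rstrip(".").lower()
        for suffix, registrar in suffix_to_registrar.items():
            # Match the suffix itself or any host underneath it.
            if host == suffix or host.endswith("." + suffix):
                return registrar
    return None


# Hypothetical mapping from name-server domains to Registrars.
KNOWN_DNS_SUFFIXES = {
    "registrar-a.example": "Registrar A",
    "registrar-b.example": "Registrar B",
}

print(guess_registrar(["NS1.REGISTRAR-A.EXAMPLE."], KNOWN_DNS_SUFFIXES))
print(guess_registrar(["ns1.hosting-c.example"], KNOWN_DNS_SUFFIXES))
```

    Matching on a leading dot avoids treating an unrelated host such as `ns1.notregistrar-a.example` as belonging to Registrar A.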

    4. Law Enforcement & Researchers survey

    We ran an expert survey to gather examples and statistics on illegal or harmful Internet acts (as

    defined by ICANN through the Terms of Reference for this and other WHOIS studies) in general,

    and more specifically those attributed to WHOIS misuse, and to broaden our perspective of

    WHOIS misuse. Survey invitees included a diverse set of researchers and consumer protection,

    regulatory, and law enforcement organizations.

    4.1. Survey methodology and design details

    For the invitation process we built upon contacts established at Carnegie Mellon University and

    we requested ICANN’s input in finalizing the list of parties invited to participate in the survey. We

    made significant effort to build a geographically diverse set of experts that enabled us to capture

    the impact and the extent of WHOIS misuse around the world. We were also able to achieve

    diversity in terms of the types of expertise of survey participants. (See Section 3.1 for a

    description of the invitee list.)

    We used email messages to invite individual experts to participate in the survey. The invitation

    contained a short description of the study, information about the principal investigator, and links

    to either participate in the survey or opt out from any future messages and reminders from us.

    We also offered the option to download the questionnaire and email the responses to us. The

    content of the invitation is available in Appendix A – Law Enforcement/Researcher survey:

    Invitation to participate.

    When participants clicked on the link to participate, they were presented with a consent form that

    briefly described the procedures, requirements, risks, benefits, associated compensation (none),

    and privacy assurances we offered. The text is available in Appendix A – Law

    Enforcement/Researcher survey: Consent form.

    The survey lasted 8 months – from August 2011 until May 2012 – and collected responses from

    101 participants. The survey was implemented with SurveyMonkey and all connections to this

    service were protected with SSL.6 The survey questions are available in Appendix A – Law

    6 Using SSL is just one of the measures we took to preserve the confidentiality of responses. In addition, only authorized personnel (researchers on our team) handled the survey responses. At the completion of the study all responses were removed from SurveyMonkey and kept at a secure location at Carnegie Mellon.


    Enforcement/Researcher survey: Survey questions. Invitees were assured that all responses

    would be treated as confidential, with survey data published only in aggregate, anonymized

    form.

    4.2. Analysis of responses

    In the following sections we first describe the demographics of the participants, which establish

    their level of expertise and geographical diversity, and then we delve into the WHOIS misuse-

    specific responses. We then provide an overall summary of our findings from this survey.

    Demographics

    The participants were initially asked to self-classify their occupation (Figure 1) and the type of

    employer they are working for (Figure 2). As expected, security researchers and

    government/law enforcement agents constituted about 90% of the responses. Based on the

    description of the respondents’ employers, it is evident that the government view is

    over-represented in responses. However, assuming that government agencies have a more

    extensive and clear awareness of the misuse incidents, this characteristic of our population

    sample is an acceptable bias.

    Figure 1 Occupation of participants (% of participants): Security consultant 25%, Researcher (industry) 20%, Law enforcement agent 20%, Researcher (academia) 12%, Government agency 10%, Other 7%, Manager 5%, Consumer protection agency 1%.

    Figure 2 Description of employer.

    In terms of geographical coverage, the respondents mainly provided responses for the

    American and European continents (Figure 3). While we made significant effort to invite

    experts in the Asia, Africa, and Pacific regions, participation from these regions was limited.

    Figure 3 Reporting regions

    Figure 2 data (% of participants): Governmental organization 32%, Security industry 23%, Academia 14%, Other IT industry 14%, Not-for-profit NGO 12%, Other 5%.

    Figure 3 data (% of participants): North America 37%, South America 32%, Europe 18%, Central America 6%, Africa 4%, Asia 1%, Oceania 1%.

    Level of expertise

    In the survey we included a set of questions that would inform us about the level and type of

    expertise of the participants in the subject we are studying. To that end, we used a Likert scale (1:

    low – 5: high) on which participants self-reported their familiarity with the domain name registration process, the

    requirement to provide personal information during that process, and the existence of the

    WHOIS directory that makes this personal information available to the public.

    The results (Table 4) show that the majority of respondents are cognizant of the domain

    registration process (mean: 4.1, std.dev: 2.03) and of the requirement to submit personal information

    (mean: 4.23, std.dev: 2.06); almost 60% of participants rated themselves as experts in the

    specifics of the WHOIS directory (mean: 4.35, std.dev: 2.1).

    We also included questions that would not only evaluate the participants’ understanding of two

    domain-specific notions (WHOIS harvesting, WHOIS anti-harvesting techniques), but would also

    provide us with an insight into the level of expert awareness about WHOIS misuse, and the

    techniques to thwart it.

    Table 4 Familiarity with key domain registration concepts

    Concept                                                              1 - Not familiar   2     3 - Know the basics   4     5 - Expert
    Domain registration process                                          2%                 0%    23%                   33%   41%
    Requirement to supply contact information with domain registration   1%                 1%    16%                   35%   46%
    Availability of contact information on WHOIS directory               3%                 1%    10%                   30%   56%

    81% of participants stated awareness of WHOIS harvesting, and 63% of WHOIS anti-harvesting

    techniques. When the participants were asked to describe some anti-harvesting techniques,

    most of them mentioned CAPTCHAs, port 43 rate limiting, and privacy or proxy registration

    services.

    Attack experiences

    In this section of the survey we sought to collect information related to direct and indirect

    (reported) experiences of security related attacks overall, before we considered the role of

    WHOIS misuse. The combined measures show the prevalent types of attacks that Internet

    users are faced with in general. Further on, we tried to look for relationships (if any) between

    reported security incidents and WHOIS misuse.

    Table 5 and Table 6 list a variety of types of security incidents that can be triggered by network

    attacks; participants were asked to note the ones that they had directly (Table 5) and indirectly

    (Table 6) observed. Not surprisingly, email spam is the most observed type of network attack in

    both cases. It is noteworthy though that all types of attacks (e.g., postal spam and blackmail)

    have a high rate of occurrence. Comparing the directly observed and reported security incidents

    we see a lower rate of reporting of email spam, email viruses, and postal spam. This could be

    attributed to the widespread nature of these types of attacks, which could lead victims to deem

    reporting these security incidents unnecessary.

    Table 5 Directly observed network attack experiences (overall, not specifically related to WHOIS misuse)

    Type of attack (% of participants)                              Yes    No
    Email spam                                                      97%    3%
    Email virus                                                     82%    18%
    Malware installation/drive-by downloads                         78%    22%
    Phishing                                                        77%    23%
    Unauthorized intrusion on servers                               58%    42%
    Postal spam                                                     55%    45%
    Denial of Service                                               54%    46%
    Abuse of personal data or identity theft                        49%    51%
    Blackmail/ransom demands/intimidation                           36%    64%
    Have experienced attacks, but prefer not to divulge specifics   26%    74%
    Vishing (voicemail phishing)                                    20%    80%

    Table 6 Security Incidents reported to the expert (overall, not specifically related to WHOIS misuse).

    Only 40% of the respondents reported that they consider the possible contribution of WHOIS misuse when analyzing security incidents. Such an observation has two possible interpretations (or a combination of both): either misuse of WHOIS data is an attack

    vector that security experts underestimate and thus do not consider a

    valuable aspect to analyze, or WHOIS misuse is found to be insignificant when examining

    security incidents. However, in a few cases, the experts reported that they were able to trace

    an attack back to the public availability of WHOIS information, as described next.

    Specific WHOIS misuse incidents

    In Figure 4 we show that 18 respondents (18%) were able to provide details in relation to 23

    individual incidents involving suspected harvesting of WHOIS information.7 The experts directly

    experienced about half (45%) of those incidents, as they were the targets of the misuse. In most

    of the cases, the effect of the misuse was the reception of electronic and postal spam mail

    containing marketing materials or bills for services that were not requested. However, a few of

    those incidents (4) show highly sophisticated planning to extract money, distribute malware, and,

    in one case, to poison DNS servers by deploying a phishing attack using WHOIS information. In

    another case, Registrant information was used to register numerous domains for illegal

    purposes.

    7 The nature of the survey (expert survey) does not allow us to extrapolate this rate of WHOIS misuse occurrence, and it is merely an illustration of the kinds of misuse of WHOIS reported on a global scale.

    Table 6 data:

    Type of incident (% of participants)                            Yes    No
    Email spam                                                      74%    26%
    Phishing                                                        69%    31%
    Malware installation/drive-by downloads                         67%    33%
    Email virus                                                     67%    33%
    Abuse of personal data or identity theft                        67%    33%
    Unauthorized intrusion on servers                               62%    38%
    Denial of Service                                               58%    42%
    Blackmail/ransom demands/intimidation                           50%    50%
    Postal spam                                                     47%    53%
    Vishing (voicemail phishing)                                    39%    61%
    Have experienced attacks, but prefer not to divulge specifics   30%    70%

    Figure 4 Portion of survey respondents reporting at least one incident of WHOIS misuse.

    The type of personal information most often reported as misused was the email address (16 cases,

    or 70% of all 23 cases of misuse). However, there were many instances where Registrant name

    (6 cases, 26% of all 23 cases of misuse), postal address (6 cases, 26% of all 23 cases of

    misuse), and phone number (4 cases, 17% of all 23 cases of misuse) were misused as well,

    either individually or in combination with other personal details. Figure 5 summarizes these

    findings.

    Figure 5 Breakdown of reported cases of WHOIS misuse, based on the type of personal information misused. Certain cases of misuse involved more than one type of information being misused, hence the total is greater than 100%.

    Figure 4 data (% of participants): respondents reporting experience of WHOIS misuse incidents 18%; respondents NOT reporting experience of WHOIS misuse incidents 82%.

    Figure 5 data (% of cases involving specific misuse): email address 70%, Registrant name 26%, postal address 26%, phone number 17%.

    Note that the percentages in Figure 5 correspond to the fraction of misuse cases; but recall that

    only 18% of our respondents experienced any form of misuse at all. Furthermore, certain cases

    involved multiple types of information being misused – and thus the percentages add to more

    than 100%.

    In 11 (48%) of the reported WHOIS misuse cases, experts reported taking no action to mitigate

    the misuse (either the effects of it, or a future recurrence). However, in 11 out of the 12

    remaining cases where anti-harvesting techniques were subsequently employed, WHOIS

    misuse incidents were eradicated. A few examples of such techniques include CAPTCHA

    challenges and IP blocking, and one less technical mechanism where the legal department of

    the affected company identified the WHOIS harvesters and demanded that they destroy the

    misused WHOIS data.

    4.3. Discussion

    We surveyed law enforcement and security research experts to comprehend the extent of

    misuse of the publicly available WHOIS information globally. We succeeded in having a

    geographically diverse sample with different types of expertise providing us with their insights on

    WHOIS misuse. However, as this is an expert survey with a limited population sample, we do

    not achieve statistical significance in our findings. (Note that this was not a goal, due to the

    inherent nature of an expert survey.)

    Overall, we found that, according to experts participating in this survey, WHOIS data misuse is

    generally not considered when investigating security incidents, possibly because it is

    underestimated as an attack vector. It is also noteworthy that, despite the wide net we cast in

    this survey, we were able to collect only a moderate-sized list of WHOIS misuse incidents from

    organizations that should have an extensive understanding of the matter. This could mean that

    WHOIS misuse is either under-reported or not as prevalent as conjectured. The other parts of

    this study attempt to provide a more definitive answer to this question.

    We collected reports from a minority of the respondents that they had directly observed WHOIS

    misuse incidents. The effects of these incidents range from simple spam, to a well-orchestrated

    phishing attack with the purpose of DNS-poisoning. Additionally, the countermeasures deployed

    in those cases (mainly CAPTCHA and IP blocking) were adequate in preventing future WHOIS


    misuse incidents. Again, other parts of this study explore anti-harvesting measures more

    empirically.

    5. WHOIS misuse reported by Registrants

    We surveyed a representative sample of top 5 gTLD domain name Registrants described in

    Section 3.2 to gain a better understanding of their direct experiences with WHOIS misuse. In the

    following sections we first discuss the methodology and design details of the Registrant

    survey. We then describe issues encountered during the survey, which affected the

    representativeness of our findings. Finally, we present our findings on the ways

    Registrants experience misuse of their personal information as a consequence of its public

    availability in WHOIS.

    5.1. Survey methodology and design details

    Methodology

    We used email messages to invite Registrants to participate in the survey. We acquired the

    contact information through the WHOIS entries associated with the domains in our sample. The

    invitation contained a short description of the study, information about the principal investigator,

    and links to either participate in the survey or opt out from any future messages and reminders

    from us. Because this survey was designed to be taken by non-Internet-savvy Registrants, the

    invitation briefly described domain registration and the role of WHOIS data in simplified

    language, included the sampled domain name covered by our survey, and

    suggested that invitees query that domain name to see data about them published in WHOIS.

    We also offered the option to download the questionnaire and email the responses to us. The

    content of the invitation is available in Appendix B – Registrant survey: Invitation to participate.

    When participants clicked on the link to participate, they were presented with a consent form that

    briefly described the procedures, requirements, risks, benefits, associated compensation (entry

    into a random prize drawing), and privacy assurances we offered. The text is available in

    Appendix B – Registrant survey: Consent form.

    Between May 2012 and August 2012 we ran two pilots of the survey, which guided us in making

    adjustments that increased the observed response rate. The actual survey lasted three and a

    half months, from September 2012 until December 2012. The invitations were sent out in stages,

    and each group of invitees was offered a period of 5 weeks to complete the survey. We also

    scheduled the distribution of weekly reminders to non-respondents that increased the response

    rate. The survey was implemented with SurveyMonkey and all connections to the service were


    protected with SSL.8 Invitees were assured that all responses would be treated as confidential,

    with survey data published only in aggregate, anonymized form.

    Survey translations

    Because the potential for WHOIS misuse is not restricted to English-speaking countries and this

    survey was targeted at typical Internet users across the world, we developed translations of our

    survey. We relied on native speakers of various languages from CMU for the translations. Our

    translators all had a background in computer networking or computer security, which meant they

    not only had the required technical background to produce meaningful translations, but they

    were also able to integrate nuances of the different cultures, making the international invitee

    more likely to understand the survey materials and therefore more willing to participate.

    Our sample of 1,619 domain name Registrants covers 81 countries. Translating the survey into

    every relevant language would have required a disproportionate effort, since some languages would map to only a

    handful of participants. In addition, the low expected response rate of the survey (15%) was a

    good indicator that a number of translations would not be necessary, as the expected number of

    responses for certain languages was close to zero, regardless of the language used. We

    observed that 90% of our sample was located in just 18 countries, with the other 10% spread

    across 63 countries. Hence, we decided to provide translations for the top 90% of the

    participants (which includes English), and offer the English version of the survey to the other

    10%. We offered the survey in the following languages: English, Chinese, French, Japanese,

    Spanish, Italian, and Portuguese. We also intended to have German and Turkish translations,

    but were not able to secure proper translations and ended up offering the English version of the

    survey to participants from those two countries. This effectively reduced the portion of

    participants surveyed in their expected native language to 84.9%.

    As the expected response rate for the 10% of the invitees that belong to one of 63 countries is

    close to zero, regardless of the language used in the survey, we do not expect that not providing

    translations for this portion affected the outcome of the survey. Invitees from Germany and

    Turkey represent 5% of the sample. Considering the expected response rate, and assuming

    that none of the invitees from those countries have knowledge of English (which is certainly an

    extremely conservative assumption), we estimate that the upper bound of the misrepresented

    population is only 0.7%.
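    The upper bound above appears to follow from multiplying the share of invitees left without a translation by the expected response rate. The report does not show the calculation, so this reading is an assumption, and the variable names below are illustrative only.

```python
# Share of invitees from Germany and Turkey, who received the English survey.
share_without_translation = 0.05

# Expected response rate used in the text.
expected_response_rate = 0.15

# Worst case: none of these invitees understand English, so every expected
# response from this group would be affected by the missing translation.
upper_bound = share_without_translation * expected_response_rate

print(f"{upper_bound:.2%}")
```

    Under this reading the product is 0.75%, in line with the roughly 0.7% quoted in the text.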

    8 See footnote 6.

    Types of questions

    The survey was divided into three parts. The first set of questions was designed to collect data on

    the demographics of the participants. The second part of the survey covered seven

    different types of misuse of WHOIS (postal spam, email spam, voice spam, identity theft,

    unauthorized intrusion into servers, denial of service, and Internet blackmail), as well as any other type of

    misuse a Registrant may have experienced. We requested that the participants optionally

    provide a detailed description of their experiences in any of the previous categories. Due to the

    length of the survey, which could take up to 30 minutes to complete, and could therefore lead to

    participants abandoning the survey before completion, we randomized the sequence of

    questions for different types of misuse, in an effort to avoid biases related to the design of the

    survey. The third and final part of the survey collected information related to actions taken by

    the participants in response to the WHOIS misuse. The survey questions are available in

    Appendix B – Registrant survey: Survey questions. Through an online glossary, we also offered

    definitions for key terms used in the survey questions, to accommodate typical Internet user

    participants not familiar with the technical DNS and cybersecurity jargon. The terms are

    available in Appendix B – Registrant survey: Terms.

    5.2. Response and error rates

    Between May and August of 2012, we ran two pilots of the Registrant survey to assess possible

    issues with the design and/or implementation of the survey. One pilot involved tech-savvy

    colleagues at CMU with extensive experience in user surveys. This pilot helped us identify and fix a

    number of design issues. The second pilot was targeted to a broader audience of randomly-

    selected English speaking Registrants, and was intended to assess the expected response rate.

    As shown in Table 3, we expected a response rate of at least 15%. However, in this second pilot, we did

    not receive any responses out of the 48 invitations sent. We identified as a possible problem the excessive length of the survey, which apparently discouraged participation. Therefore, we

    attempted to remedy this by offering entry into a random prize drawing9 to participants who

    completed the survey in its entirety. Note there was no incentive to report having

    encountered misuse; respondents were only required to complete survey sections that

    pertained to their experiences.

    9 The prizes were one Apple iPad 3 and four Apple iPod Shuffles, selected by random drawing among all participants who completed a survey.


    Overall, we sent out 1,619 invitations and had 57 participants: 52 in English, 3 in Japanese, and

    2 in Spanish, achieving a response rate of 3.6%. Out of these 57 participants, we had 41

    complete responses. Such a low number of collected responses affects our targeted level of

    precision, namely the error rate. The resulting error rate for the statistic we are measuring (is there observed WHOIS misuse?) is 12.7%. This means that, at the 95% confidence level, the measured rate of misuse is expected to lie within 12.7 percentage points of the actual rate among Registrants. In the remaining 5% of

    cases, the measured misuse can deviate by more than 12.7 percentage points from the

    actual value (i.e. far more or far less misuse).

    We should point out that inviting more Registrants was not expected to help us reach the goal of

    5% error rate. If we were to invite every one of the 2,905 Registrants in the Registrant

    microcosm, with an observed response rate of 3.6%, we would collect 105 responses. This

    number of responses would result in a 9.4% error rate. This lower error rate would also come at

    a higher cost of running the survey, due to the additional translations required.
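    The error rates discussed in this section can be approximated with the standard margin-of-error formula for a proportion with a finite population correction. The report does not state which formula it used, so the formula, the z-value of 1.96, and the helper name `margin_of_error` are assumptions. Under them, 105 responses from the 2,905-domain microcosm give the 9.4% quoted above, while 57 responses give roughly 12.9%, close to but not identical to the reported 12.7%. The authors may therefore have used a slightly different population size or rounding.

```python
import math


def margin_of_error(responses, population, z=1.96, p=0.5):
    """Margin of error for a proportion, with finite population correction.

    z=1.96 (95% level) and p=0.5 (worst-case variance) are assumptions.
    """
    standard_error = math.sqrt(p * (1 - p) / responses)
    fpc = math.sqrt((population - responses) / (population - 1))
    return z * standard_error * fpc


# Responses expected if all 2,905 Registrants were invited at a 3.6% rate.
projected_responses = round(2905 * 0.036)

me_observed = margin_of_error(57, 2905)
me_full_microcosm = margin_of_error(projected_responses, 2905)

print(projected_responses, f"{me_observed:.1%}", f"{me_full_microcosm:.1%}")
```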

    5.3. Analysis of responses

    We start the analysis of the collected responses by first giving an overview of the characteristics

    of the sample in terms of the demographics as well as the knowledge reported about the

    WHOIS directory. We then delve into the specific types of WHOIS misuse reported.

    Characteristics of the participants

    From a demographic standpoint, the participants are mainly from English-speaking countries

    (92%) even though we made efforts – as previously discussed – to include a wide geographical

    range of participants. We collected responses from the following countries (in descending order

    of number of participants): USA, Japan, United Arab Emirates, Australia, Canada, Switzerland,

    Germany, Spain, UK, India, and Mexico (Figure 6). There were also respondents that did not

    disclose their location.


    Figure 6 Reported origin of participants.

    Although each Registrant was surveyed just once, in regards to a single sampled domain name,

    the majority of the participants (60%) have more than 10 domains registered, with 9% of the

    participants operating a single domain. Additionally, the domains in our sample are mainly

    registered by self-described for-profit businesses or organizations (49%), followed by the

    domains registered by individuals (33%), and domains registered by not-for-profit organizations

    (14%)10 (Figure 7). Moreover, respondents reported that the largest share of the domains (46.5%) in our

    sample are used for commercial activities. Finally, the great majority of the participants (93%)

    indicated they are aware that any personally identifiable information included in Registrant name

    and contact data can be accessed via the public WHOIS directory.

    10 This survey asked Registrants to indicate whether a domain name was registered by an individual, for-profit business or organization, non-profit organization, informal group, or other. Unlike other WHOIS studies (NORC, 2010 and 2013), we did not attempt to verify these answers or to classify entities actually using domains for any stated purpose.

    Figure 6 data (participants from a single country, in descending order): 31, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1.

    Figure 7 Self-reported use of surveyed domains.

    Comparing the self-reported demographics of our survey with the WHOIS-based findings of the WHOIS Registrant Identification Study (NORC, 2013), we see that the top two categories are occupied by similar entities in both studies, with individual/natural person Registrants appearing with roughly the same frequency (30% vs. 33%). In our study, the combined share of categories representing legal person Registrants is 62%, compared to 39% in (NORC, 2013).

    Reported WHOIS misuse

    We now present our findings for each specific type of WHOIS misuse that we studied. In each set of questions, we first asked the participants to report whether they had experienced misuse of a specific type of information supplied when registering their domain. If the answer was yes, we then asked more specific questions about those misuse incidents.

    25 of the respondents (43.9%) reported experiencing some kind of misuse of their WHOIS information. In Table 7 we provide a breakdown of the reported WHOIS misuse for the three types of information published in WHOIS that are reportedly subject to misuse: postal and email

    address, and phone number.

    [Figure 7 data: bar chart; y-axis: % of domains.]

    Use of domains: For-profit business or organization 48.8%; Individual use 32.6%; Non-profit organization 14.0%; Informal interest group 4.7%.

  • 33

    Table 7 Breakdown of participants reporting misuse, based on the type of reported misuse

    Postal address misuse

    38.6% of surveyed Registrants (22) have received postal spam mailed to an address published in WHOIS, and 29.8% (17) believed the unsolicited mail resulted from misuse of their WHOIS postal address. In support of their suspicion, participants provided details of the unsolicited mail: it was either directly related to one of their domains, or it advertised web services. Moreover, 21.1% (12) of the participants reported that their WHOIS postal address was not published in any other public directory (e.g. phone book, website, etc.).

    The largest group of respondents that have received postal spam (14% of total, 8) experience this a few times a year, with 11% (6) receiving postal spam a few times a month, and 5% (3) less than once a year. The reported subjects of the unsolicited correspondence were mainly related to fake domain name renewals and transfers, followed by messages related to website hosting and search-engine optimization (SEO) services.

    Email address misuse

    25 Registrants (43.9%) reported receiving spam email at an account associated with a WHOIS email address. 17 of them (29.8% of all respondents) attribute the misuse of their email address to WHOIS because the topics of the spam emails specifically targeted domain name Registrants (e.g. domain name transfer offers, domain name SEO offers). 14% (8) of the Registrants stated they have not listed the misused email address in any other public directory.

    [Table 7 data; values are % of responses.]

    Type of report | Phone number | Email address | Postal address | Combined
    Experienced misuse, and information was published in WHOIS only | 8.8% | 14% | 21.1% | 43.9%
    Experienced misuse and attribute misuse to WHOIS | 12.3% | 29.8% | 29.8% | 43.9%
    Experienced misuse | 22.8% | 43.9% | 38.6% | 73.7%

  • 34

    The largest group of respondents identifying WHOIS data misuse as a cause of email spam (10%, 6 Registrants) reported that they receive spam email at the email address published in WHOIS a few times a day, followed by 9% of responses (5 Registrants) receiving unsolicited email a few times a week. The topics of the unsolicited messages are similar to the ones reported for postal spam.

    Phone number misuse

    22.8% (13) of Registrants reported receiving voicemail spam, with 12.3% (7) attributing the spam to WHOIS misuse. They were able to associate the voicemails with WHOIS because the caller either explicitly referred to a domain name under the Registrant’s control or offered domain services. 9% (5) of the Registrants who claimed to have experienced the misuse of their WHOIS phone number said they had not listed their number in any other public directory.

    Identity theft

    Two of the participants reported that they have experienced identity theft, but neither could tie this to WHOIS misuse.

    Unauthorized intrusion to servers

    In order to measure the extent of misuse of WHOIS information to gain unauthorized access to servers, we first asked the participants if they are the system administrators of Internet servers associated with one of their registered domains. The number of participants that have this role is very small (7%, 4), with just one person experiencing unauthorized intrusion. That respondent could not tie the intrusion to WHOIS misuse.

    Blackmail

    One participant reported being a victim of blackmail (see footnote 11) as a result of their information being

    published in the WHOIS directory. The Registrant was allegedly accused by a third-party

    company of violating the terms of domain registration because of the name the Registrant chose

    for the domain. The Registrant said he was asked to pay some amount to settle, but after

    consulting with lawyers, the Registrant decided to not take any action. After a few months, and a

    series of emails from the third party, the latter stopped communicating with the Registrant. The

    11 We describe this incident as reported by the Registrant, but cannot know the veracity of this claim or whether the domain name dispute was founded.

  • 35

    Registrant reported being adversely affected in terms of time (reading emails), and money

    (lawyer consultation).

    Other

    Although this survey gave Registrants an opportunity to describe WHOIS misuses not otherwise covered, no participant claimed to have experienced any other type of WHOIS misuse.

    Adverse effects

    In Figure 8 we present the portion of Registrants that reported they were adversely affected by the misuse of their information, reportedly caused by WHOIS. In all types of misuse the main adverse effect is the frustration caused by the extra time the Registrants need to go through the spam email, postal mail, and voicemail. Spam calls associated with WHOIS misuse, even though they only occur a few times a year, appear to cause the highest level of frustration (12%), possibly because spammers directly interact with the person picking up the phone. Spam postal mail causes the least frustration (5%): people are used to junk mail, and WHOIS-associated postal spam is relatively infrequent. WHOIS-related email spam, even though it is the most prevalent and frequent type of misuse, adversely impacted 10.5% of the Registrants. A plausible explanation for this discrepancy is that people in general, and Registrants in this case, are used to receiving many unsolicited emails on a daily basis. Therefore the marginal cost of deleting one more spam email originating from WHOIS misuse may be considered negligible by sampled Registrants.

    Figure 8 Portion of participants adversely affected by the misuse of their information published in the WHOIS, broken down into the three main types of misuse.

    [Figure 8 data: bar chart; x-axis: type of misuse; y-axis: % adversely affected.]

    Phone number 12.3%; Email address 10.5%; Postal address 5.3%.

  • 36

    Countermeasures

    40% (8) of the 20 Registrant survey participants that have experienced at least one type of WHOIS misuse reported having taken actions to protect themselves from additional WHOIS misuse. On the other hand, 60% (12) of Registrants experiencing misuse did not take any countermeasures. Registrants that took action reported utilizing a combination of the following:

    - Move to a different Registrar (3).
    - Change misused portions of WHOIS information (4).
    - Replace contact addresses and names with ones from a service provider (proxy services) (4).
    - Replace contact addresses with forwarding addresses provided by a service provider (privacy services) (3).
    - Supply partially incorrect or incomplete information (2).
    - Apply a spam filter or register with an identity theft protection service (5).

    The last option attracted the most interest, even though it only deals with the consequences of

    the misuse, rather than trying to remedy possible factors leading to the WHOIS misuse itself.

    24.5% of participants (14) were aware of strategies used by their domains’ Registrars to deter WHOIS misuse. Most of the responses cited the availability of proxy and privacy services, and the use of CAPTCHAs in web-based WHOIS queries, as part of the Registrars’ strategies against WHOIS misuse.

    5.4. Discussion

    Getting Registrants to communicate their experiences in terms of the possible misuse of their personally identifiable information listed in WHOIS proved to be a challenging task. Even with an incentive to participate (a raffle at the end of the survey), we were only able to collect responses from a small portion of invitees (57 out of 340, or 17%). However, we were able to gain insight into the prevalence of WHOIS misuse and the specific types of information that are usually targeted.

    Our study showed that 43.9% of Registrants claim to have experienced some type of WHOIS misuse. Given the margin of error of 12.7%, this observation neither confirms nor disproves that WHOIS misuse affects the majority of Registrants. It does, however, confirm the hypothesis that public access to WHOIS data leads to a measurable and statistically significant degree of misuse.
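    As a rough check on the quoted margin of error, the sketch below computes a normal-approximation (Wald) margin for a sample proportion. It assumes a 95% confidence level, simple random sampling, and the 57 responses reported above; the function name `margin_of_error` is ours and is not taken from the study. Under these assumptions the margin comes out at roughly 12.9 percentage points, close to the 12.7% quoted; the small difference presumably reflects details of the original calculation that are not stated in this section.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Values taken from the survey: 25 of 57 respondents reported misuse.
n = 57
p_hat = 25 / n

# z = 1.96 corresponds to a 95% confidence level (an assumption, see above).
moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe

print(f"estimate = {p_hat:.1%}, margin = {moe:.1%}, interval = [{low:.1%}, {high:.1%}]")
```

    The resulting interval contains 50%, which is why the estimate cannot settle whether a majority of Registrants is affected, while its lower bound stays well above zero.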

    The email address is targeted most often, followed closely by the postal address. Phone numbers are also misused, though much less often and with a higher adverse impact per incident.

  • 37

    In terms of certainty of whether the misuse is originating from WHOIS, postal address misuse

    comes first.

    Potential survey biases

    We need to consider the biases the survey design introduced in order to evaluate the possibility of over- or under-reporting of WHOIS misuse. First, by not providing translated versions of the survey to 15% of the sample, we may have missed some incidents of misuse experienced by Registrants that do not speak English. However, given the observed response rate (3.6%), the expected responses from that portion of the sample (15%) amount to less than 1% of the full sample (3.6% of 15%, or roughly 0.5%). In other words, even if we had provided all the possible translations, we expect that we would not have obtained a statistically significant number of responses from this group.

    Another possible bias is that Registrants may be more willing to report a harmful act (e.g.

    experience with misuse) rather than a lack of harmful incidents, which could lead to over-

    representation of the incidents. In addition, we did not attempt to verify or corroborate any

    WHOIS misuse incident, which could lead to false representation of the extent of WHOIS

    misuse. However, the strong economic incentive we provided (entry into a random prize

    drawing) was given for completing the survey, regardless of the kind of responses entered, and

    should mitigate this potential source of bias.

    One may argue that as this is a survey with a fair amount of technical content, it is biased

    towards tech-savvy participants. We attempt to mitigate this possibility by providing explanatory

    links throughout the survey. Additionally, since the registration of a domain assumes some level

    of technical understanding about the Internet, we believe that the technical complexity of this

    survey should be within the technical understanding of most Registrants.

    Finally, as described above, the great majority of the survey participants originate from North America. This fact affects our findings in the following ways. First, we are unable to analyze the geographical distribution of misuse, as the survey suffers from coverage bias. Consequently, findings are also descriptive of a narrower portion of the world population than we had wished. As a result, the survey cannot accurately capture potential geographical diversity in the occurrence of WHOIS misuse.

  • 38

    6. Assessing Registrar/Registry anti-harvesting

    In this section we discuss the WHOIS anti-harvesting techniques offered by the Registrars and Registries. We first present the results of a survey that collected information from Registrars and Registries regarding their experiences in terms of WHOIS harvesting incidents and employed countermeasures. Then, we present the findings of empirical tests of the Registrars’ infrastructures when faced with WHOIS queries at high rates.

    6.1. Survey methodology and design

    This survey targeted the top five gTLD Registries and a globally diverse sample of Registrars to collect information related to their experiences in terms of WHOIS misuse incidents, and their efforts to counter such activity. We used email messages to invite a sample of Registrars and Registries to participate in the survey. The invitation contained a short description of the study, information about the principal investigator, and links to either participate in the survey or opt out from any future messages and reminders from us. We also offered the option to download the questionnaire and email the responses to us. The content of the invitation is available in Appendix C – Registrar and Registry Survey: Invitation to Participate.

    When invitees click on the link to participate they are presented with a consent form that

    describes briefly the procedures, requirements, risks, benefits, associated compensation (none),

    and privacy assurances we offered. The text is available in Appendix C – Registrar and Registry

    Survey: Consent form. In the consent form we offered assurances in terms of the confidentiality

    of the reported results, in that no Registrar or Registry would be mentioned explicitly, and all the

    results would be presented in aggregate form.

    Before running the survey we ran a pilot with a small number of Registrars to evaluate the

    quality of the questions and the related material. Some questions and part of the consent form

    were modified to reflect pilot-reported sensitivity, particularly around disclosure of anti-

    harvesting techniques.

    The Registrar survey lasted 6 months – from March 2012 until September 2012 – and collected

    in total 22 responses out of 111 invitees. For the invitation process, we used information

    associated with the Registrant Survey sample, by identifying the Registrars and Registries that

    collect and/or store WHOIS information for those sampled domains. Since our sample is

    targeted based on the survey design, we do not make any claims of statistical significance in

  • 39

    terms of the overall gTLD Registrar and Registry population. However, we do claim that we have collected responses from 22 out of the 107 largest Registrars and, regrettably, despite personalized invitations and multiple follow-up phone calls to Registry contacts in March 2013, only one of the 4 top 5 gTLD Registries. The survey was implemented with SurveyMonkey and all connections to the service were protected with SSL (see footnote 12).

    6.2. Analysis of responses

    We first describe the demographics of the Registrar/Registry survey participants in terms of their location and the volume of domain registrations and WHOIS queries they process monthly. We then provide an overall summary of our findings from this survey.

    Demographics

    The largest group of the 22 Registrars that participated in the survey was located in the United States (5), with the rest distributed across the following countries: China, Germany, Spain, Poland, Turkey, France, India, South Korea, and UK. About 64% of the Registrars handle under 1 million domain registrations each, and 14% handle between 1 and 10 million registrations each (Figure 9).

    Registrars reported that the most popular method of querying their WHOIS databases is by port

    43, which 56% of the Registrars said was used for 100,000 to 10 million queries per month.

    Figure 9 Number of domains registered with Registrars participating in survey

    12 See footnote 6.

    [Figure 9 data: bar chart; y-axis: % of Registrars.]

    Number of domain registrations: Exactly or under 100,000: 50%; 100,001 to 1,000,000: 14%; 1,000,001 to 10,000,000: 14%; More than 10,000,000: 0%.

  • 40

    Table 8 WHOIS queries received by Registrars participating in survey. Note that not all participants answered all questions, so that the columns do not add to 100%.

    Employed anti-harvesting techniques

    57% of surveyed Registrars and Registries (13 of 23) implement at least one WHOIS anti-harvesting technique; Figure 10 presents a breakdown of the techniques implemented per Registrar/Registry. 39% (9 of 23) reportedly implement port 43 rate limiting. 56.5% (13 of 23) provide web forms for interactive WHOIS queries, and 39% (9) require an answer to a CAPTCHA-type challenge to receive the WHOIS response. 30% of surveyed Registrars and Registries (7) reported that they use permanent IP/domain blacklisting when necessary, while 52% (12) temporarily blacklist abusers of the service for 5 to 10 minutes.
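    To illustrate how rate limiting and temporary blacklisting can work together, the following is a minimal sketch of a per-client sliding-window limiter. It is not any surveyed Registrar's implementation: the class name `WhoisRateLimiter`, the query threshold, and the window length are illustrative assumptions, and only the 5-minute ban duration echoes the range reported by respondents.

```python
import time
from collections import defaultdict, deque


class WhoisRateLimiter:
    """Sliding-window query limit per client, with a temporary blacklist."""

    def __init__(self, max_queries=10, window_s=60.0, ban_s=300.0, clock=time.monotonic):
        self.max_queries = max_queries  # hypothetical threshold
        self.window_s = window_s        # hypothetical window length, in seconds
        self.ban_s = ban_s              # 300 s = 5 minutes
        self.clock = clock              # injectable so the demo needs no real waiting
        self._history = defaultdict(deque)
        self._banned_until = {}

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        banned_until = self._banned_until.get(client_ip)
        if banned_until is not None:
            if now < banned_until:
                return False  # still serving a temporary ban
            del self._banned_until[client_ip]
        history = self._history[client_ip]
        # Drop timestamps that have fallen out of the sliding window.
        while history and now - history[0] >= self.window_s:
            history.popleft()
        if len(history) >= self.max_queries:
            # Limit exceeded: blacklist the client for ban_s seconds.
            self._banned_until[client_ip] = now + self.ban_s
            history.clear()
            return False
        history.append(now)
        return True


# Demo with a fake clock; 192.0.2.1 is a documentation-only address.
fake_now = [0.0]
limiter = WhoisRateLimiter(max_queries=3, clock=lambda: fake_now[0])
burst = [limiter.allow("192.0.2.1") for _ in range(5)]
fake_now[0] = 299.0
during_ban = limiter.allow("192.0.2.1")
fake_now[0] = 300.0
after_ban = limiter.allow("192.0.2.1")
print(burst, during_ban, after_ban)
```

    A permanent blacklist would correspond to a ban that never expires; the temporary variant trades some protection for a lower risk of locking out legitimate users who share an address.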

    In addition to direct anti-harvesting measures designed to deter active harvesting, we also

    asked Registrars and Registries about Privacy and Proxy services that make harvesting less

    desirable. Only 22% of surveyed Registrars and Registries (5) said they offer privacy services

    that shield contact details of the domain Registrant except for the Registrant’s name, and 9% (2)

    said they offer proxy services that completely shield all contact details. However, Registrants

    can also use privacy and proxy services offered by third parties that are not Registrars or

    Registries. Interestingly, when looking at the Registrant survey responses for Registrants who

    chose countermeasures other than privacy and proxy services, surveyed Registrants reported

    only one instance where the Registrar did not offer a privacy/proxy service.

    [Table 8 data; values are % of Registrars.]

    Volume per month | Port 43 WHOIS protocol query responses/month | Web form WHOIS query responses/month | Bulk WHOIS data purchase transactions/month
    Do not know or do not measure | 18% | 27% | 27%
    1,000,001 to 10,000,000 | 9% | 5% | 0%
    100,001 to 1,000,000 | 32% | 14% | 9%
    Exactly or under 100,000 | 14% | 27% | 32%

  • 41

    Figure 10 Proportion of Registrars and Registries implementing a specific WHOIS anti-harvesting technique.

    Incidents of WHOIS misuse

    We inquired about harmful events associated with incidents of alleged WHOIS misuse that were reported by any Registrant (see footnote 13). Table 9 shows the reported events in descending order of prevalence. At the top of the list is email spam, which was reported to 39% (9 of 23) of the Registrars. It is followed by phishing (22%, 5), postal spam (17%, 4), email virus (9%, 2), ID theft (9%, 2), and various forms of blackmail (9%, 2). 26% of the Registrars and Registries (6 of 23) said they were able to verify that the reported harmful acts originated from misuse of the WHOIS information.

    Incidents of WHOIS harvesting and their effect in deploying new countermeasures

    30% (7) of the surveyed Registrars and Registries have reportedly experienced attempts at

    automated harvesting of WHOIS information from their directories, but the respondents did not

    classify any as successful. The same respondents also reported that they have adopted new

    anti-harvesting techniques in the past 2 years, as a result of the observed attacks. The most

    prominent additions to their defenses are permanent and temporary IP and domain blacklisting

    13 We did not ask Registrars or Registries about specific incidents that were discussed in Registrant survey responses.

    [Figure 10 data: bar chart; x-axis: type of anti-harvesting technique implemented at Registrar/Registry; y-axis: % of Registrars/Registries.]

    Temporary blacklisting 52%; Port 43 rate limiting 39%; CAPTCHA type challenge 39%; Permanent blacklisting 30%; Private registration services 22%; Registration via proxy 9%.

  • 42

    along with port 43 rate-limiting (4), privacy/proxy protection services (3), and CAPTCHA (2).

    Respondents were not asked to evaluate the perceived effectiveness of these measures.

    Many participants did not provide responses in this section. That can be attributed to the

    sensitive nature of the information we requested. Even though we provided assurances for the

    safe handling and aggregation/anonymization of any information collected by this survey,

    Registrars and Registries appear to be hesitant about providing WHOIS misuse specifics.

    Table 9 Registrars receiving reports related to suspected types of WHOIS misuse

    6.3. Testing of WHOIS query rate limiting techniques

    We complement our survey with an experimental validation of methods employed by Registrars and Registries to combat WHOIS misuse. More precisely, we performed two types of tests on a sample of Registrars and Registries to evaluate the availability and effectiveness of WHOIS harvesting countermeasures. First, we performed rate-limiting tests on port 43 of Registrars and Registries, the well-known network port used for receiving WHOIS queries. Additionally, we carried out rate-limiting tests for interactive WHOIS query web forms provided by Registries.
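    The general shape of a port 43 rate-limiting test can be sketched as follows. This is an illustration under our own assumptions, not the tooling used in the study: `whois_query` sends a single query as described in RFC 3912 (the name followed by CRLF over TCP port 43, reading until the server closes the connection), while `count_answers_before_refusal`, the `looks_refused` heuristic, and the offline stub are hypothetical helpers introduced here. The demo runs against the stub only, so it opens no network connections.

```python
import socket
from typing import Callable, Sequence


def whois_query(server: str, name: str, timeout: float = 10.0) -> str:
    """Send one WHOIS query (RFC 3912) and return the full reply as text."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(name.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def count_answers_before_refusal(
    query: Callable[[str], str],
    names: Sequence[str],
    looks_refused: Callable[[str], bool],
) -> int:
    """Issue queries back to back; count answers received before the first refusal."""
    answered = 0
    for name in names:
        try:
            reply = query(name)
        except OSError:
            break  # connection refused, reset, or timed out
        if looks_refused(reply):
            break  # server answered with a rate-limit notice
        answered += 1
    return answered


def make_stub(limit: int) -> Callable[[str], str]:
    """Offline stand-in for a server that resets connections after `limit` queries."""
    state = {"count": 0}

    def query(name: str) -> str:
        state["count"] += 1
        if state["count"] > limit:
            raise ConnectionResetError("simulated rate limit")
        return f"Domain Name: {name}"

    return query


def refused(reply: str) -> bool:
    # Hypothetical heuristic; real servers word their refusals differently.
    return "limit exceeded" in reply.lower()


names = [f"example{i}.test" for i in range(10)]
answered = count_answers_before_refusal(make_stub(3), names, refused)
print(answered)
```

    To probe a real server, the stub would be replaced by something like `lambda name: whois_query(server, name)`, with `server` set to a host one is permitted to test.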

    Table 10 presents our findings related to the 3 thick Registries that are within the focus of this

    study. Based on our test results, we observed that one Registry provides none of the tested

    anti-harvesting mechanisms whatsoever; however the other two Registries employ a

    combination of anti-harvesting techniques. For instance one Registry employs relatively strict

    measures by enforcing the use of CAPTCHA, and it allows a very small number of queries to be

    issued to port 43 before applying a temporary blacklist.

    [Table 9 data: bar chart of the share of Registrars receiving each type of report.]

    Email spam 39%; Phishing 22%; Postal spam 17%; Email virus 9%; Abuse of personal data or identity theft 9%; Blackmail/ransom demands/intimidation 9%; Denial of Service 4%; Vishing (voicemail phishing) 4%; Unauthorized intrusion on servers 4%; "Registrants have reported experiencing harmful acts, but I prefer not to divulge specifics" 4%.