ShamFinder: An Automated Framework for Detecting IDN ...

ShamFinder: An Automated Framework for Detecting IDN Homographs. ACM IMC 2019. H. Suzuki (1), D. Chiba (2), Y. Yoneya (3), T. Mori (1,4), and S. Goto (1). Affiliations: (1) Waseda University, (2) NTT Secure Platform Laboratories, (3) JPRS, (4) NICT.

Transcript of ShamFinder: An Automated Framework for Detecting IDN ...

Page 1: ShamFinder: An Automated Framework for Detecting IDN ...

ShamFinder: An Automated Framework for Detecting IDN Homographs

ACM IMC 2019

H. Suzuki1, D. Chiba2, Y. Yoneya3, T. Mori1,4, and S. Goto1

1 Waseda University, 2 NTT Secure Platform Laboratories, 3 JPRS, 4 NICT

Page 2: ShamFinder: An Automated Framework for Detecting IDN ...

Outline

• Background
• ShamFinder framework
• Measurement Study
• Evaluation of human perception
• Discussion / Future Research


Page 4: ShamFinder: An Automated Framework for Detecting IDN ...

Background

• IDN (Internationalized Domain Name)
  – A domain name that uses Unicode characters.
  – Punycode is used to resolve IDNs.

[Figure: the browser converts the IDN こんにちは.com to its Punycode form, xn--28j2a3ar1p.com, and sends that form to the DNS server.]
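The IDN-to-Punycode conversion can be reproduced with a minimal sketch. It assumes Python's built-in "idna" codec (which follows the older IDNA 2003 rules) is an acceptable stand-in for what a browser does; the variable names are illustrative only.

```python
# Convert an IDN to its ASCII-compatible (Punycode) form and back,
# using Python's built-in "idna" codec (IDNA 2003 rules).
idn = "こんにちは.com"

# Each non-ASCII label is encoded and prefixed with "xn--".
ascii_form = idn.encode("idna").decode("ascii")
print(ascii_form)  # xn--28j2a3ar1p.com

# Decoding the ASCII form recovers the original Unicode name.
round_trip = ascii_form.encode("ascii").decode("idna")
print(round_trip == idn)  # True
```

The ASCII form is what is actually sent in DNS queries and stored in zone files, which is why IDNs can be recognized by the "xn--" prefix.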

Page 5: ShamFinder: An Automated Framework for Detecting IDN ...

Background

• IDN homograph attack
  – Exploits the visual similarity of characters to deceive users.
  – Example: apple.com vs. applẹ.com
  – Visually similar characters are called homoglyphs.

Page 6: ShamFinder: An Automated Framework for Detecting IDN ...

Background

• IDN homograph attacks have become viable
  – Example: binance.com was targeted (bịnạnce.com)

Page 7: ShamFinder: An Automated Framework for Detecting IDN ...

Outline

• Background
• ShamFinder framework
• Measurement Study
• Evaluation of human perception
• Discussion / Future Research

Page 8: ShamFinder: An Automated Framework for Detecting IDN ...

Possible Solution and Challenges

• Straightforward countermeasure → identify possible IDN homographs

• Challenges
  – There are a very large number of Unicode characters
  – There are a very large number of registered IDNs

→ How can we detect IDN homographs efficiently?

Page 9: ShamFinder: An Automated Framework for Detecting IDN ...

Our Approach – ShamFinder framework

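[Figure: the ShamFinder pipeline in three steps. The inputs are reference domain names taken from the Alexa ranking (1 google.com, 2 youtube.com, 3 facebook.com, 4 baidu.com, ...) and the IDNs extracted from all domain names. Matching the two with the homoglyph DB yields the IDN homographs.]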
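The IDN extraction part of the pipeline can be illustrated with a small sketch. It assumes the input list stores IDNs in their Punycode form, as zone files do; the function name `is_idn` and the sample list are hypothetical and are not taken from the ShamFinder code.

```python
def is_idn(domain: str) -> bool:
    """Return True if any label carries the ACE prefix "xn--"."""
    return any(label.lower().startswith("xn--") for label in domain.split("."))


# Hypothetical sample of "all domain names"; the second entry is the
# Punycode form of こんにちは.com from the Background slide.
all_domains = ["google.com", "xn--28j2a3ar1p.com", "example.com"]

extracted_idns = [d for d in all_domains if is_idn(d)]
print(extracted_idns)  # ['xn--28j2a3ar1p.com']
```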

Page 10: ShamFinder: An Automated Framework for Detecting IDN ...

Step 3: Detecting homograph domain

[Figure: the reference name "google" is compared character by character with the IDN "gօօgle". The pair o (U+006F, Latin) and օ (U+0585, Armenian) is looked up in the homoglyph DB (UC, SimChar).]
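A minimal sketch of this matching step follows. It is not the authors' implementation: it assumes both names have the same length and are compared position by position, and it uses a hypothetical one-entry homoglyph DB holding only the pair shown on the slide.

```python
# Hypothetical miniature homoglyph DB: unordered pairs of confusable characters.
# The single entry is the pair from the slide: U+006F (Latin o), U+0585 (Armenian oh).
HOMOGLYPH_DB = {frozenset({"\u006f", "\u0585"})}


def is_homograph(reference: str, candidate: str, db=HOMOGLYPH_DB) -> bool:
    """True if candidate differs from reference only by homoglyph substitutions."""
    if len(reference) != len(candidate) or reference == candidate:
        return False
    return all(
        r == c or frozenset({r, c}) in db
        for r, c in zip(reference, candidate)
    )


print(is_homograph("google", "g\u0585\u0585gle"))  # True
print(is_homograph("google", "goggle"))            # False
```

Storing pairs as frozensets makes the lookup symmetric, so the order of the two characters in a pair does not matter.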

Page 11: ShamFinder: An Automated Framework for Detecting IDN ...

Homoglyph DB

• UC (existing DB)
  – Unicode Confusables.txt (Unicode Technical Standard #39)

• SimChar (our DB)

[Figure: UC and SimChar shown within the Unicode 12.0.0 character sets.]

Page 12: ShamFinder: An Automated Framework for Detecting IDN ...

SimChar building setups

• Target character set
  – IDNA2008 and Unicode 12.0.0 draft
  – draft-faltstrom-unicode12-00

• Visual appearance
  – GNU Unifont 12.0

• Computing similarity
  – PSNR (peak signal-to-noise ratio)
  – The number of different pixels between two images

Page 13: ShamFinder: An Automated Framework for Detecting IDN ...

Computing the similarity

• PSNR: A metric that quantifies image quality degradation

[Figure: PSNR formula, with one term annotated as "# of different pixels".]
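For reference, the standard PSNR definition can be tied to the pixel count as follows. This is a sketch under stated assumptions, not the slide's own formula: the glyph images are taken to be binary (pixel values 0 or 1, so MAX = 1) with N pixels each, and the symbols MAX, MSE, N, a_i and b_i are notation introduced here. Δ is the number of different pixels.

```latex
\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right),
\qquad
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} (a_i - b_i)^2 = \frac{\Delta}{N}
\quad\Longrightarrow\quad
\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{N}{\Delta}\right)
\quad (\mathrm{MAX} = 1,\ \Delta > 0)
```

Under these assumptions, and for a fixed image size N, a smaller Δ always gives a higher PSNR, so thresholding Δ is equivalent to thresholding PSNR.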

Page 14: ShamFinder: An Automated Framework for Detecting IDN ...

Procedure of Building SimChar

1. Create visual images.

2. Compute Δ for all pairs of characters and extract the pairs whose Δ is 4 or less.

3. Eliminate sparse characters, i.e., those that contain only a small number of black pixels (fewer than 10).
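The three steps above can be sketched as follows. This is an illustration under assumptions, not the authors' code: glyph images are replaced by hypothetical 5x5 toy bitmaps (rows of "#" for black and "." for white) instead of GNU Unifont renderings, and the names `DELTA_MAX`, `MIN_BLACK_PIXELS`, `delta` and `build_simchar` are made up for this example.

```python
from itertools import combinations

DELTA_MAX = 4          # threshold from the slide: keep pairs with delta <= 4
MIN_BLACK_PIXELS = 10  # characters with fewer black pixels count as sparse


def black_pixels(bitmap):
    """Count the black ('#') pixels in a bitmap."""
    return sum(row.count("#") for row in bitmap)


def delta(a, b):
    """Number of pixel positions where two same-sized bitmaps differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def build_simchar(glyphs):
    """glyphs maps a character to its bitmap (list of equal-length strings)."""
    # Step 2: compute delta for all pairs and keep the close ones.
    pairs = [(x, y) for x, y in combinations(sorted(glyphs), 2)
             if delta(glyphs[x], glyphs[y]) <= DELTA_MAX]
    # Step 3: drop pairs that involve a sparse character.
    return [(x, y) for x, y in pairs
            if black_pixels(glyphs[x]) >= MIN_BLACK_PIXELS
            and black_pixels(glyphs[y]) >= MIN_BLACK_PIXELS]


# Step 1 stand-in: toy bitmaps with placeholder labels A to D.
GLYPHS = {
    "A": ["#####", "#...#", "#...#", "#...#", "#####"],
    "B": ["#####", "#...#", "#.#.#", "#...#", "#####"],
    "C": [".....", ".....", "..#..", ".....", "....."],
    "D": [".....", ".....", "..#..", "..#..", "....."],
}

print(build_simchar(GLYPHS))  # [('A', 'B')]
```

The pair (C, D) is also within Δ ≤ 4, but both glyphs have fewer than 10 black pixels, so step 3 removes it. This shows why the sparse-character filter matters: nearly blank glyphs are trivially close to each other in pixel distance.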


Page 15: ShamFinder: An Automated Framework for Detecting IDN ...

Detected Characters

• Characters extracted at a small distance Δ from “e”

[Figure: characters similar to “e”, grouped by distance from Δ = 0 to Δ = 6.]

Page 16: ShamFinder: An Automated Framework for Detecting IDN ...

Statistics of SimChar

Set             #Characters   #Homoglyph pairs
UC              9,605         6,296
SimChar         12,686        13,208
SimChar ∩ UC    233           127

Page 17: ShamFinder: An Automated Framework for Detecting IDN ...

Outline

• Background
• ShamFinder framework
• Measurement Study
• Evaluation of human perception
• Discussion / Future Research

Page 18: ShamFinder: An Automated Framework for Detecting IDN ...

Measurement Study

• IDNs
  – 955,512 IDNs were collected for the .com TLD.

• Reference domain names
  – Alexa top 10k (.com)

Page 19: ShamFinder: An Automated Framework for Detecting IDN ...

Number of detected IDN homographs

Homoglyph DB    Number of IDN homographs
UC              436
SimChar         3,110
UC ∪ SimChar    3,280

Page 20: ShamFinder: An Automated Framework for Detecting IDN ...

Number of blacklisted IDN homographs

Homoglyph DB    hpHosts   GSB   Symantec
UC              28        2     1
SimChar         222       12    7
UC ∪ SimChar    242       13    8

Page 21: ShamFinder: An Automated Framework for Detecting IDN ...

Analysis of active IDN homographs


Page 22: ShamFinder: An Automated Framework for Detecting IDN ...

Analysis of active IDN homographs

Category          Number
Domain parking    348
For sale          345
Redirect          338
Normal            281
Empty             222
Error             113
Total             1,647

Category              Number
Brand protection      178
Legitimate website    125
Malicious website     35

Slide annotations: "For business (42%)" and "21%".

Page 23: ShamFinder: An Automated Framework for Detecting IDN ...

Changing speaker


Page 24: ShamFinder: An Automated Framework for Detecting IDN ...

Outline

• Background
• ShamFinder framework
• Measurement Study
• Evaluation of human perception
• Discussion / Future Research

Page 25: ShamFinder: An Automated Framework for Detecting IDN ...

Research Question:

Are the newly detected homoglyphs (SimChar) confusable?

Page 26: ShamFinder: An Automated Framework for Detecting IDN ...

Studying Human Perception

• User study experiments using MTurk

• Tested the confusability of the homoglyph sets SimChar and UC.
  – Sampled 20 pairs from each set.

• Random sets were used for detecting outliers

Page 27: ShamFinder: An Automated Framework for Detecting IDN ...

Experimental Setup

• Pilot study (with a small number of participants)
  – Adjusted the questionnaires
  – Measured the time to complete a task
    • Minimum wage in the US: 7 – 12 USD / hr
    • 15 seconds / task → 0.05 USD / task

• Recruiting participants (Turkers)
  – Number of approved tasks > 50
  – Approval rate 97%
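The per-task payment can be checked against the quoted wage range with a short calculation, assuming a participant works continuously at the measured 15 seconds per task:

```latex
\frac{3600\ \text{s/hr}}{15\ \text{s/task}} \times 0.05\ \text{USD/task}
= 240\ \text{tasks/hr} \times 0.05\ \text{USD/task}
= 12\ \text{USD/hr}
```

Under that assumption, 0.05 USD per task corresponds to the upper end of the 7 – 12 USD / hr range.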


Page 28: ShamFinder: An Automated Framework for Detecting IDN ...


Page 29: ShamFinder: An Automated Framework for Detecting IDN ...

Experimental Setup (cont’d)

• Experiment 1:
  – Studied how the threshold Δ affects human perception

• Experiment 2:
  – Compared the confusability of UC and SimChar

Page 30: ShamFinder: An Automated Framework for Detecting IDN ...

Experiment 1

[Figure: responses on a five-point scale (Very confusing, Confusing, Neutral, Distinct, Very distinct) plotted against the threshold ∆, with ∆=4 and ∆=5 marked.]

Page 31: ShamFinder: An Automated Framework for Detecting IDN ...

Experiment 2

Homoglyphs contained in SimChar are more confusable than those contained in UC

[Figure: responses on a five-point scale (Very confusing, Confusing, Neutral, Distinct, Very distinct) for the Random, SimChar, and UC sets.]

Page 32: ShamFinder: An Automated Framework for Detecting IDN ...

Outline

• Background
• ShamFinder framework
• Measurement Study
• Evaluation of human perception
• Discussion / Future Research

Page 33: ShamFinder: An Automated Framework for Detecting IDN ...

Countermeasures using SimChar

[Figure: how the URL http://g໐໐gle.com is displayed in modern browsers, compared with the proposed interface.]

Page 34: ShamFinder: An Automated Framework for Detecting IDN ...

Limitations

• The evaluation used GNU Unifont only → the evaluation needs to be extended to other font families

• Participants in the human study were English speakers → the linguistic/cultural background needs to be considered
  – Example: め vs. ぬ

Page 35: ShamFinder: An Automated Framework for Detecting IDN ...

Contributions to the standardization communities

• W3C AHA (Anti-Homograph-Attack) CG chartered
  – https://github.com/yoneyajp/AHA/wiki

• ICANN IDN Workshop (next month)

• IETF (TBD)
• Unicode Consortium (TBD)

Page 36: ShamFinder: An Automated Framework for Detecting IDN ...

Summary / Future research direction

• ShamFinder – a framework to detect IDN homographs efficiently

• SimChar – a systematically updatable homoglyph DB
  – https://github.com/shamfinder/shamfinder

• Possible security applications
  – Attacks that exploit an intrinsic gap between human perception and machine processing
  – Example: Viagra → V1@gra

Page 37: ShamFinder: An Automated Framework for Detecting IDN ...

Questions?


Page 38: ShamFinder: An Automated Framework for Detecting IDN ...

Time taken for constructing SimChar.


Page 39: ShamFinder: An Automated Framework for Detecting IDN ...

Top languages used for IDNs


Page 40: ShamFinder: An Automated Framework for Detecting IDN ...

Top-5 ASCII domain names that have the most IDN homographs.


Page 41: ShamFinder: An Automated Framework for Detecting IDN ...

[Figure: characters grouped by distance from Δ = 0 to Δ = 6, ranging from confusable to distinguishable.]

Page 42: ShamFinder: An Automated Framework for Detecting IDN ...

Not that confusable pairs in UC
