Yinglian XieYinglian Xie Fang Yu, Kannan Achan,...
Transcript of Yinglian XieYinglian Xie Fang Yu, Kannan Achan,...
![Page 1: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/1.jpg)
Yinglian XieYinglian Xie
Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan OsipkovGeoff Hulten, Ivan OsipkovMicrosoft Research Silicon ValleyWindows Live (Hotmail) mail, Windows Live Safety Platform
MSR-SVC
![Page 2: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/2.jpg)
Botnet attacks areBotnet attacks are prevalent◦ DDoS spam click fraudDDoS, spam, click fraud◦ Large-scale, hidden trailCan we detect them by signatures?◦ Filter future spam emails
Detect botnet membership◦ Detect botnet membership◦ Understand botnet
activities and trends
Focus on URLs instead of contents74% of spam contain at least one URL74% of spam contain at least one URL
MSR-SVC
![Page 3: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/3.jpg)
Common or similar URLs from a botnet spamCommon or similar URLs from a botnet spam campaign◦ Direct users to one or a few Websites◦ Direct users to one or a few WebsitesBotnet attacks are distributed yet bursty◦ Distributed: Involve a large number of hostsDistributed: Involve a large number of hosts◦ Bursty: spam sent in a short duration
Identify groups of distributed hosts that sent Identify groups of distributed hosts that sent similar URLs in a burstsimilar URLs in a burstsimilar URLs in a burstsimilar URLs in a burst
MSR-SVC
![Page 4: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/4.jpg)
Mixing spam and legitimate URLs in an emailMixing spam and legitimate URLs in an email◦ Which URLs characterize a Botnet?◦ White list approach does not workWhite list approach does not work
Email 1Email 1 Email 2Email 2 Email 3Email 3
http://www.shopping.com
http://www.w3.org/wai
http://www psc edu/networ
http://www.peacenvironment.net
http://www.w3.org/wai
http://endosmosis.com/
http://www.talkway.com
http://www.google.com/url?sa=U&start=4&q=http://isc.sans.org
http://isc.sans.orghttp://www.psc.edu/networking/projects/tcp/
… …
http://www dvdfever co uk/
http://www.bizrate.com
… …
http://www dvdfever co uk/
http://www.bizrate.com
… …
http://www.dvdfever.co.uk/
http://isc.sans.org
http://www.dvdfever.co.uk/co1118.shtml
… …
http://www.dvdfever.co.uk/co1118.shtml
… …
co1118.shtml
… …
MSR-SVC
![Page 5: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/5.jpg)
Crafting polymorphic URLs across emailsCrafting polymorphic URLs across emails◦ Insert random tails◦ Customize based on recipientCustomize based on recipient◦ What are the common URL patterns?
Time URLsSource
ASesURLs
h // l / /? h b l
2006‐11‐02 66 38
http://www.lympos.com/n/?167&carthagebolets
http://www.lympos.com/n/?167&brokenacclaim
http://www.lympos.com/n/?167&acceptoraudience
2006‐11‐15 72 39
http://shgeep.info/tota/indexx.html?jxvsgfhjb.cvqxjby,hvx
http://shgeep.info/tota/indexx.html?ijkkjija.cvqxjby,hvx
http://shgeep.info/tota/indexx.html?ibdivvx ceh.cvqxjby,hvxttp://s geep. o/tota/ de . t ? bd _ce .c q jby,
MSR-SVC
![Page 6: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/6.jpg)
Automatic Botnet spam signature generationAutomatic Botnet spam signature generationSpam URL signatures
AutoREBotnet host IP addresses
Sampled emails
How we address challenges:◦ Mixing legitimate and spam URLS
Require no labeled data or white list◦ Mixing legitimate and spam URLS
◦ Porlymorphic URLs“distributed” and “bursty” spam activity
Porlymorphic URLs Two types of signatures:
• complete URLs
FP rate 10x - 30x lower than keywordsFN rate10x lower than complete URLs
p• regular expressions
MSR-SVC
![Page 7: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/7.jpg)
•Three months of sampled emailsSampled
EmailsSource
•Three months of sampled emails•5,382,460 email messages•Labeled with spam or non spam
Ignore label
AutoRE
Ignore label
Botnets and their IPs
URL signatures
•16% ‐18% of spam not captured by well known blacklists•Identified 7,721 botnet
ig
• 0.2% false positive ratespam campaigns •Span 5,916 ASes•340,050 botnet IP addresses•0 5% false positive rate
Nov-06 Jun-07 Jul-07•0.5% false positive rate
Num. of Botnets 1,748 2,426 3,547
Num. of Spam 145,510 234,685 200,271
Num. of IPs 100,293 131,234 113,294
MSR-SVC
![Page 8: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/8.jpg)
Example Spam URL SignaturesExample Spam URL Signatureshttp://deaseda.info/ego/zoom.html?QjQRPxbZf.cVQXjbY,hVX
p p gp p g
http://deaseda.info/ego/zoom.html?giAfS.cVQXjbY,hVXhttp://deaseda.info/ego/zoom.html?QNVRcjgVNSbgfSR.XRW,hVXhttp://deaseda.info/ego/zoom.html?afRZQ.XRW,hVX
// / /http://deaseda.info/ego/zoom.html?YcGGA.XRW,hVX
http://deaseda.info/ego/zoom.html?*{4,18}.[a‐zA‐Z]{3,7},hvx
http://arfasel.info/hums/jasmine.html? UbSjWcjHC.cVQXjbY,hVXhttp://apowefe.info/hums/jasmine.html? PSeYVNfS.cVQXjbY,hVXhttp://carvalert.info/hums/jasmine.html? SfLWVYgRIBH.XRW,hVX
http://www.mezir.com/n/?167& acetyleneassign
http://[^\/]*/hums/jasmine.html?*{4,18}.[a‐zA‐Z]{3,7},hvx
http://www.mezir.com/n/?167& acetyleneassignhttp://www.aferol.com/n/?167& commonwealcircehttp://www.bedremf.com/n/?167& deprivationalignhttp://www.mokver.www/n/?167& bleachbeneficialp // / /
http://[^\/]*/n/?167&[a‐zA‐Z]{9,27}
MSR-SVC
![Page 9: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/9.jpg)
Reduce the number of signatures by 4 to 19 timeshttp://www.bedremf.com/n/?167&[a‐zA‐Z]{10,19}Reduce the number of signatures by 4 to 19 times◦ Reduce pattern matching complexity◦ Great for capture future spam
p // / / [ ]{ , }http://www.mokver.www/n/?167&[a‐zA‐Z]{11,23}
http://[^\/]*/n/?167&[a‐zA‐Z]{9,27}
45,000
50,000
ed
Before generalizationAft li ti
700
800
sion
s Before generalizationAfter generalization
25 000
30,000
35,000
40,000
45,000
ails
cap
ture After generalization
400
500
600
gula
r exp
res
10,000
15,000
20,000
25,000
f spa
m e
ma
100
200
300
mbe
r of r
eg
0
5,000
Nov 2006 June 2007 July 2007 June toJuly
# of
0
Nov-06 Jun-07 Jul-07
Nu
y
MSR-SVC
![Page 10: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/10.jpg)
Botnet IP addresses are less than 0 5% of all IPsBotnet IP addresses are less than 0.5% of all IPsTheir spam emails:
4%-6% of all detected spam4% 6% of all detected spam
AS description AS number Number of bot IPs
Korea Telecom 4766 15757
Verizon Internet Service 19262 11426
France Telecom 3215 11303France Telecom 3215 11303
China 169-backbone 4837 9960
Chinanet-backbone 4134 8113
MSR-SVC
![Page 11: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/11.jpg)
Email content similarityEmail content similarity◦ Destination Web pages
Over 90% at least 75% similar◦ Features: # of URLs, # of domains, email length
Over 95% with negligible std◦ Email text◦ Email text
Over 60% less than 50% similar
S di b h i i il itSending behavior similarity◦ Over 90% with sending time std <20 hours◦ Over 50% with sending time std < 1 8 hourOver 50% with sending time std < 1.8 hour◦ Clustering botnet sending patterns
MSR-SVC
![Page 12: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/12.jpg)
85% of spam campaigns are well clustered85% of spam campaigns are well clustered
191 hosts with 9 outliers162 hosts with 4 outliers
MSR-SVC
![Page 13: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/13.jpg)
Botnet sending activities are clusterableBotnet sending activities are clusterable
-1
MSR-SVC
![Page 14: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/14.jpg)
Higher correlation in Aug than in NovHigher correlation in Aug than in Nov
MSR-SVC
![Page 15: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/15.jpg)
Botnets are more popular for sending spamBotnets are more popular for sending spam◦ Number of spam campaigns doubled◦ Number of botnet IPs increased by 10%Number of botnet IPs increased by 10%Spam campaigns are more stealthy◦ Adoption of polymorphic URLs increased by 50%Adoption of polymorphic URLs increased by 50%Email sending patterns are clusterable when viewed in aggregateviewed in aggregateBotnet activities show correlation with the network telescope datanetwork telescope data
MSR-SVC
![Page 16: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/16.jpg)
Server zombieServer
CaptchaCaptchasolver
RDSXXTD3RDSXXTD3
MSR-SVC
![Page 17: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/17.jpg)
Large volume of input dataLarge volume of input data◦ Efficient algorithm◦ Distributed computing infrastructure◦ Distributed computing infrastructureAttackers are stealthy and evolving◦ Individual detection difficult◦ Individual detection difficult◦ An aggregated view for detection◦ Robust features or attack invariantRobust features or attack invariantRigorous evaluation challenging ◦ Usually no ground truth data◦ Usually no ground truth data◦ Collaborative monitoring
MSR-SVC
![Page 18: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/18.jpg)
Economic underpinnings of botnet attacksEconomic underpinnings of botnet attacksInformation sharing for collaborative researchresearch Attack forensics and deterrence
MSR-SVC
![Page 19: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/19.jpg)
MSR-SVC
![Page 20: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/20.jpg)
30
35AURL 1
URL 2
20
25
Signal 1
BURL 3
URL 4 URL 2
URL 4 Extract
10
15Signal 1
Signal 2
Signal 3
CURL 5
URL 6
URL 4URL 9
Signatures
0
5g
DURL 7
URL 8A E
1 4 7 10 13 16 19 22 25 28EURL 9
URL 10
MSR-SVC
![Page 21: Yinglian XieYinglian Xie Fang Yu, Kannan Achan, …web.eecs.umich.edu/~zmao/workshop/talks/Xie.pdfYinglian XieYinglian Xie Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten Ivan](https://reader034.fdocuments.in/reader034/viewer/2022042201/5ea1ca7d1b440d5a9f592bea/html5/thumbnails/21.jpg)
Why regular expressions?Why regular expressions? ◦ Need specific signatures
YSignatureExtract Signature accepted?
Extract frequent keywordsInput URLs URL RegExes
Construct D t ili
signature tree
RegEx generation
Detailing Generalization
g g
MSR-SVC