Talha Obaid, Email Security, Symantec at MLconf ATL 2017
Machine Learning for Detecting Malware
Talha Obaid Ling Zhou Timothy You Xinlei Cai
MLConf – Atlanta Sep 2017
Email Security
Scripting
Copyright © 2017 Symantec Corporation SYMANTEC PROPRIETARY– Limited Use Only
The Team!
Ling Zhou
Timothy You
Xinlei Cai
Talha Obaid
Machine Learning @ Symantec
• Early adopter of ML in industry
• SRL – Symantec Research Labs
• CAML – Centre for Advanced Machine Learning
• Malware detection, spam identification
• Helped achieve the compounded impact
• Malware polymorphism
https://www.symantec.com/connect/blogs/meet-symantec-labs-industrys-best-kept-secret
Reference: https://www.symantec.com/connect/blogs/machine-learning-not-only-answer
How I got infected?
Email – as a carrier!
Email is the weapon of choice!
• One in 131 emails contained a malicious link or attachment, the highest rate in five years
• The rate jumped from 1 in 220 emails in 2015 to 1 in 131 emails in 2016
• In 2016, small to medium-sized businesses were the most impacted by phishing attacks, with 1 in 95 emails containing malware
• Emails sent daily in 2017 – 269 billion*
• The average office worker receives about 600 emails per week*
• Blended attacks - Email as a carrier for malicious URLs
• Office document files are an effective weapon
• Lighter footprint and hiding in plain sight
Reference:
https://www.symantec.com/security-center/threat-report
* Email Statistics Report, 2017-2021, Radicati Group, February 2017
Worldwide Email Forecast
Worldwide Email User Forecast (M), 2017–2021
                                                      2017    2018    2019    2020    2021
Worldwide Email Users* (M)                           3,718   3,823   3,930   4,037   4,147
% Growth                                                        3%      3%      3%      3%
* Includes both Business and Consumer Email users
Worldwide Daily Email Traffic (B), 2017-2021
Daily Email Traffic                                   2017    2018    2019    2020    2021
Total Worldwide Emails Sent/Received Per Day (B)     269.0   281.1   293.6   306.4   319.6
% Growth                                                      4.5%    4.4%    4.4%    4.3%
Reference: https://www.radicati.com/wp/wp-content/uploads/2017/01/Email-Statistics-Report-2017-2021-Executive-Summary.pdf
Email: Locky malware delivery vector
Reference:
https://www.symantec.com/security-center/threat-report
http://www.latimes.com/business/technology/la-me-ln-hollywood-hospital-bitcoin-20160217-story.html
https://arstechnica.com/information-technology/2016/02/locky-crypto-ransomware-rides-in-on-malicious-word-document-macro/
• Released in 2016
• Still active in 2017
• “Enable macro if data encoding is incorrect”
• If the user does enable macros, the macros then save and run a binary file that downloads the actual encryption Trojan
• A hospital in Hollywood paid $17,000 in bitcoin to hackers
Scripting Malware – real ones!
Exempli Gratia
AutoClose, Random variable, String split
Fake variable, Fake comment, Fake condition
Multiple function, String split
String encryption, Random variable, Function call hidden
String encryption, Random variable, Multi function, Click event
String hidden, Fake condition
Machine Learning for hand-written text!
Domain Differences
Programming Language
• Non-Ambiguous
• Deterministic language
• Clear distinction between syntax and semantics
• Semicolons, Tabs vs Spaces, Editor wars
• Identifiers, subroutine calls, imports
• Comments, conventions, notations
• Design patterns
Natural Language
• Ambiguous
• Context-bound languages
• Less distinction between syntax and semantics
• Puns, Rants, Parodies, Imitations
• TF-IDF
• LSTM – Long short-term memory
• Bag of words
Machine Learning Applications – Code!
Automatic Patch Generation by Learning Correct Code, by Fan Long and Martin Rinard
Reference:
https://www.newscientist.com/article/mg23331144-500-ai-learns-to-write-its-own-code-by-stealing-from-other-programs/
http://people.csail.mit.edu/rinard/paper/popl16.pdf
https://www.forbes.com/sites/adrianbridgwater/2016/03/07/machine-learning-needs-a-human-in-the-loop
https://blogs.technet.microsoft.com/machinelearning/2016/10/17/the-power-of-human-in-the-loop-combine-human-intelligence-with-machine-learning/
Human-In-The-Loop?
How we do it
Rule ^ ML
Analyze
Inflation
Macro Extraction
Parsing
Feature Extraction
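The stages listed above can be sketched roughly as follows. This is a minimal illustration, not the team's implementation: it assumes the attachment is a ZIP-based Office document, it only locates the embedded VBA project rather than decoding it (that step needs an OLE/VBA parser), and every function name and feature shown is hypothetical.

```python
import io
import re
import zipfile


def inflate(document_bytes):
    """Inflation: unpack a ZIP-based Office container into {part name: bytes}."""
    with zipfile.ZipFile(io.BytesIO(document_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def find_vba_parts(parts):
    """Macro extraction, first step only: locate embedded VBA project parts.

    Decoding vbaProject.bin into source text is deliberately left out here.
    """
    return [name for name in parts if name.lower().endswith("vbaproject.bin")]


def parse(macro_source):
    """Parsing: keep the non-empty lines of already-decoded macro source."""
    return [line.strip() for line in macro_source.splitlines() if line.strip()]


def extract_features(lines):
    """Feature extraction: a few illustrative counts (hypothetical features)."""
    identifiers = re.findall(r"[A-Za-z_]\w*", " ".join(lines))
    return {
        "line_count": len(lines),
        "createobject_calls": sum(
            line.lower().count("createobject(") for line in lines
        ),
        "shell_calls": sum(
            1 for line in lines if re.search(r"\bshell\b", line, re.IGNORECASE)
        ),
        "longest_identifier": max((len(name) for name in identifiers), default=0),
    }
```

Each stage takes the previous stage's output, so the resulting feature dictionary could feed both a rule check and a model.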
Feature Selection (Total 72 Features)
ML_1... ML_12…
ML_2... ML_13…
ML_3... ML_14…*
ML_4... ML_15…
ML_5... ML_16…
ML_6... ML_17…
ML_7… ML_18…
ML_8… ML_19…
ML_9… ML_20…
ML_10… ML_21…*
ML_11… …
Note: Features with (*) can be expanded to the count of each item.
ML_21_1… ML_14_1…
ML_21_2… ML_14_1…
ML_21_3… ML_14_1…
ML_21_4… ML_14_1…
ML_21_5… ML_14_1…
ML_21_1… ML_14_1…
ML_21_1… ML_14_1…
ML_21_1… ML_14_1…
ML_21_1… ML_14_1…
ML_21_1… ML_14_1…
… 29 features … 21 features
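The note about starred features says they can be expanded to the count of each item. A small sketch of that expansion is shown here. The slides do not say what ML_14 or ML_21 contain, so the item vocabulary below is purely a placeholder and the function name is hypothetical.

```python
from collections import Counter


def expand_to_counts(prefix, observed_items, vocabulary):
    """Expand one multi-valued feature into one count column per known item.

    `vocabulary` fixes the column order; items outside it are ignored.
    """
    counts = Counter(observed_items)
    return {
        f"{prefix}_{index}": counts[item]
        for index, item in enumerate(vocabulary, start=1)
    }


# Placeholder vocabulary, assumed for illustration only.
VOCABULARY = ["CreateObject", "Shell", "Chr"]
row = expand_to_counts("ML_14", ["Shell", "Chr", "Chr"], VOCABULARY)
```

Here `row` holds one column per vocabulary item, which is how a single starred feature can grow into dozens of columns.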
Optimization
         ML_1… (Composite)   ML_2…   ML_3…   ML_4…   ML_14_3…
1                    31469    1245      35     211          0
2                    44617    1264      14     171          0
3                    33247    1045      14     158          0
…                        …       …       …       …          …
1234                 18828     682      29     222          1
…                        …       …       …       …          …
40000              1273048     844      19     151          0
• Give the ML_1… feature special treatment, since it is a composite that depends on other features.
• Give features like ML_14_3… special treatment, since they are categorical.
Results – very recent ones!
Spam run – from Aug 21 to Aug 27
{
"desc": "Shell call",
"artifact": " Shell \"Explorer.exe \" & strCommande, vbNormalFocus, "
},
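A record like the one above pairs a description with the macro line that triggered it. One way such records could be produced is a table of named patterns run over each line. This sketch assumes regular-expression rules; the slides do not say how the real rules are written. Only the "Shell call" description comes from the slides, and the second rule is hypothetical.

```python
import json
import re

# (description, pattern) pairs; patterns are assumptions for illustration.
RULES = [
    ("Shell call", re.compile(r"^\s*Shell\s", re.IGNORECASE)),
    ("CreateObject call (hypothetical rule)", re.compile(r"CreateObject\s*\(", re.IGNORECASE)),
]


def scan(macro_source):
    """Return one {"desc", "artifact"} record per rule hit, in line order."""
    findings = []
    for line in macro_source.splitlines():
        for desc, pattern in RULES:
            if pattern.search(line):
                findings.append({"desc": desc, "artifact": line})
    return findings


def to_report(findings):
    """Serialize findings in the same JSON shape as the slide."""
    return json.dumps(findings, indent=2)
```

Keeping the matched line as the artifact means an analyst can see why a sample was flagged, not only that it was.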
Just this morning … 15 Sep 2017
Recently captured…
{
"desc": "Small routine with string manipulation",
"artifact": " Chinook = (AscB(Sumatran_Rhinoceros))"
"artifact": " Tapir = Chinook(Mid(Sand_Lizard, Chipmunk, 1)) - Int(M..."
},
{
"desc": "Small routine with run & Obfuscated object concat & Obfuscated object creation arguments shell & Createobject run one-liner",
"artifact": " CreateObject(Pig + \"Shell\").Run Module1.Ibis(Sea_Dragon, \""
},
{
"desc": "Obfuscated object variable",
"artifact": "Set miLxhuTjOMrpjvLQQNhstoiWlCkOdozYkasyizjweDRGlKRkgtkgxHZyAoLfJFFaMSFJDNiRekNpWbkbkzhjETbcAtytnDmZxruTFIhTLSCM = CreateObject(ujcYEkvJXWWtqcIKOpdaxorehRVbSNYlQPiQQao"
},
{
"desc": "Obfuscated object creation arguments",
"artifact": "Set qvBvooYSTaFymchvnZIkLUSrhheHIwfYCSyrpgvjePoCKWbhMYoOBOJVcKO = CreateObject(kbUBGIKqbHJyTmAmPbuHSBjqouVxfwCfSfEWfcNXxXYAhCJKXcegnoejsdNMnNKeFdfnieGnOXJvcjJlkKZDSV"
},
{
"desc": "Long obfuscated variable assignment",
"artifact": "ZGwEiLSTkOsQSFcFzZVPMMuHalgKESzgWlohddzbmveToRIxzt"
},
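Identifiers like the ones above are both very long and random-looking. A rough way to score that is character-level Shannon entropy combined with a length check. This is a heuristic sketch only. The thresholds are assumptions and the slides do not state how randomness is actually measured. Entropy alone is not enough, because a short name can never score high.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Character-level Shannon entropy in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def looks_random(identifier, min_length=30, min_entropy=4.0):
    """Flag identifiers that are both long and high-entropy (assumed cutoffs)."""
    return (
        len(identifier) >= min_length
        and shannon_entropy(identifier) >= min_entropy
    )
```

A hand-written name such as `strCommande` fails the length check, while the 50-character assignment shown above passes both.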
{
"desc": "Macro with constant manipulation in function call",
"artifact": "dNDfJESUPztgDlcNnWNZLIPsGgXDVndgUDYaarDOIWeCVstlSACjSVcUyLZ = CWvXJUNlxQcbDqNtnmQhCsifqGFBSHE$(327 - 240) & CWvXJUNlxQcbDqNtnmQhCsifqGFBSHE$(324 - 241) & CWvXJUNl…"
},
{
"desc": "Highly random long string found",
"artifact": "mRClEXzmRGxUqDPLJHcHeEMgjtqozQbuXXYIpdNJOtykVB"
},
{
"desc": "Object creation variable identifier",
"artifact": "qvBvooYSTaFymchvnZIkLUSrhheHIwfYCSyrpgvjePoCKWbhMYoOBOJVcKO"
},
{
"desc": "Random subroutine name",
"artifact": "dnHLjlClNBEYNnZihnFPOighaDbyTOUim"
},
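The "constant manipulation in function call" artifact hides small numbers behind arithmetic such as `(327 - 240)`. If the wrapped function behaves like VBA's `Chr$`, which is an assumption here, folding the arithmetic recovers the hidden characters. The sketch below handles only `$(a - b)` and `$(a + b)` forms, and the function name is hypothetical.

```python
import re

# Matches "$(327 - 240)" style arguments after any function name ending in "$".
CALL_WITH_ARITHMETIC = re.compile(r"\$\(\s*(\d+)\s*([+-])\s*(\d+)\s*\)")


def fold_char_codes(expression):
    """Evaluate each constant expression and map the result to a character."""
    characters = []
    for left, operator, right in CALL_WITH_ARITHMETIC.findall(expression):
        code = int(left) + int(right) if operator == "+" else int(left) - int(right)
        if 0 <= code < 0x110000:  # skip values that are not valid code points
            characters.append(chr(code))
    return "".join(characters)
```

Folding constants statically like this avoids having to run the macro to see what string it builds.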
{
"desc": "Random identifier with suspicious assignments",
"artifact": "ujcYEkvJXWWtqcIKOpdaxorehRVbSNYlQPiQQaoCIdBbVAdczWFVpbOGsxrmOTqKykcaurtoAaRUmQJgntcvICwoBcYTiBopmrckXChHdQUOKtTcnKzV = Chr$(327 - 240) & Chr$(324 - 241) & Chr$(24…"
},
{
"desc": "Shell/SaveToFile string contains strange variable name",
"artifact": "RhIzeRHLbzssvNwesaErYKfXuynMPZjWdUBgPAZZUnlhknaNjNAQERoHClFgeuvBPWPbMQPsAeXlYymHXZdCZTRMfteev"
},
{
"desc": "File with following name was created and run created",
"artifact": "XABNAGkAYwByAG8AcwBvAGYAdAA=XABxAGIASwBWAEsAdgBsAGgAdwBpAEoAUgBLAC4AZQB4AGUA"
},
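The last artifact above looks like two Base64 strings joined together. A short sketch for splitting and decoding such a value follows. The UTF-16LE text encoding is an inference from the decoded bytes, not something the slides state, and the function name is hypothetical.

```python
import base64
import re

# A run of Base64 characters followed by optional "=" padding.
BASE64_CHUNK = re.compile(r"[A-Za-z0-9+/]+=*")


def decode_chunks(artifact, encoding="utf-16-le"):
    """Split concatenated Base64 strings at their padding and decode each one."""
    decoded = []
    for chunk in BASE64_CHUNK.findall(artifact):
        raw = base64.b64decode(chunk)
        decoded.append(raw.decode(encoding))
    return decoded
```

Under that encoding assumption, the artifact above decodes to `\Microsoft` followed by `\qbKVKvlhwiJRK.exe`, which reads as a folder and an executable name.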
And… we capture a lot more!
Findings & Going Forward …
• A missing artifact used to mean a missed sample – not anymore
• All features contribute to the verdict in unison
• Obfuscation is still a challenge and will remain one
• Identify why a string-typed variable is assigned a byte array
• Why is an assignment expression longer than, say, 200 characters?
• Keep transitioning inflating malware samples from sandbox to static analysis
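The question about assignment expressions longer than 200 characters translates directly into a static check. This is a minimal sketch, assuming one statement per line and a simple `name = expression` shape. The 200-character limit comes from the slide, while the regular expression and function name are illustrative.

```python
import re

# Optional "Set", an identifier, "=", then the right-hand expression.
ASSIGNMENT = re.compile(r"^\s*(?:Set\s+)?([A-Za-z_]\w*)\s*=\s*(.+)$")


def long_assignments(macro_source, max_length=200):
    """Return (line number, variable, expression length) for oversized assignments."""
    flagged = []
    for number, line in enumerate(macro_source.splitlines(), start=1):
        match = ASSIGNMENT.match(line)
        if match and len(match.group(2)) > max_length:
            flagged.append((number, match.group(1), len(match.group(2))))
    return flagged
```

A check like this needs only the macro text, which fits the goal of moving work from the sandbox to static analysis.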
Thank You!
Talha Obaid Ling Zhou Timothy You Xinlei Cai
Email Security
Join us! www.symantec.com/about/careers