Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Transcript of Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Page 1: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Machine Learning for Detecting Malware

Talha Obaid Ling Zhou Timothy You Xinlei Cai

MLConf – Atlanta Sep 2017

Email Security

Scripting

Page 2: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Copyright © 2017 Symantec Corporation SYMANTEC PROPRIETARY– Limited Use Only

The Team!

Ling Zhou

Timothy You

Xinlei Cai

Talha Obaid

Page 3: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Machine Learning @ Symantec

• Early adopter of ML in industry

• SRL – Symantec Research Labs

• CAML – Centre for Advanced Machine Learning

• Malware detection, spam identification

• Helped achieve the compounded impact

• Malware polymorphism

https://www.symantec.com/connect/blogs/meet-symantec-labs-industrys-best-kept-secret

Page 4: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Reference: https://www.symantec.com/connect/blogs/machine-learning-not-only-answer

How did I get infected?

Page 5: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Email – as a carrier!

Page 6: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Email is the weapon of choice!

• One in 131 emails contained a malicious link or attachment, the highest rate in five years

• The rate jumped from 1 in 220 emails in 2015 to 1 in 131 emails in 2016

• In 2016, small to medium-sized businesses were the most impacted by phishing attacks, with 1 in 95 emails containing malware

• Emails sent daily in 2017 – 269 billion*

• The typical office worker receives an average of 600 emails per week*

• Blended attacks - Email as a carrier for malicious URLs

• Office document files are an effective weapon

• Lighter footprint and hiding in plain sight

Reference:

https://www.symantec.com/security-center/threat-report

* Email Statistics Report, 2017-2021, Radicati Group, February 2017

Page 7: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Worldwide Email Forecast

Worldwide Email User Forecast (M), 2017–2021

                                                     2017     2018     2019     2020     2021
Worldwide Email Users* (M)                          3,718    3,823    3,930    4,037    4,147
% Growth                                                        3%       3%       3%       3%

* Includes both Business and Consumer Email users

Worldwide Daily Email Traffic (B), 2017–2021

                                                     2017     2018     2019     2020     2021
Total Worldwide Emails Sent/Received Per Day (B)    269.0    281.1    293.6    306.4    319.6
% Growth                                                      4.5%     4.4%     4.4%     4.3%

Reference: https://www.radicati.com/wp/wp-content/uploads/2017/01/Email-Statistics-Report-2017-2021-Executive-Summary.pdf
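The "% Growth" figures on this slide can be re-derived from the forecast values themselves. A minimal Python sketch, using only the numbers shown above; the function and variable names are illustrative and not part of the original deck:

```python
def yoy_growth(values):
    """Return year-over-year growth in percent for consecutive values."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(values, values[1:])]

# Figures copied from the slide (Radicati forecast, 2017-2021).
email_users_m = [3718, 3823, 3930, 4037, 4147]        # millions of users
daily_emails_b = [269.0, 281.1, 293.6, 306.4, 319.6]  # billions per day

# The slide rounds user growth to whole percent and traffic growth to 0.1%.
user_growth = [round(g) for g in yoy_growth(email_users_m)]
traffic_growth = [round(g, 1) for g in yoy_growth(daily_emails_b)]
print(user_growth)
print(traffic_growth)
```

The rounded results agree with the growth rows quoted on the slide.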

Page 8: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Email: Locky malware delivery vector

Reference:

https://www.symantec.com/security-center/threat-report

http://www.latimes.com/business/technology/la-me-ln-hollywood-hospital-bitcoin-20160217-story.html

https://arstechnica.com/information-technology/2016/02/locky-crypto-ransomware-rides-in-on-malicious-word-document-macro/

• Released in 2016

• Still active in 2017

• “Enable macro if data encoding is incorrect”

• If the user does enable macros, the macros then save and run a binary file that downloads the actual encryption Trojan

• A hospital in Hollywood paid $17,000 in bitcoin to hackers

Page 9: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Scripting Malware – real ones!

Page 10: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Exempli Gratia

AutoClose, Random variable, String split

Page 11: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Fake variable, Fake comment, Fake condition

Page 12: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Multiple functions, String split

Page 13: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

String encryption, Random variable, Function call hidden

Page 14: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

String encryption, Random variable, Multi function, Click event

Page 15: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

String hidden, Fake condition

Page 16: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Machine Learning for hand-written text!

Page 17: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Domain Differences

Programming Language

• Non-Ambiguous

• Deterministic language

• Clear distinction between syntax and semantics

• Semicolons, Tabs vs Spaces, Editor wars

• Identifier, sub routine calls, imports

• Comments, conventions, notations

• Design patterns

Natural Language

• Ambiguous

• Context-bound languages

• Less distinction between syntax and semantics

• Puns, Rants, Parodies, Imitations

• TF-IDF

• LSTM – Long short term memory

• Bag of words
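The natural-language techniques listed above (bag of words, TF-IDF) can also be applied to macro source once it is split into tokens. The sketch below is only an illustration of that idea: the tokenizer regex and the two sample macros are hypothetical, and the deck does not say which representation the team actually uses.

```python
import math
import re
from collections import Counter

# Identifier-like tokens only; a real tokenizer would also handle strings and operators.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def bag_of_words(source):
    """Count tokens case-insensitively (VBA identifiers are case-insensitive)."""
    return Counter(tok.lower() for tok in TOKEN.findall(source))

def tf_idf(docs):
    """Return one {token: weight} dict per document using tf * log(N / df)."""
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(bags)
    df = Counter(tok for bag in bags for tok in bag)  # document frequency
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({tok: (cnt / total) * math.log(n_docs / df[tok])
                        for tok, cnt in bag.items()})
    return weights

# Hypothetical macro snippets, for illustration only.
macros = [
    'Sub AutoClose()\n Shell "cmd.exe"\nEnd Sub',
    'Sub Report()\n MsgBox "done"\nEnd Sub',
]
weights = tf_idf(macros)
print(sorted(weights[0].items(), key=lambda kv: -kv[1])[:3])
```

Tokens that appear in every macro, such as `Sub` and `End`, receive zero weight, while tokens unique to one macro stand out. This is the same effect TF-IDF has on stop words in natural language.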

Page 18: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Machine Learning Applications – Code!

Automatic Patch Generation by Learning Correct Code, by Fan Long and Martin Rinard

Reference:

https://www.newscientist.com/article/mg23331144-500-ai-learns-to-write-its-own-code-by-stealing-from-other-programs/

http://people.csail.mit.edu/rinard/paper/popl16.pdf

Page 19: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Human-In-The-Loop?

https://www.forbes.com/sites/adrianbridgwater/2016/03/07/machine-learning-needs-a-human-in-the-loop

https://blogs.technet.microsoft.com/machinelearning/2016/10/17/the-power-of-human-in-the-loop-combine-human-intelligence-with-machine-learning/

Page 20: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

How we do it

Page 21: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Rule ^ ML

Email

Analyze

Inflation

Macro Extraction

Parsing

Feature Extraction
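The first stages of this pipeline can be sketched with the Python standard library. Several things here are assumptions rather than details from the talk: "Inflation" is read as unpacking a compressed container (macro-enabled OOXML files such as .docm are ZIP archives), all function names are hypothetical, and decoding the VBA source out of `vbaProject.bin` is left out because it needs an OLE parser.

```python
import io
import zipfile
from email import message_from_bytes
from email.message import EmailMessage

def attachments(raw_email):
    """Analyze: yield (filename, payload bytes) for each attachment."""
    msg = message_from_bytes(raw_email)
    for part in msg.walk():
        name = part.get_filename()
        if name:
            yield name, part.get_payload(decode=True)

def inflate(payload):
    """Inflation (assumed meaning): unpack a ZIP container such as .docm."""
    if not zipfile.is_zipfile(io.BytesIO(payload)):
        return {}
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        return {info.filename: zf.read(info) for info in zf.infolist()}

def macro_streams(members):
    """Macro extraction: keep the members that hold the VBA project."""
    return {n: d for n, d in members.items() if n.lower().endswith("vbaproject.bin")}

# Build a toy email with a fake .docm attachment to exercise the stages.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", "<w:document/>")
    zf.writestr("word/vbaProject.bin", b"placeholder, not a real VBA project")
mail = EmailMessage()
mail["Subject"] = "Invoice"
mail.set_content("See attached.")
mail.add_attachment(buf.getvalue(), maintype="application",
                    subtype="octet-stream", filename="invoice.docm")

found = {}
for name, payload in attachments(mail.as_bytes()):
    found[name] = sorted(macro_streams(inflate(payload)))
print(found)
```

Parsing and feature extraction would then run on the decoded macro text. Keeping each stage as a separate function mirrors the slide's flow and lets each step be tested on its own.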

Page 22: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Feature Selection (Total 72 Features)

ML_1... ML_12…

ML_2... ML_13…

ML_3... ML_14…*

ML_4... ML_15…

ML_5... ML_16…

ML_6... ML_17…

ML_7… ML_18…

ML_8… ML_19…

ML_9… ML_20…

ML_10… ML_21…*

ML_11… …

Note: Features with (*) can be expanded to the count of each item.

ML_21_1… ML_14_1…

ML_21_2… ML_14_1…

ML_21_3… ML_14_1…

ML_21_4… ML_14_1…

ML_21_5… ML_14_1…

ML_21_1… ML_14_1…

ML_21_1… ML_14_1…

ML_21_1… ML_14_1…

ML_21_1… ML_14_1…

ML_21_1… ML_14_1…

… 29 features … 21 features
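The note on starred features suggests that a list-valued feature fans out into one count column per item. A minimal sketch of that expansion follows. The vocabulary is hypothetical, since the slide elides the real items, and the `ML_x` prefix only mirrors the slide's naming style.

```python
from collections import Counter

def expand_counts(prefix, observed_items, vocabulary):
    """Expand a list-valued feature into one count column per known item."""
    counts = Counter(observed_items)
    return {f"{prefix}_{i}": counts[item]
            for i, item in enumerate(vocabulary, start=1)}

# Hypothetical vocabulary; the slide does not name the real items.
vocab = ["Shell", "CreateObject", "Chr"]
row = expand_counts("ML_x", ["Chr", "Chr", "Shell"], vocab)
print(row)  # {'ML_x_1': 1, 'ML_x_2': 0, 'ML_x_3': 2}
```

Fixing the vocabulary up front keeps the column order stable across samples, so every sample yields a feature vector of the same length.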

Page 23: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Optimization

#        ML_1… (Composite)    ML_2…    ML_3…    ML_4…    ML_14_3…
1                    31469     1245       35      211           0
2                    44617     1264       14      171           0
3                    33247     1045       14      158           0
…                        …        …        …        …           …
1234                 18828      682       29      222           1
…                        …        …        …        …           …
40000              1273048      844       19      151           0

• Handle the ML_1… feature, since it is a composite that depends on other features.

• Handle features like ML_14_3…, since they are categorical.

Page 24: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Results – very recent ones!

Page 25: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Spam run – from Aug 21 to Aug 27

{

"desc": "Shell call",

"artifact": " Shell \"Explorer.exe \" & strCommande, vbNormalFocus, "

},
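A report entry like the one above pairs a description with the source line that triggered it. The sketch below shows one way a rule could emit records in that shape. The regex and the sample macro are hypothetical and are not the production rules.

```python
import json
import re

# Hypothetical rule table: description -> pattern. Not the production rules.
RULES = {
    "Shell call": re.compile(r"^\s*Shell\b", re.IGNORECASE),
}

def scan(macro_source):
    """Return one {"desc", "artifact"} record per line matched by a rule."""
    hits = []
    for line in macro_source.splitlines():
        for desc, pattern in RULES.items():
            if pattern.search(line):
                hits.append({"desc": desc, "artifact": line})
    return hits

# Hypothetical two-line macro; only the second line is a Shell call.
sample = 'strCommande = "payload"\n Shell "Explorer.exe " & strCommande, vbNormalFocus'
print(json.dumps(scan(sample), indent=2))
```

Keeping the matched line as the artifact gives an analyst the evidence behind each verdict, which fits the human-in-the-loop theme earlier in the deck.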

Page 26: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Just this morning … 15 Sep 2017

Page 27: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Recently captured…

{

"desc": "Small routine with string manipulation",

"artifact": " Chinook = (AscB(Sumatran_Rhinoceros))",

"artifact": " Tapir = Chinook(Mid(Sand_Lizard, Chipmunk, 1)) - Int(M..."

},

{

"desc": "Small routine with run & Obfuscated object concat & Obfuscated object creation arguments shell & Createobject run one-liner",

"artifact": " CreateObject(Pig + \"Shell\").Run Module1.Ibis(Sea_Dragon, \""

},

Page 28: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

{

"desc": "Obfuscated object variable",

"artifact": "Set miLxhuTjOMrpjvLQQNhstoiWlCkOdozYkasyizjweDRGlKRkgtkgxHZyAoLfJFFaMSFJDNiRekNpWbkbkzhjETbcAtytnDmZxruTFIhTLSCM = CreateObject(ujcYEkvJXWWtqcIKOpdaxorehRVbSNYlQPiQQao"

},

{

"desc": "Obfuscated object creation arguments",

"artifact": "Set qvBvooYSTaFymchvnZIkLUSrhheHIwfYCSyrpgvjePoCKWbhMYoOBOJVcKO = CreateObject(kbUBGIKqbHJyTmAmPbuHSBjqouVxfwCfSfEWfcNXxXYAhCJKXcegnoejsdNMnNKeFdfnieGnOXJvcjJlkKZDSV"

},

{

"desc": "Long obfuscated variable assignment",

"artifact": "ZGwEiLSTkOsQSFcFzZVPMMuHalgKESzgWlohddzbmveToRIxzt"

},
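Identifiers like the ones captured on this slide are long and use an almost uniform mix of characters, which is rare in hand-written names. One possible signal, sketched below, combines length with Shannon entropy. The thresholds are hypothetical and the deck does not state how its "obfuscated" features are computed.

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())

def looks_random(identifier, min_len=30, min_entropy=4.0):
    """Hypothetical heuristic: long identifier with a near-uniform character mix."""
    return len(identifier) >= min_len and shannon_entropy(identifier) >= min_entropy

# First value is an artifact from this slide; second is a readable name from an earlier capture.
print(looks_random("ZGwEiLSTkOsQSFcFzZVPMMuHalgKESzgWlohddzbmveToRIxzt"))
print(looks_random("strCommande"))
```

Entropy alone is weak on short names, because a string of n characters can reach at most log2(n) bits. That is why the sketch also requires a minimum length.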

Page 29: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

{

"desc": "Macro with constant manipulation in function call",

"artifact": "dNDfJESUPztgDlcNnWNZLIPsGgXDVndgUDYaarDOIWeCVstlSACjSVcUyLZ = CWvXJUNlxQcbDqNtnmQhCsifqGFBSHE$(327 - 240) & CWvXJUNlxQcbDqNtnmQhCsifqGFBSHE$(324 - 241) & CWvXJUNl…"

},

{

"desc": "Highly random long string found",

"artifact": "mRClEXzmRGxUqDPLJHcHeEMgjtqozQbuXXYIpdNJOtykVB"

},

{

"desc": "Object creation variable identifier",

"artifact": "qvBvooYSTaFymchvnZIkLUSrhheHIwfYCSyrpgvjePoCKWbhMYoOBOJVcKO"

},

{

"desc": "Random subroutine name",

"artifact": "dnHLjlClNBEYNnZihnFPOighaDbyTOUim"

},
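The "constant manipulation in function call" artifact hides character codes behind small arithmetic expressions. If the renamed wrapper behaves like VBA's `Chr$` (an assumption, suggested by a later capture in this deck that uses `Chr$` with the same constants), the visible terms can be folded statically. The regex and function name below are illustrative.

```python
import re

# Matches NAME$(a - b) or NAME(a + b); NAME is assumed to act like VBA's Chr$.
TERM = re.compile(r"\w+\$?\(\s*(\d+)\s*([+-])\s*(\d+)\s*\)")

def fold_char_terms(expression):
    """Evaluate each constant sub-expression and join the resulting characters."""
    chars = []
    for left, op, right in TERM.findall(expression):
        code = int(left) + int(right) if op == "+" else int(left) - int(right)
        chars.append(chr(code))
    return "".join(chars)

# The visible (untruncated) part of the artifact on this slide.
artifact = ("CWvXJUNlxQcbDqNtnmQhCsifqGFBSHE$(327 - 240) & "
            "CWvXJUNlxQcbDqNtnmQhCsifqGFBSHE$(324 - 241)")
print(fold_char_terms(artifact))  # WS
```

Only the two terms shown on the slide are decoded; the rest of the expression is truncated in the source and is not reconstructed here.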

Page 30: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

{

"desc": "Random identifier with suspicious assignments",

"artifact": "ujcYEkvJXWWtqcIKOpdaxorehRVbSNYlQPiQQaoCIdBbVAdczWFVpbOGsxrmOTqKykcaurtoAaRUmQJgntcvICwoBcYTiBopmrckXChHdQUOKtTcnKzV = Chr$(327 - 240) & Chr$(324 - 241) & Chr$(24…"

},

{

"desc": "Shell/SaveToFile string contains strange variable name",

"artifact": "RhIzeRHLbzssvNwesaErYKfXuynMPZjWdUBgPAZZUnlhknaNjNAQERoHClFgeuvBPWPbMQPsAeXlYymHXZdCZTRMfteev"

},

{

"desc": "File with following name was created and run created",

"artifact": "XABNAGkAYwByAG8AcwBvAGYAdAA=XABxAGIASwBWAEsAdgBsAGgAdwBpAEoAUgBLAC4AZQB4AGUA"

},

And… we capture a lot more!
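The last artifact on this slide looks like two Base64 runs joined together, with the `=` padding marking where the first run ends. The sketch below splits and decodes them. The UTF-16LE encoding is an assumption inferred from the repeating `A` pattern (zero high bytes), and the helper name is hypothetical.

```python
import base64
import re

# A Base64 run is its alphabet followed by at most two '=' padding characters.
B64_RUN = re.compile(r"[A-Za-z0-9+/]+={0,2}")

def decode_runs(blob, encoding="utf-16-le"):
    """Split concatenated Base64 runs at padding and decode each one as text."""
    return [base64.b64decode(run).decode(encoding) for run in B64_RUN.findall(blob)]

# Artifact copied from the slide, written as two adjacent string literals.
artifact = ("XABNAGkAYwByAG8AcwBvAGYAdAA="
            "XABxAGIASwBWAEsAdgBsAGgAdwBpAEoAUgBLAC4AZQB4AGUA")
print(decode_runs(artifact))
```

Under that assumption the two runs decode to `\Microsoft` and `\qbKVKvlhwiJRK.exe`, which is consistent with the description of a file that was created and then run. Splitting on padding only works when the earlier run actually ends in `=`; unpadded runs cannot be separated this way.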

Page 31: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Findings & Going Forward …

• A missing artifact used to mean a missed sample; not anymore

• All features contribute to the verdict in unison

• Obfuscation is still a challenge and will remain one

• Identify why a string-typed variable is assigned a byte array

• Why is an assignment expression longer than, say, 200 characters?

• Keep moving the inflation of malware samples from the sandbox to static analysis
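The question about over-long assignment expressions translates directly into a static check. A minimal sketch, using the 200-character figure from the slide as the default threshold; the regex, function name, and sample macro are hypothetical:

```python
import re

# Optional "Set", an identifier, "=", then the right-hand expression.
ASSIGNMENT = re.compile(r"^\s*(?:Set\s+)?([A-Za-z_]\w*)\s*=\s*(.+)$", re.IGNORECASE)

def long_assignments(macro_source, max_len=200):
    """Return (identifier, expression length) for over-long assignment expressions."""
    flagged = []
    for line in macro_source.splitlines():
        match = ASSIGNMENT.match(line)
        if match and len(match.group(2)) > max_len:
            flagged.append((match.group(1), len(match.group(2))))
    return flagged

# Hypothetical macro: one short assignment, one built from many Chr$ terms.
sample = "x = 1\npayload = " + " & ".join(["Chr$(327 - 240)"] * 20)
print(long_assignments(sample))  # [('payload', 357)]
```

A check like this needs no execution of the macro, which fits the stated goal of shifting work from the sandbox to static analysis.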

Page 32: Talha Obaid, Email Security, Symantec at MLconf ATL 2017

Thank You!

Talha Obaid Ling Zhou Timothy You Xinlei Cai

Email Security

Join us! www.symantec.com/about/careers