A Scalable Approach for Malware Detection through Bounded Feature Space Behavior Modeling

Mahinthan Chandramohan, Tan Hee Beng Kuan, Lionel Briand, Shar Lwin Khin, and Bindu Madhavi Padmanabhuni
Interdisciplinary Centre for ICT Security, Reliability, and Trust, University of Luxembourg, Luxembourg
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

description

ASE 2013 presentation on malware detection. Collaboration between NTU, Singapore and SnT Centre, Luxembourg

Transcript of A Scalable Approach for Malware Detection through Bounded Feature Space Behavior Modeling

Page 1

A Scalable Approach for Malware Detection through Bounded Feature Space Behavior Modeling

Mahinthan Chandramohan, Tan Hee Beng Kuan, Lionel Briand, Shar Lwin Khin, and Bindu Madhavi Padmanabhuni

Interdisciplinary Centre for ICT Security, Reliability, and Trust

University of Luxembourg, Luxembourg

School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

Page 2

What is malware?

Malware (malicious + software) is software that does malicious things without the victim’s knowledge.

Page 3

Motivation

• More than 403 million new malware variants were created in 2011, a 41% increase over 2010.

• On average, around 55,000 new malware samples were reported per day.

• Exponential growth of malware is a major threat in the software industry.

Page 4

Problem Definition 1/2

• New malware has become very sophisticated.

• Malware evades traditional anti-virus signatures by using various obfuscation techniques.

• Malware authors change the syntactic characteristics (i.e., structure) of a malicious program without changing its semantics (i.e., behavior).

Page 5

Problem Definition 2/2

• Scalability is a major problem in existing behavior-based malware detection techniques:

  • The malware feature space grows in proportion to the number of samples under examination.

  • The techniques are computationally very intensive.

Page 6

Related Work 1/2

• The practicality and efficiency of behavior-based malware detection depend on:

  • size of the feature space,
  • computational complexity,
  • overheads (e.g., pre-processing),
  • detection accuracy.

• Simple malware behavior models (e.g., n-gram, m-bag, and k-tuple) generate huge feature spaces and require various pruning and parameter-tuning mechanisms.

Page 7

Related Work 2/2

• Complex malware behavior models (e.g., system call dependency graphs) are highly computationally intensive.

Page 8

Behavior Modeling – An Overview

• Software programs perform actions on various operating system resources.

• An action corresponds to a higher-level operation (e.g., reading a file) composed of a set of related system calls (e.g., NtReadFile).

• The advantage of using actions over system calls is that the OS may use different names for system calls that in fact serve the same purpose.

• For example, NtCreateProcess and NtCreateProcessEx both map to the CreateProcess action.
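The mapping from system calls to actions can be pictured as a simple lookup. The sketch below is only an illustration: the table holds just the three system calls named on this slide, pairing NtReadFile with a ReadFile action is an assumption, and the helper name to_actions is hypothetical.

```python
# Hypothetical lookup table. The two NtCreateProcess* entries come from the
# slide; mapping NtReadFile to a "ReadFile" action is an assumption.
SYSCALL_TO_ACTION = {
    "NtCreateProcess": "CreateProcess",
    "NtCreateProcessEx": "CreateProcess",
    "NtReadFile": "ReadFile",
}


def to_actions(syscalls):
    """Translate a system-call trace into actions, skipping unmapped calls."""
    return [SYSCALL_TO_ACTION[s] for s in syscalls if s in SYSCALL_TO_ACTION]


trace = ["NtCreateProcess", "NtReadFile", "NtCreateProcessEx"]
print(to_actions(trace))
```

Both process-creation calls collapse to the same action, so a sample cannot look different merely by choosing one variant over the other.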

Page 9

Operating System Resource Types

• File System

• Registry

• Process and Thread

• Network

• Synchronization

• Section

Page 10

Bounded Feature Space Behavior Modeling (BOFM)

Malware feature: for each type of OS resource, the set of actions performed by malware on an instance of that resource type constitutes a feature of the malware.

• Example: malware performs the CreateFile and DeleteFile actions on a file instance C:\foo.exe, and the DeleteFile action on another file instance C:\abc.dll.

This malware has two features, {CreateFile, DeleteFile} and {DeleteFile}, with respect to the file resource instances C:\foo.exe and C:\abc.dll, respectively.
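The example above can be reproduced by grouping observed actions per resource instance. This is a minimal sketch, not the authors' implementation: the (resource_type, instance, action) event format and the function name actions_per_instance are assumptions made for illustration.

```python
from collections import defaultdict


def actions_per_instance(events):
    """Group observed actions by (resource type, resource instance).

    `events` is an iterable of (resource_type, instance, action) tuples;
    this event format is an assumption made for the example.
    """
    grouped = defaultdict(set)
    for resource_type, instance, action in events:
        grouped[(resource_type, instance)].add(action)
    return {key: frozenset(actions) for key, actions in grouped.items()}


events = [
    ("file", r"C:\foo.exe", "CreateFile"),
    ("file", r"C:\foo.exe", "DeleteFile"),
    ("file", r"C:\abc.dll", "DeleteFile"),
]
features = actions_per_instance(events)
for (resource_type, instance), actions in features.items():
    print(resource_type, instance, sorted(actions))
```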

Page 11: A Scalable Approach for Malware Detec2on through Bounded Feature Space Behavior Modeling

ü  Goal:    To  be  more  resilient  to  commonly  used  obfuscaHon  techniques  

v Property  1:  Regardless  of  the  number  of  Hmes  an  acHon  is  performed  on  an  OS  resource  instance  it  is  considered  only  once  in  final  feature  set.    

E.g.,  ReadFile  acHon  is  performed  several  Hmes  on  a  file  instance  C:\Windows\...\sysfile2.dll;  this  behavior  is  modeled  by  a  BOFM  feature  {ReadFile}    

 

v Property  2:  The  sequence,  in  which  the  acHons  are  performed,  by  malware,  is  ignored  in  feature  construcHon.    

E.g.,  malware  features  {ReadFile,  QueryFileInforma9on}  and  {QueryFileInforma9on,  ReadFile}  are  considered  idenHcal.        

Proper2es  of  BOFM  features  1/2  

Page 12

Properties of BOFM features 2/2

• Property 3: Identical action sets performed on two different OS resource instances of the same type are modeled as a single feature.

E.g., the actions CreateFile and DeleteFile performed on two different file resource instances, C:\Windows\abc.dll and D:\Personel\foo.exe, are modeled as a single BOFM feature {CreateFile, DeleteFile}.
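Taken together, the three properties mean that repeating actions, reordering them, or spreading them over extra resource instances leaves the feature set unchanged. The sketch below checks this on two hypothetical traces; the event tuple format and the name bofm_features are assumptions, not part of the original work.

```python
def bofm_features(events):
    """Return the set of (resource_type, action_set) features for one sample.

    `events` is an iterable of (resource_type, instance, action) tuples,
    an input format assumed for this example.
    """
    per_instance = {}
    for resource_type, instance, action in events:
        # A set ignores repeats (Property 1) and ordering (Property 2).
        per_instance.setdefault((resource_type, instance), set()).add(action)
    # Dropping the instance name merges identical action sets (Property 3).
    return {
        (rtype, frozenset(actions))
        for (rtype, _instance), actions in per_instance.items()
    }


plain = [
    ("file", r"C:\Windows\abc.dll", "CreateFile"),
    ("file", r"C:\Windows\abc.dll", "DeleteFile"),
]
obfuscated = [
    ("file", r"D:\Personel\foo.exe", "DeleteFile"),  # reordered
    ("file", r"D:\Personel\foo.exe", "CreateFile"),
    ("file", r"D:\Personel\foo.exe", "CreateFile"),  # repeated
    ("file", r"C:\Windows\abc.dll", "DeleteFile"),   # second instance
    ("file", r"C:\Windows\abc.dll", "CreateFile"),
]
print(bofm_features(plain) == bofm_features(obfuscated))
```

Both traces reduce to the single feature {CreateFile, DeleteFile} for the file resource type.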

Page 13

Bounded Feature Space

Goal: avoid growth of the malware feature space in proportion to the number of samples under examination.

• Let j be an OS resource type [range of j not preserved in transcript].

• The total number k_j of possible actions that malware may perform on an OS resource instance of type j is a constant.

• The maximum number m_j of possible features with regard to OS resource type j is also a constant [equation not preserved in transcript].

• The maximum number N of possible features over all resource types is therefore always a constant [equation not preserved in transcript].
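The equations on this slide did not survive extraction. One reconstruction that is consistent with the definition of a feature as a set of actions is given below. It assumes that any non-empty subset of the k_j actions of resource type j counts as a possible feature and that j ranges over the six resource types listed earlier; it should be read as an assumption, not as the authors' exact notation.

```latex
m_j = \sum_{i=1}^{k_j} \binom{k_j}{i} = 2^{k_j} - 1,
\qquad
N = \sum_{j=1}^{6} m_j = \sum_{j=1}^{6} \left( 2^{k_j} - 1 \right)
```

Under this reading, a resource type with k_j = 3 actions admits at most 2^3 - 1 = 7 features, no matter how many samples are analyzed.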

Page 14

OS Resource Types and Corresponding Actions

The total number of malware features (N) obtained from these six OS resource types is 16,652.

Page 15

Model Construction Work Flow

Example feature vector
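The slide's example vector was an image and is not preserved. As a rough illustration of how a fixed-length vector could be built, the sketch below assumes that every non-empty subset of a resource type's actions is a possible feature and that the vector is a binary presence indicator. Both are assumptions, as are the names feature_index and to_vector and the deliberately tiny action list.

```python
def feature_index(resource_actions):
    """Enumerate every possible feature (non-empty action subset per type)."""
    index = {}
    for rtype in sorted(resource_actions):
        actions = sorted(resource_actions[rtype])
        for mask in range(1, 2 ** len(actions)):
            subset = frozenset(
                a for bit, a in enumerate(actions) if (mask >> bit) & 1
            )
            index[(rtype, subset)] = len(index)
    return index


def to_vector(sample_features, index):
    """Binary vector: 1 where the sample exhibits the feature, else 0."""
    vector = [0] * len(index)
    for feature in sample_features:
        vector[index[feature]] = 1
    return vector


# Hypothetical, deliberately tiny action list for a single resource type.
resource_actions = {"file": ["CreateFile", "DeleteFile", "ReadFile"]}
index = feature_index(resource_actions)
sample = {
    ("file", frozenset({"CreateFile", "DeleteFile"})),
    ("file", frozenset({"DeleteFile"})),
}
print(len(index), to_vector(sample, index))
```

The vector length depends only on the action lists, not on how many samples are processed, which is the point of a bounded feature space.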

Page 16

Detection Method

• Machine learning (ML) classification techniques are used to build malware detection models.

• Logistic Regression (LR) and Support Vector Machine (SVM) are used in our experiments.

• The malware detection process involves two phases:

  • Phase 1: model building
  • Phase 2: model evaluation
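The two phases can be sketched with a small logistic regression written in plain Python. The slides do not say which implementation or hyperparameters were used, so this is a stand-in under stated assumptions: the training data is hypothetical, the values of epochs and step are arbitrary, and SVM is not shown.

```python
import math


def train_logistic_regression(vectors, labels, epochs=500, step=0.5):
    """Phase 1 (model building): fit weights by batch gradient descent."""
    n = len(vectors)
    weights, bias = [0.0] * len(vectors[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(vectors, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid(z) - label
            grad_b += error
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
        weights = [w - step * g / n for w, g in zip(weights, grad_w)]
        bias -= step * grad_b / n
    return weights, bias


def predict(model, x):
    """Phase 2 (model evaluation): return 1 for malware, 0 for benign."""
    weights, bias = model
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if z > 0 else 0


# Hypothetical binary feature vectors; label 1 = malware, 0 = benign.
train_x = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1], [0, 1, 0, 1]]
train_y = [1, 1, 0, 0]
model = train_logistic_regression(train_x, train_y)

test_x = [[1, 0, 0, 0], [0, 0, 0, 1]]
print([predict(model, x) for x in test_x])
```

In this toy data the first feature appears only in malware and the last only in benign samples, so the learned weights separate the two unseen test vectors.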

 

Page 17

Experimental Dataset

• Training set of 5,000 malware and 80 benign samples, and a test set of 300 malware and 20 benign samples.

Page 18

Experimental Results

• SVM achieved 99.4% detection accuracy with no false positives, and LR achieved 99.6% detection accuracy with a 1% FP rate.

• Each balanced test set consists of 20 malware samples randomly selected from the pool of 300, together with the 20 benign samples.

• On the balanced test sets, SVM yielded a perfect accuracy of 100% with a 0% FP rate, and LR achieved 99.5% detection accuracy with a 1% FP rate.
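For reference, detection accuracy and FP rate can be computed from predictions as sketched below. The slides do not define the metrics, so the usual definitions are assumed here (accuracy = correct predictions / all samples, FP rate = benign samples flagged as malware / all benign samples). The predictions are hypothetical and only mirror the shape of the test set.

```python
def accuracy_and_fp_rate(y_true, y_pred):
    """Return (accuracy, false-positive rate); label 1 = malware, 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    benign = sum(1 for t in y_true if t == 0)
    fp_rate = false_pos / benign if benign else 0.0
    return correct / len(pairs), fp_rate


# Hypothetical outcome: 300 malware samples (2 missed), 20 benign (none flagged).
y_true = [1] * 300 + [0] * 20
y_pred = [1] * 298 + [0] * 2 + [0] * 20
acc, fpr = accuracy_and_fp_rate(y_true, y_pred)
print(f"accuracy={acc:.3%}, FP rate={fpr:.1%}")
```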

Page 19

Comparison with Canali et al. (ISSTA 2012)

• Both achieve 99% detection accuracy.

• However:

  • BOFM generated only 569 active features, whereas Canali et al. generated several million.

  • Extracting malware features took 1.67 hours with BOFM, whereas Canali et al. took around 48 hours.

  • Training the SVM classifier took 26 seconds and consumed only 200 MB of RAM, whereas Canali et al.’s approach consumed more than 1 GB of RAM to perform signature matching.

  • BOFM is much more efficient and scalable.

Page 20

Conclusion

• Malware evades traditional anti-virus signatures by using various obfuscation techniques.

• Behavior-based malware detection is an increasingly common solution.

• Scalability is a major problem in existing behavior-based malware detection techniques.

• We proposed a bounded feature space malware behavior modeling (BOFM) technique to address the scalability issue.

• BOFM entails a fixed number of features that does not grow in proportion to the number of malware samples under examination.

• Benchmark: BOFM combined with SVM achieved 100% detection accuracy on the balanced test sets, training in less than a minute with 200 MB of memory.

Page 21

Feature Space Analysis

• Comparison of malware and benign feature spaces

• That 57% of the features are unique to malware suggests that BOFM is a promising technique for modeling malware behavior.
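A share of malware-only features like the one quoted above can be computed with plain set operations. The slide does not state the denominator, so this sketch assumes it is the union of all features observed in either class; the function name unique_share and the toy feature sets are hypothetical.

```python
def unique_share(malware_features, benign_features):
    """Fraction of all observed features that appear only in malware.

    Using the union of both feature spaces as the denominator is an
    assumption; the slide does not say how the percentage was computed.
    """
    observed = malware_features | benign_features
    only_malware = malware_features - benign_features
    return len(only_malware) / len(observed) if observed else 0.0


# Hypothetical feature spaces, each feature a (resource_type, action_set) pair.
malware = {
    ("registry", frozenset({"DeleteKey"})),
    ("registry", frozenset({"NotifyChangeKey"})),
    ("file", frozenset({"OpenFile"})),
}
benign = {
    ("file", frozenset({"OpenFile"})),
    ("sync", frozenset({"CreateMutex", "ReleaseMutex"})),
}
print(unique_share(malware, benign))
```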

Page 22

Brief Analysis of Interesting Features

• The ‘NotifyChangeKey’ action is used far more widely by malware samples than by benign samples (86% vs. 15%).

• The ‘DeleteKey’ action is also widely used by malware samples.

• Actions such as ‘OpenFile’, ‘GetFileAttributes’, ‘CreateMutex’, and ‘ReleaseMutex’ appear widely in both malware and benign samples.