PHMMs for Metamorphic Detection

Mark Stamp

Viruses

Viruses and worms --- types of malware
  Various definitions are used
  For our purposes, “virus” used generically
How to detect malware?
  Signature detection used most often
  In simplest form, search for a string of bits found in the malware
  Could also include wildcards, heuristics, etc.
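To make the signature idea concrete, here is a minimal sketch of a byte-string search with wildcards. The signature bytes, the sample buffers, and the function name `find_signature` are made up for illustration; real scanners use far more efficient matching.

```python
from typing import List, Optional

# A signature is a list of byte values; None is a wildcard matching any byte.
Signature = List[Optional[int]]

def find_signature(data: bytes, sig: Signature) -> int:
    """Return the first offset where sig matches data, or -1 if there is none."""
    n, m = len(data), len(sig)
    for start in range(n - m + 1):
        if all(s is None or data[start + i] == s for i, s in enumerate(sig)):
            return start
    return -1

# Hypothetical signature: B8 ?? ?? CD 21
SIG: Signature = [0xB8, None, None, 0xCD, 0x21]

# Made-up buffers: the second one has a single 0x90 (NOP) byte inserted.
sample = bytes([0x90, 0xB8, 0x00, 0x4C, 0xCD, 0x21])
morphed = bytes([0x90, 0xB8, 0x00, 0x4C, 0x90, 0xCD, 0x21])

print(find_signature(sample, SIG))
print(find_signature(morphed, SIG))
```

The wildcards absorb changes in the two operand bytes, but a single inserted byte is enough to break the match on the second buffer.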

Metamorphic Viruses

Metamorphic viruses change “shape”
  For each instance, internal structure changes
  But function stays the same
If the change is sufficient, signature detection fails
In principle, metamorphic malware among most difficult to detect
But, not too many have been seen in the wild
  Why not???

Metamorphic Detection

How to detect metamorphic malware?

Previous research: HMMs are effective
  Train model on opcodes extracted from metamorphic “family” viruses
  Determine a threshold score
  Then, to score an unknown exe, extract opcodes and score against the model
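The scoring step can be sketched with the scaled forward algorithm. This is only an illustration: the two-state model, the three-symbol opcode alphabet, and all probabilities below are invented, and the normalization by sequence length (log-likelihood per opcode) is an assumption about how scores of different-length files are compared.

```python
import math
from typing import List, Sequence

def log_likelihood_per_opcode(obs: Sequence[int],
                              pi: List[float],
                              A: List[List[float]],
                              B: List[List[float]]) -> float:
    """Scaled forward algorithm; returns log P(obs | model) / len(obs)."""
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    n_states = len(pi)
    # Initial step: alpha_0(i) = pi[i] * B[i][obs[0]]
    alpha = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: sum over previous states, then emit obs[t]
            alpha = [sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][obs[t]]
                     for j in range(n_states)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob / len(obs)

# Toy model (assumed values): opcode symbols 0=mov, 1=push, 2=nop
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]

def is_family(obs: Sequence[int], threshold: float) -> bool:
    """Flag a file as a family member if its score reaches the threshold."""
    return log_likelihood_per_opcode(obs, PI, A, B) >= threshold
```

In practice the model parameters would come from training on the family's opcode sequences rather than being written by hand.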

Profile HMM

Standard HMM does not take positional information into account

Profile HMM analogous to defining HMM at each position in a sequence
  Position info is taken into account
So, PHMM uses more information
  This might yield stronger models

PHMMs

Will PHMM outperform HMM?
Possible advantage of PHMM
  Uses more information…
  …since position within sequence is taken into account
Possible disadvantages of PHMM
  More complex, more costly to compute
  Might overfit the data
  “More” is not always “better”

The Plan

1. Extract opcodes from metamorphic family viruses
2. Pairwise align opcode sequences
3. Generate multiple sequence alignment (MSA) from pairwise alignments
4. Generate PHMM from MSA
5. Determine threshold, error rates
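Step 1 of the plan can be sketched as follows, assuming the executable has already been disassembled into a plain-text listing. The listing format (optional `label:` prefix, `;` comments, mnemonic as the first token) and the function name `extract_opcodes` are assumptions; real disassembler output would need its own parsing.

```python
from typing import List

def extract_opcodes(asm_text: str) -> List[str]:
    """Return the mnemonic of each instruction line, in order, lowercased."""
    opcodes = []
    for line in asm_text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # Drop a leading label such as "start:"
        if ":" in line.split()[0]:
            line = line.split(":", 1)[1].strip()
            if not line:
                continue  # label-only line
        opcodes.append(line.split()[0].lower())
    return opcodes

# Made-up listing for illustration
listing = """
start:  MOV eax, 1   ; init
        push ebx
loop_top:
        add eax, ebx
        jnz loop_top
        ret
"""
print(extract_opcodes(listing))
```

Only the mnemonics are kept; operands, labels, and comments are discarded, so the resulting sequence is what the later alignment and modeling steps work on.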

Metamorphic Techniques

Morphing usually applied at asm level
Many techniques can be used, such as…
Equivalent code substitution
  Register swap
  Different code, same function
Garbage code/dead code insertion
Code reordering
  Subroutine reordering
  Arbitrary reordering using jumps

Metamorphic Techniques

Opaque predicates
  “Conditional” that isn’t
By combining several techniques, can achieve desired effect
  Metamorphism sufficient to break signature detection
  Function of code remains unchanged
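Two of these techniques, register swap and dead code insertion, can be illustrated on a toy instruction list. This is a sketch only: instructions are modeled as tuples of strings, the helper names are hypothetical, and a real morphing engine would have to respect instruction semantics (implicit registers, flags, and so on).

```python
import random
from typing import List, Tuple

Instr = Tuple[str, ...]  # e.g. ("mov", "eax", "ebx")

def swap_registers(code: List[Instr], a: str, b: str) -> List[Instr]:
    """Exchange every occurrence of register a with register b."""
    mapping = {a: b, b: a}
    return [tuple(mapping.get(tok, tok) for tok in ins) for ins in code]

def insert_dead_code(code: List[Instr], rng: random.Random,
                     rate: float = 0.5) -> List[Instr]:
    """After each instruction, insert a NOP with probability `rate`."""
    out: List[Instr] = []
    for ins in code:
        out.append(ins)
        if rng.random() < rate:
            out.append(("nop",))
    return out

def opcodes(code: List[Instr]) -> List[str]:
    """Opcode (mnemonic) sequence of a piece of code."""
    return [ins[0] for ins in code]

original = [("mov", "eax", "1"), ("add", "eax", "ebx"), ("push", "eax")]
swapped = swap_registers(original, "eax", "ecx")
padded = insert_dead_code(original, random.Random(0), rate=0.5)
```

Note that the register swap changes the bytes but leaves the opcode sequence untouched, while dead code insertion changes the opcode sequence itself.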

Metamorphic Example

[Figure: original code, morphed version 1, morphed version 2]

Metamorphic Viruses

Real-world metamorphic viruses

Virus Construction Kits

Construction kits --- anyone can easily build (metamorphic) malware

First 2 are not very metamorphic
But, NGVCK is highly metamorphic

So, we consider NGVCK here

AV Techniques

Signature detection is most popular
So, of course, virus writers want to evade signature detection
Metamorphism can provide strong defense against signature detection

HMMs

See previous presentation

PHMMs

See previous presentation

PHMMs

PHMMs are designed to deal with biological sequences

Goal is to find evidence that sequences related by mutation and selection

Basic processes usually considered are
  Substitution --- subsequence replaced
  Insertion --- subsequence inserted
  Deletion --- subsequence removed

PHMMs and Computer Viruses

The same basic processes can occur in metamorphic viruses
  That is, substitution, insertion, deletion
But also have to deal with
  Permutation --- re-ordering of sequence
  Metamorphics may do lots of permuting
Permutation can be viewed as series of insertions/deletions
  But “close” sequences might be “far” apart
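The “close but far apart” effect can be seen with ordinary edit distance, which counts substitutions, insertions, and deletions. The sketch below uses a standard Levenshtein computation on made-up opcode blocks; it illustrates the point and is not the alignment scoring used in this work.

```python
from typing import Sequence

def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution or match
        prev = curr
    return prev[-1]

# Two made-up blocks of opcodes, then the same blocks in swapped order
block1 = ["push", "mov", "sub"]
block2 = ["call", "test", "jz"]
original = block1 + block2
permuted = block2 + block1

print(edit_distance(original, permuted))
```

The two sequences contain exactly the same opcodes, yet the distance is 6, the same as for two unrelated six-opcode sequences.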

Permutation and Alignment

Permutations are problematic…

How to deal with this?
Maybe we can pre-process sequences
  But, adds complexity and cost
More about this later

Test Data

Virus construction kits from VX Heavens

We generated the following viruses
  10 VCL32 viruses
  30 PS-MPC viruses
  200 NGVCK viruses
Also, 40 cygwin utilities
  These serve as “normal” files

NGVCK Pairwise Alignment

Align two NGVCK opcode sequences

This looks reasonable
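For readers who want to see what pairwise alignment of opcode sequences involves, here is a minimal global alignment sketch (Needleman-Wunsch dynamic programming). The scoring values (match +1, mismatch -1, linear gap -1) and the gap symbol are assumptions chosen for simplicity, not the scoring scheme used for the NGVCK alignments.

```python
from typing import List, Sequence, Tuple

GAP = "-"

def global_align(a: Sequence[str], b: Sequence[str],
                 match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> Tuple[List[str], List[str]]:
    """Return one optimal global alignment of a and b as two padded lists."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the alignment
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(GAP); i -= 1
        else:
            out_a.append(GAP); out_b.append(b[j - 1]); j -= 1
    return out_a[::-1], out_b[::-1]

# Made-up opcode sequences: the second has one extra "nop"
x = ["mov", "push", "call", "ret"]
y = ["mov", "nop", "push", "call", "ret"]
ax, ay = global_align(x, y)
print(ax)
print(ay)
```

The extra opcode in the second sequence is paired with a gap in the first; with many insertions and reorderings, such gaps accumulate quickly.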

Gap Percentages

Recall, with PHMM, the more gaps, the weaker the model

MSAs for metamorphic viruses

But, VCL32 based on 5 files, PS-MPC based on 10, NGVCK based on 20 files
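A gap percentage can be computed directly from an MSA. The sketch below assumes the percentage means the fraction of all cells in the alignment that are gaps, and represents the MSA as a list of equal-length rows; the tiny alignment is made up.

```python
from typing import List

def gap_percentage(msa: List[List[str]], gap: str = "-") -> float:
    """Percentage of MSA cells that are gap symbols."""
    total = sum(len(row) for row in msa)
    if total == 0:
        raise ValueError("MSA must contain at least one symbol")
    gaps = sum(row.count(gap) for row in msa)
    return 100.0 * gaps / total

# Made-up three-sequence alignment
toy_msa = [
    ["mov", "-",   "push", "ret"],
    ["mov", "nop", "push", "ret"],
    ["mov", "-",   "-",    "ret"],
]
print(gap_percentage(toy_msa))
```

Because more sequences generally force more gaps, percentages from MSAs built on different numbers of files are not directly comparable.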

VCL32

Using five VCL32 viruses…
  Generate pairwise alignments
  Generate MSA
  Then generate PHMM
PHMM has 1820 states
  Can’t show the whole model here
  So, next slides give 3 states: 126, 127, 128

VCL32 Transition Probabilities

State transition probabilities
  The A matrix for states 126, 127, 128

VCL32 Emission Probabilities

Emission probabilities
  The E matrix
  States 126, 127, 128
Emissions only for match, insert states
“Add-one” rule was used here
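The “add-one” rule can be sketched for a single match state as follows. The column, the four-opcode alphabet, and the function name are made up; the idea is simply that every symbol gets one extra count, so no emission probability is zero.

```python
from collections import Counter
from typing import Dict, List

def add_one_emissions(column: List[str], alphabet: List[str],
                      gap: str = "-") -> Dict[str, float]:
    """Emission probabilities for one MSA column using the add-one rule."""
    symbols = [sym for sym in column if sym != gap]  # gaps emit nothing
    unknown = set(symbols) - set(alphabet)
    if unknown:
        raise ValueError(f"symbols not in alphabet: {sorted(unknown)}")
    counts = Counter(symbols)
    # Each symbol in the alphabet receives one pseudo-count
    total = len(symbols) + len(alphabet)
    return {sym: (counts[sym] + 1) / total for sym in alphabet}

# Made-up alphabet and MSA column
ALPHABET = ["mov", "push", "nop", "ret"]
column = ["mov", "mov", "push", "-", "mov"]
probs = add_one_emissions(column, ALPHABET)
print(probs)
```

Without the pseudo-counts, any opcode never seen at this position would have probability zero, and a single such opcode would zero out the score of an entire sequence.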

Results

Typical PHMM results for VCL32

Can set threshold for 100% detection
  It doesn’t get any better than that!
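Setting a threshold and reading off error rates can be sketched as below. The scores are invented, and the convention that a higher score means “more like the family” is an assumption of this example.

```python
from typing import List, Optional, Tuple

def error_rates(family_scores: List[float], normal_scores: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate).

    A file is classified as a family virus when its score >= threshold.
    """
    false_neg = sum(s < threshold for s in family_scores)
    false_pos = sum(s >= threshold for s in normal_scores)
    return false_pos / len(normal_scores), false_neg / len(family_scores)

def perfect_threshold(family_scores: List[float],
                      normal_scores: List[float]) -> Optional[float]:
    """Midpoint threshold giving zero errors, or None if the scores overlap."""
    lo, hi = max(normal_scores), min(family_scores)
    if lo >= hi:
        return None
    return (lo + hi) / 2.0

# Made-up scores: family viruses score well above normal files
family = [-2.1, -1.8, -2.5]
normal = [-9.0, -7.5, -12.3]
t = perfect_threshold(family, normal)
print(t, error_rates(family, normal, t))
```

When the two score ranges overlap, `perfect_threshold` returns None and any threshold trades false positives against false negatives.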

Results

Typical PS-MPC results using PHMM

Again, perfect detection

Results

But, VCL32 and PS-MPC are easy cases
  Not very metamorphic
  Probably detectable using signatures
In contrast, NGVCK highly metamorphic
So, NGVCK is the important test
  See next slides

Results

Typical results for NGVCK

Note that normal files score higher than NGVCK!

This is bad!

Pre-Processing

For NGVCK, is there any hope?
Can try pre-processing
  Goal is to undo some of the effect of permutation
Able to reduce gap percentage in MSA
  Before, gap percentage was 88.3%
  After, gap percentage is 44.9%
Big improvement, but is it big enough?

Results

NGVCK with pre-processing
Much better, but not good enough
Error rate is still significant

Conclusions

HMMs developed in 1960s
  Standard machine learning technique
  Many applications
PHMMs relatively recent
  Developed for biological applications
  Here, a novel application of PHMMs
100% detection for some examples…
  …poor detection for others

Possible Improvements

Improved pre-processing
  To better account for permutation
Local alignment
  For example, align subroutines
Baum-Welch re-estimation of PHMM obtained from MSA
Other???

Last Word

Very trendy to apply biological analogies to information security
On the one hand…
  Results here provide evidence supporting trend of looking to biological analogies
On the other hand…
  Results here are “cautionary tale against applying biological analogies too literally”

References

Profile hidden Markov models for metamorphic virus detection, S. Attaluri, S. McGhee and M. Stamp, Journal in Computer Virology, Vol. 5, No. 2, May 2009, pp. 151-169

Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Durbin, et al