HUNTING FOR METAMORPHIC ENGINES - DEF CON

HUNTING FOR METAMORPHIC ENGINES Mark Stamp & Wing Wong August 5, 2006

HUNTING FOR METAMORPHIC ENGINES

Mark Stamp & Wing Wong

August 5, 2006

Outline

I. Metamorphic software
II. Virus construction kits
III. How “effective” are metamorphic engines?
    Method used to compare two pieces of code
    Similarity within virus families
    Similarity between virus families
IV. Can metamorphic viruses be detected?
    Commercial virus scanners
    Hidden Markov models (HMMs)
    Similarity index
V. Conclusion

PART I

Metamorphic Software

What is Metamorphic Software?

Software is metamorphic provided
    All copies do the same thing
    Internal structure of copies differs
Today most software is cloned
Why metamorphic?
    Virus/worm avoids signature detection
    Increase “genetic diversity” of software

Genetic Diversity of Software?

Suppose a program has a buffer overflow
If we clone the program
    One attack works against every copy
    Break once, break everywhere (BOBE)
If instead we create metamorphic copies
    Each copy still has a buffer overflow
    Same attack does not work against every metamorphic copy
    Break once, break everywhere (BOBE) resistance
    Sorta like genetic diversity in biology

Evolution of Virus

Viruses first appeared in the 1980s
    Fred Cohen
Viruses must avoid signature detection
    Virus can alter its “appearance”
Techniques employed
    encryption
    polymorphic
    metamorphic

Evolution of Virus - Encryption

Virus consists of
    decrypting module (decryptor)
    encrypted virus body
Different encryption key gives a different virus body signature
Weakness
    decryptor can be detected

Evolution of Virus – Polymorphic Viruses

Try to hide signature of decryptor
Can use code emulator to decrypt putative virus dynamically
Decrypted virus body is constant
    Signature detection is possible
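
Signature detection, as the term is used on these slides, amounts to searching a file for a known byte pattern. The sketch below shows the idea only; the signature names and byte values are hypothetical and are not taken from any real scanner or virus.

```python
# Hypothetical signature database: name -> byte pattern (illustrative values only).
SIGNATURES = {
    "example.decryptor": bytes.fromhex("deadbeef"),
    "example.body": b"\x01\x02\x03\x04\x05",
}


def scan(data: bytes) -> list[str]:
    """Return the sorted names of all signatures that occur anywhere in data."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in data)


if __name__ == "__main__":
    print(scan(b"\x00\xde\xad\xbe\xef\x00"))  # contains the first pattern
    print(scan(b"nothing here"))              # contains neither pattern
```

A constant decryptor, or a constant body recovered by emulation, still contains a fixed pattern that a search of this kind can find.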

Evolution of Virus – Metamorphic Viruses

Change virus body
Mutation techniques:
    permutation of subroutines
    insertion of garbage/jump instructions
    substitution of instructions

PART II

Virus Construction Kits

Virus Construction Kits – PS-MPC

According to Peter Szor:
“… PS-MPC [Phalcon/Skism Mass-Produced Code generator] uses a generator that effectively works as a code-morphing engine…
… the viruses that PS-MPC generates are not [only] polymorphic, but their decryption routines and structures change in variants…”

Virus Construction Kits – G2

From the documentation of G2 (Second Generation virus generator):

“… different viruses may be generated from identical configuration files…”

Virus Construction Kits - NGVCK

From the documentation of NGVCK (Next Generation Virus Creation Kit):

“… all created viruses are completely different in structure and opcode…
… impossible to catch all variants with one or more scanstrings.…
… nearly 100% variability of the entire code”

PART III

How Effective Are Metamorphic Engines?

Method to Compare Two Pieces of Code

[Figure: method to compare two assembly programs.
    Opcode sequences: Program X with opcodes 0 to n-1 (call, pop, mov, sub, …, jmp) and Program Y with opcodes 0 to m-1 (push, mov, sub, and, …, retn)
    Graph of matches (matching 3 opcodes): Program X on one axis (0 to n-1), Program Y on the other (0 to m-1)
    Graph of real matches (lines with length > 5)
    Score = average % match]
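
The scoring steps named in the figure can be sketched roughly as follows. This is one reading of the slide, not the authors' implementation. It assumes that a match means three consecutive opcodes that are identical and in the same order, that "length > 5" means a diagonal run of at least six matching windows, and that "average % match" is the mean of the fractions of each program covered by the surviving runs. The names `similarity`, `window`, and `min_run` are made up for this example.

```python
def similarity(x, y, window=3, min_run=6):
    """Average fraction of opcodes in x and y covered by long diagonal match runs.

    x, y    : lists of opcode mnemonics
    window  : number of consecutive opcodes compared at each grid point (assumed 3)
    min_run : shortest diagonal run kept as a "real" match (assumed 6, for length > 5)
    """
    n = len(x) - window + 1  # number of opcode windows in x
    m = len(y) - window + 1  # number of opcode windows in y
    if n <= 0 or m <= 0:
        return 0.0
    hit_x, hit_y = set(), set()

    def keep(i, j, run):
        # i and j are one past the last matching window start on this diagonal.
        if run >= min_run:
            hit_x.update(range(i - run, i + window - 1))
            hit_y.update(range(j - run, j + window - 1))

    # Walk every diagonal of the n-by-m match grid.
    for d in range(-(m - 1), n):
        i, j = (d, 0) if d >= 0 else (0, -d)
        run = 0
        while i < n and j < m:
            if x[i:i + window] == y[j:j + window]:
                run += 1
            else:
                keep(i, j, run)
                run = 0
            i += 1
            j += 1
        keep(i, j, run)

    return (len(hit_x) / len(x) + len(hit_y) / len(y)) / 2


if __name__ == "__main__":
    prog_x = ["call", "pop", "mov", "sub", "add", "xor", "push", "jmp", "cmp", "retn"]
    prog_y = prog_x[:8] + ["nop", "nop"]
    print(similarity(prog_x, prog_x))
    print(similarity(prog_x, prog_y))
```

Short runs are discarded because brief coincidental matches are common between unrelated programs; only longer diagonal segments count toward the score.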

Similarity within Virus Families – Test Data

Four generators, 45 viruses
    20 viruses by NGVCK
    10 viruses by G2
    10 viruses by VCL32
    5 viruses by MPCGEN
20 normal utility programs from the Cygwin DLL

Similarity within Virus Families – Test Result

[Figure: similarity score (0.0 to 1.0) versus comparison number (0 to 200) for NGVCK viruses and normal files]

Similarity within Virus Families – Test Result

[Figure: bubble chart of maximum similarity score versus minimum similarity score for NGVCK, G2, VCL32, MPCGEN and Normal; size of bubble = average similarity]
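
The three quantities in the bubble chart (minimum, maximum, and average similarity within a family) can be computed from pairwise comparisons along these lines. This is a sketch only. It assumes every unordered pair in a family is compared, and `jaccard` is a deliberately crude stand-in for the scoring method described earlier; `family_stats` and `jaccard` are names made up for this example.

```python
from itertools import combinations


def family_stats(programs, score):
    """Return (minimum, maximum, average) of score over all unordered pairs."""
    scores = [score(a, b) for a, b in combinations(programs, 2)]
    if not scores:
        raise ValueError("need at least two programs to compare")
    return min(scores), max(scores), sum(scores) / len(scores)


def jaccard(a, b):
    """Stand-in score: overlap of the two opcode sets (not the slides' method)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0


if __name__ == "__main__":
    family = [["mov", "add"], ["mov", "sub"], ["mov", "add"]]
    print(family_stats(family, jaccard))
    # Number of unordered pairs among 20 programs.
    print(len(list(combinations(range(20), 2))))
```

Comparing every pair among 20 programs gives 190 comparisons, which fits within the 0 to 200 range of the comparison-number axis in the earlier plot.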

Similarity within Virus Families – Test Result

IDA_NGVCK0 - IDA_NGVCK8 (11.9%)

IDA_G4 - IDA_G7 (75.2%)

Similarity within Virus Families – Test Result

IDA_VCL0 - IDA_VCL9 (60.2%)

IDA_MPC1 - IDA_MPC3 (58.0%)

Similarity among Virus Families

NGVCK versus other viruses
    0% similar to G2 and MPCGEN viruses
    0 – 5.5% similar to VCL32 viruses (43 out of 100 comparisons have score > 0)
    0 – 1.2% similar to normal files (only 8 out of 400 comparisons have score > 0)

Similarity among Virus Families

NGVCK
    Highest degree of metamorphism of kits tested
    Virtually no similarity to other viruses or normal programs

PART IV

Can Metamorphic Viruses Be Detected?

Detection with Commercial Virus Scanners

Tested three virus scanners
    eTrust version 7.0.405
    avast! antivirus version 4.7
    AVG Anti-Virus version 7.1
Each scanned 37 files
    10 NGVCK viruses
    10 G2 viruses
    10 VCL32 viruses
    7 MPCGEN viruses

Detection with Commercial Virus Scanners

Results
    eTrust and avast! detected 17 viruses (G2 and MPCGEN)
    AVG detected 27 viruses (G2, MPCGEN and VCL32)
    None of the NGVCK viruses detected
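
The detected counts follow from the file counts on the previous slide: 10 G2 + 7 MPCGEN = 17, and adding 10 VCL32 gives 27. A small tally makes the per-scanner detection rates explicit. It assumes each scanner flagged every file in the families listed for it, which is consistent with the totals above but is not stated outright on the slides.

```python
# Files scanned per family, from the test setup.
scanned = {"NGVCK": 10, "G2": 10, "VCL32": 10, "MPCGEN": 7}

# Families each scanner detected (assumed: every file in a listed family was flagged).
detected_families = {
    "eTrust": {"G2", "MPCGEN"},
    "avast!": {"G2", "MPCGEN"},
    "AVG": {"G2", "MPCGEN", "VCL32"},
}


def detected(scanner):
    """Number of files flagged by the given scanner."""
    return sum(scanned[family] for family in detected_families[scanner])


def rate(scanner):
    """Fraction of all scanned files flagged by the given scanner."""
    return detected(scanner) / sum(scanned.values())


if __name__ == "__main__":
    for name in detected_families:
        print(f"{name}: {detected(name)} of {sum(scanned.values())} ({rate(name):.1%})")
```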

Detection with Hidden Markov Models (HMMs)

Use hidden Markov models (HMMs) to represent statistical properties of a set of metamorphic virus variants

Train the model on family of metamorphic viruses
Use trained model to determine whether a given program is similar to the viruses the HMM represents

Detection with HMMs – Theory

A trained HMM
    maximizes the probabilities of observing the training sequence
    assigns high probabilities to sequences similar to the training sequence
    represents the “average” behavior if trained on multiple sequences
    represents an entire virus family, as opposed to individual viruses
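
Scoring a program against a trained HMM can be sketched with the standard scaled forward algorithm. The result plots in this talk label the score axis "LLPO", read here as log likelihood per opcode; that reading, the function name `llpo`, and the tiny model used in the demo are assumptions of this sketch. Training is not shown.

```python
import math


def llpo(obs, pi, A, B):
    """Log likelihood per opcode of obs under the HMM (pi, A, B).

    obs     : list of opcode indices
    pi[i]   : initial probability of hidden state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability that state i emits opcode index k
    """
    if not obs:
        raise ValueError("empty observation sequence")
    n = len(pi)
    log_prob = 0.0
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for t in range(len(obs)):
        if t > 0:
            # Forward step: propagate through A, then weight by the emission.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model assigns this sequence zero probability
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob / len(obs)


if __name__ == "__main__":
    # Hypothetical 2-state model over 2 opcode symbols.
    pi = [0.5, 0.5]
    A = [[0.9, 0.1], [0.2, 0.8]]
    B = [[0.8, 0.2], [0.3, 0.7]]
    print(llpo([0, 0, 1, 0, 1, 1], pi, A, B))
```

Dividing by the sequence length lets programs of different sizes be compared on one scale, which is what a single detection threshold requires.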

Detection with HMMs – Data Used

Data set
    200 NGVCK viruses
Comparison set
    40 normal exes from the Cygwin DLL
    25 other “non-family” viruses (G2, MPCGEN and VCL32)
Many HMM models generated and tested

Detection with HMMs – Experimental Result

[Figure: Test set 0, N = 2. Score (LLPO), from -160 to 0, versus file number (0 to 40) for family viruses and normal files]

Detection with HMMs – Experimental Result

Detect some other viruses “for free”

[Figure: Test set 0, N = 3. Score (LLPO), from -180 to 0, versus file number (0 to 40) for family viruses, non-family viruses and normal files]

Detection with HMMs – Experimental Result

Summary
    All normal programs distinguished
    VCL32 viruses had scores close to NGVCK family viruses
    With proper threshold, 17 HMM models had 100% detection rate and 10 models had 0% false positive rate
    No significant difference in performance between HMMs with 3 or more hidden states
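
How a threshold turns scores into a detection rate and a false positive rate can be sketched as below. The scores in the demo are invented for illustration and are not the experimental data; the function name `rates` is also made up here.

```python
def rates(family_scores, normal_scores, threshold):
    """Return (detection rate, false positive rate) when scores above threshold are flagged."""
    detected = sum(s > threshold for s in family_scores)
    false_positives = sum(s > threshold for s in normal_scores)
    return detected / len(family_scores), false_positives / len(normal_scores)


if __name__ == "__main__":
    # Hypothetical LLPO-style scores: higher means more similar to the family.
    family = [-3.1, -2.8, -4.0, -3.5]
    normal = [-40.2, -75.0, -12.9]
    print(rates(family, normal, threshold=-10.0))  # threshold between the two groups
    print(rates(family, normal, threshold=-3.3))   # threshold set too high
```

When the two groups of scores do not overlap, any threshold in the gap gives 100% detection with 0% false positives; when they overlap, raising the threshold trades missed viruses for fewer false alarms.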

Detection with HMMs – The Trained Models

Converged probabilities in HMM matrices may give insight into the features of the viruses it represents
We observed
    opcodes grouped into states
    most opcodes in one state only
What does this mean?
    We are not sure…

Detection with Similarity Index

Straightforward similarity index approach
    To determine whether a program belongs to the NGVCK virus family, compare it to any randomly chosen NGVCK virus
    Similarity to non-NGVCK code is small
    Can use this fact to detect metamorphic NGVCK variants

Detection with Similarity Index

Experiment
    compare 105 programs to selected NGVCK virus
Results
    100% detection, 0% false positive
Same results using other NGVCK virus
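
The similarity index test reduces to one comparison against a reference family member. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure, which is not the scoring method used in the talk. The threshold value, the opcode lists, and the names `opcode_similarity` and `looks_like_family` are likewise hypothetical.

```python
from difflib import SequenceMatcher


def opcode_similarity(a, b):
    """Stand-in similarity in [0, 1]; not the scoring method from the slides."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def looks_like_family(program, reference, threshold, score=opcode_similarity):
    """Flag program as a family member if it is similar enough to the reference."""
    return score(program, reference) > threshold


if __name__ == "__main__":
    reference = ["call", "pop", "mov", "sub", "jmp"]         # chosen family member
    variant = ["call", "pop", "nop", "mov", "sub", "jmp"]    # same code plus a garbage opcode
    unrelated = ["push", "and", "retn"]
    print(looks_like_family(variant, reference, threshold=0.5))
    print(looks_like_family(unrelated, reference, threshold=0.5))
```

The approach depends on the gap the slides report: family members share some similarity with each other, while similarity to non-family code is close to zero. The threshold of 0.5 is arbitrary and would need to be chosen from measured scores.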

PART V

Conclusion

Conclusion

Metamorphic generators vary greatly
    NGVCK has highest metamorphism (10% similarity on average)
    Other generators far less effective (60% similarity on average)
    Normal files 35% similar on average
However
    NGVCK viruses are “too different” from other viruses and normal programs

Conclusion

NGVCK viruses not detected by commercial scanners we tested
Hidden Markov model (HMM) detects NGVCK (and other) viruses with high accuracy
NGVCK viruses also detectable by similarity index

Conclusion

All viruses tested were detectable because
    High similarity within family and/or
    Too different from normal programs
Effective use of metamorphism requires both
    A high degree of metamorphism and
    Some similarity to other programs

References

P. Szor, The Art of Computer Virus Research and Defense, Addison-Wesley, 2005
M. Stamp, Information Security: Principles and Practice, Wiley-Interscience, 2005