LMA: Log Mail Analyzer Maurizio Aiello maurizio.aiello@ieiit.cnr.it National Research Council...

Post on 13-Jan-2016


LMA: Log Mail Analyzer

Maurizio Aiello
maurizio.aiello@ieiit.cnr.it

National Research Council
Institute of Electronics and Telecommunications and Information Engineering (IEIIT)

http://sourceforge.net/lma

Free software project LMA: Log Mail Analyzer

What can be performed with log file analysis?
– User’s request
– Normal debugging operations
– Help for worm detection

Why do we need a tool for log mail analysis?
– Mainly, avoiding headaches
– Speeding up operations

Postfix architecture

Why are log files so complex?

– Modularity
– Log = Debug
– …

Interesting fields

What information do we need about an e-mail transaction?

Timestamp, client IP, Mail From, Rcpt To, Status

Using the QID (queue identifier) as a hash key, we retrieve the value of each field above.

Postfix: remote client to local user

E-mail translation

Retrieving info on a mail:

1. Find its QID
2. Search lines related to that QID
3. Reconstruct the transaction (Local-Local, L-Remote, R-L, R-R)
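The three steps above can be sketched in Python. This is only an illustration, not LMA's actual code: the sample log lines are made up in the usual Postfix syslog layout, and the names `LINE_RE`, `reconstruct` and `direction`, as well as the set of local domains, are assumptions introduced for this example. Here the status is the word Postfix logs after `status=`.

```python
import re
from collections import defaultdict

# Made-up log lines in the usual Postfix syslog layout (illustrative only).
SAMPLE_LOG = """\
Jan 11 10:00:00 mx postfix/smtpd[101]: 4F9D195432C: client=pc1.example.org[192.0.2.10]
Jan 11 10:00:00 mx postfix/qmgr[102]: 4F9D195432C: from=<alice@example.org>, size=1204, nrcpt=1 (queue active)
Jan 11 10:00:01 mx postfix/smtp[103]: 4F9D195432C: to=<bob@example.com>, relay=mail.example.com[198.51.100.7]:25, delay=1, status=sent (250 OK)
Jan 11 10:00:01 mx postfix/qmgr[102]: 4F9D195432C: removed
"""

# Captures: timestamp, QID, remainder of the line.
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+ [\d:]{8}) \S+ postfix/\w+\[\d+\]: ([0-9A-F]+): (.*)$"
)
CLIENT_RE = re.compile(r"client=\S*\[([^\]]+)\]")
FROM_RE = re.compile(r"from=<([^>]*)>")
TO_RE = re.compile(r"to=<([^>]*)>.*\bstatus=(\w+)")


def reconstruct(log_text):
    """Group log lines by QID and collect the interesting fields."""
    records = defaultdict(
        lambda: {"timestamp": None, "ip": None, "from": None, "to": [], "status": []}
    )
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # line without a QID (connect, disconnect, ...)
        ts, qid, rest = m.groups()
        rec = records[qid]
        if rec["timestamp"] is None:
            rec["timestamp"] = ts  # first line seen for this QID
        client = CLIENT_RE.match(rest)
        sender = FROM_RE.match(rest)
        rcpt = TO_RE.match(rest)
        if client:
            rec["ip"] = client.group(1)
        elif sender:
            rec["from"] = sender.group(1)
        elif rcpt:
            rec["to"].append(rcpt.group(1))
            rec["status"].append(rcpt.group(2))
    return dict(records)


def direction(sender, recipient, local_domains):
    """Classify a transaction as L-L, L-R, R-L or R-R (local_domains is assumed)."""
    def side(addr):
        return "L" if addr.rsplit("@", 1)[-1] in local_domains else "R"
    return side(sender) + "-" + side(recipient)
```

Calling `reconstruct(SAMPLE_LOG)` yields one record per QID; `direction` then labels it once the set of local domains is known.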

LMA Module: Log-Translator

Output: info file (plaintext)

Architectural issue

Customization needs:
– Network architecture
– Antivirus server
– …

Configuration file:
– Whitelisting
– Network selection
– DB format, server type

Database generation

To store e-mail transactions, LMA supports two options:

Transactional DB (MySQL):
+ query flexibility
+ engine power
- need to install the engine

Berkeley DB:
+ LMA is a standalone program (no DB engine required)
- need to build the query engine
- less engine power and flexibility

Dbgenerator module

With Berkeley DB we have to build the DB engine ourselves:

Database keys and values

Database      Key                                   Value
Mail_db       E-mail_number (progressive integer)   Timestamp, ip, from, to, status
Date_db       Timestamp                             Sequence of e-mail_number
IP_db         Ip address                            Sequence of e-mail_number
Receiver_db   “Rcpt to” recipient                   Sequence of e-mail_number
Sender_db     “mail from” sender                    Sequence of e-mail_number

Database schema

Query engine and example

To search through DB, LMA performs the following:

Example: find all e-mails sent from aiello@ge.cnr.it:

1. Search aiello@ge.cnr.it in the Sender_db table.
2. Obtain a list of integers, which are keys in the Mail_db table:
   aiello@ge.cnr.it -> 27 | 45 | 78 | 3456 | 8960 etc.
3. Retrieve all the data about each e-mail:
   27 -> 01-Jan-2004|xxx.yyy.www.zzz|aiello@ge.cnr.it|jake@dot.com|250
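The lookup described above is a two-step inverted-index query. A minimal sketch follows, using in-memory dictionaries in place of the Berkeley DB files. The class `MailIndex` and its method names are hypothetical, and the IP address in the usage note is a documentation placeholder.

```python
from collections import defaultdict


class MailIndex:
    """In-memory stand-in for the five databases (hypothetical names)."""

    def __init__(self):
        self.mail_db = {}                     # e-mail number -> full record
        self.date_db = defaultdict(list)      # timestamp -> e-mail numbers
        self.ip_db = defaultdict(list)        # ip address -> e-mail numbers
        self.receiver_db = defaultdict(list)  # "rcpt to" -> e-mail numbers
        self.sender_db = defaultdict(list)    # "mail from" -> e-mail numbers

    def add(self, timestamp, ip, sender, recipient, status):
        """Store one transaction and update every secondary index."""
        number = len(self.mail_db) + 1        # progressive integer key
        self.mail_db[number] = (timestamp, ip, sender, recipient, status)
        self.date_db[timestamp].append(number)
        self.ip_db[ip].append(number)
        self.receiver_db[recipient].append(number)
        self.sender_db[sender].append(number)
        return number

    def sent_from(self, sender):
        """Step 1-2: look up the keys; step 3: fetch each record."""
        numbers = self.sender_db.get(sender, [])
        return [self.mail_db[n] for n in numbers]
```

For example, after `idx.add("01-Jan-2004", "192.0.2.1", "aiello@ge.cnr.it", "jake@dot.com", "250")`, the call `idx.sent_from("aiello@ge.cnr.it")` returns that record. The same pattern applies to the IP, recipient and date indexes.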

Built-in query

List all e-mails sent with the following characteristics:

– IP: from a particular IP
– FROM: with a given “mail from” field
– TO: to a particular recipient
– DATE: with ts_begin < timestamp < ts_final
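The four built-in criteria can be combined into a single filter. The sketch below is an assumption about how such a query might look, not LMA's interface: the `Mail` record type and the `query` function are hypothetical, and the date bounds are strict, as in the inequality above.

```python
from datetime import datetime
from typing import NamedTuple


class Mail(NamedTuple):
    """One e-mail transaction (hypothetical record type)."""
    timestamp: datetime
    ip: str
    sender: str
    recipient: str
    status: str


def query(mails, ip=None, sender=None, recipient=None, ts_begin=None, ts_final=None):
    """Return the mails matching every criterion that is not None."""
    result = []
    for m in mails:
        if ip is not None and m.ip != ip:
            continue
        if sender is not None and m.sender != sender:
            continue
        if recipient is not None and m.recipient != recipient:
            continue
        if ts_begin is not None and not ts_begin < m.timestamp:
            continue  # strict lower bound
        if ts_final is not None and not m.timestamp < ts_final:
            continue  # strict upper bound
        result.append(m)
    return result
```

A linear scan is used here for clarity; with the index databases described earlier, each criterion would instead start from the matching list of e-mail numbers.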

Sysman & Debugging OK.

Security?

What about security?

Worms spread either by a “direct” method, scanning ports and exploiting vulnerabilities, or by an “indirect” method, for example using their own SMTP engine or the SMTP server taken from the user agent settings.

Security aspects

When a PC is infected by an indirect worm, we expect:
– Lots of e-mails sent in a given time period;
– Different “mail from” fields used by the same IP;
– Some abnormal mail rejections by Internet servers.

LMA birth:

awk 'BEGIN { FS="[" } /client=/ { print $3 }' < mail.log | sed 's/]//' | sort | uniq -c | sort -rn

Another free project: Worm Poacher

A project that aims to:

• Study the behaviour of e-mail clients
• Detect anomalies
• Take the appropriate countermeasures

Statistical data mining

The number of e-mails sent every 5m, 1h, 4h, 8h and 24h is calculated, plotted and analyzed.

[Figure: April 2004. x-axis: Time (h); y-axis: # e-mails]

Baseline & statistical

– Visual inspections
– Baseline threshold analysis and alert raising; the baseline is calculated subtracting the “inactivity period”
– Correlation between alerts from different time_slice values (5m, 1h etc.) to reduce false alarms
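The slides do not give the baseline formula, so the following is only a sketch under stated assumptions: the “inactivity period” is taken to be the time slices with no mail at all, the baseline is the mean over the remaining slices, and the alert threshold is the mean plus `k` standard deviations. All function names and the choice of `k` are hypothetical.

```python
from statistics import mean, pstdev


def baseline(counts, k=3.0):
    """Return (baseline, threshold) computed over active slices only.

    Assumption: slices with zero e-mails form the inactivity period.
    """
    active = [c for c in counts if c > 0]
    if not active:
        return 0.0, 0.0
    mu = mean(active)
    return mu, mu + k * pstdev(active)


def alerts(counts, threshold):
    """Indexes of the time slices whose count exceeds the threshold."""
    return [i for i, c in enumerate(counts) if c > threshold]


def correlated_alerts(short_alerts, long_alerts, slices_per_long):
    """Keep a short-slice alert only if its enclosing long slice also alerted."""
    long_set = set(long_alerts)
    return [i for i in short_alerts if i // slices_per_long in long_set]
```

With 5m and 1h slices, `slices_per_long` would be 12; requiring both levels to agree is one simple way to correlate alerts and cut false alarms.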

Mail from

Normally, a client PC uses few “mail from” addresses. Some worms change this field (stealthiness).

Strange behaviour for a PC?

80 different addresses in a day!

As before, the baseline is calculated statistically for each IP.
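Counting distinct “mail from” addresses per IP and per day can be sketched as follows. The input tuple layout and the function names are assumptions for this example, and a fixed threshold stands in for the per-IP statistical baseline.

```python
from collections import defaultdict


def distinct_senders_per_ip(records):
    """Count distinct "mail from" addresses for each (day, ip) pair.

    records: iterable of (day, ip, mail_from) tuples (assumed layout).
    """
    seen = defaultdict(set)
    for day, ip, mail_from in records:
        seen[(day, ip)].add(mail_from.lower())  # addresses compared case-insensitively
    return {key: len(addrs) for key, addrs in seen.items()}


def suspicious(counts, threshold):
    """(day, ip) pairs using more distinct addresses than the threshold."""
    return sorted(key for key, n in counts.items() if n > threshold)
```

A PC that normally uses one or two addresses and suddenly shows dozens in a day would stand out in the result of `suspicious`.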

Reject analysis

When a worm tries to spread fast, it sometimes chooses a random list of recipients (like jack@somedomain.com).

Probably a lot of these messages are rejected.

Baseline calculation and threshold analysis.
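A sketch of the reject count per IP is shown below. It assumes the status field holds the SMTP reply code, as in the `250` of the query example, and treats any 5xx code as a rejection. The function name and input layout are hypothetical.

```python
from collections import Counter


def reject_ratio(records):
    """Fraction of rejected mails for each ip.

    records: iterable of (ip, smtp_code) tuples (assumed layout);
    a code starting with "5" (permanent failure) counts as a reject.
    """
    total, rejected = Counter(), Counter()
    for ip, code in records:
        total[ip] += 1
        if str(code).startswith("5"):
            rejected[ip] += 1
    return {ip: rejected[ip] / total[ip] for ip in total}
```

The resulting ratios can then be compared against a baseline and threshold in the same way as the e-mail counts.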

Kind of analysis performed

                               Global flow   Single ip flow
Number of e-mails sent              X              X
Different mail from address         X              X
Number of rejected mails            X              X

Single ip flow analysis

– Baseline calculated on each IP, instead of on global traffic
– Single IP flow is useful in big networks (where the signal/noise ratio is low)
– Performance problems and architectural issues (impossible to perform with DHCP, shared PCs etc.)

Results

Worm decision

Future development

Baseline dynamically updated

Alarms generated by daemon

SMTPsniffer. Reason: system independent from logfile format; can control any server.

Extension to ports different from 25.