Tavolo 2 - Big Data Adaptive Monitoring
Transcript of Tavolo 2 - Big Data Adaptive Monitoring
CIS, UNINA, UNICAL, UNIFI
First results
• The first results of this working group were published in the article
Big Data for Security Monitoring: Challenges and Opportunities in Critical Infrastructures Protection
L. Aniello2, A. Bondavalli3, A. Ceccarelli3, C. Ciccotelli2, M. Cinque1, F. Frattini1, A. Guzzo4, A. Pecchia1, A. Pugliese4, L. Querzoni2, S. Russo1
(1) UNINA (2) CIS (3) UNIFI (4) UNICAL
• Presented at the BIG4CIP workshop @ EDCC (12 May 2014)
Data-Driven Security Framework
[Figure: architecture of the data-driven security framework. Raw data collection gathers environmental data, node resource data, application/system logs, IDS alerts, network audit data, etc. from the critical infrastructure into a knowledge base; data processing includes a monitoring adapter and attack modeling; data analysis applies invariant-based mining, Bayesian inference, fuzzy logic, conformance checking, and adaptive monitoring, and drives protection actions on the infrastructure.]
Scenario
• Problem: the need to analyze more data coming from distinct sources in order to improve the capability to detect faults/cyber attacks
  o Excessively large volumes of information to transfer and analyze
  o Negative impact on the performance of monitored systems
• Proposed solution: dynamically adapt the granularity of monitoring
  o Normal case: coarse-grained monitoring (low overhead)
  o Upon anomaly detection: fine-grained monitoring (higher overhead)
• Two distinct scenarios
  o Fault detection (current CIS research direction)
  o Cyber attack detection
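The coarse/fine-grained switch described above can be sketched as follows; the class name, period values, and callbacks are illustrative assumptions, not part of the actual prototype:

```python
# Sketch of the adaptive-granularity idea: sample rarely in the normal
# case, switch to frequent sampling when an anomaly is detected.
# COARSE_PERIOD_S / FINE_PERIOD_S are invented example values.

COARSE_PERIOD_S = 60   # low-overhead sampling in the healthy case
FINE_PERIOD_S = 5      # higher-overhead sampling after an anomaly

class AdaptiveMonitor:
    def __init__(self):
        self.period_s = COARSE_PERIOD_S
        self.fine_grained = False

    def on_anomaly(self):
        """Switch to fine-grained monitoring when an anomaly is detected."""
        self.fine_grained = True
        self.period_s = FINE_PERIOD_S

    def on_recovery(self):
        """Fall back to coarse-grained monitoring once the system is healthy."""
        self.fine_grained = False
        self.period_s = COARSE_PERIOD_S

m = AdaptiveMonitor()
m.on_anomaly()    # anomaly detected -> fine-grained, higher overhead
m.on_recovery()   # system healthy again -> back to low overhead
```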
Anomaly Detection
• Metrics selection
  o Find correlated metrics (invariants) to be used as anomaly signals
  o Learn which invariants hold when the system is healthy, i.e., profile the healthy behavior of the monitored system
• Anomaly detection
  o Monitor the health of the system by looking at a few metrics: how should these metrics be chosen?
  o When an invariant stops holding, adapt the monitoring: the aim is detecting the root cause of the problem, while accounting for possible false positives
[1] J., M., R., W., "Information-Theoretic Modeling for Tracking the Health of Complex Software Systems", 2008
[2] J., M., R., W., "Detection and Diagnosis of Recurrent Faults in Software Systems by Invariant Analysis", 2008
[3] M., J., R., W., "Filtering System Metrics for Minimal Correlation-Based Self-Monitoring", 2009
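The metrics-selection step above can be illustrated with a toy miner that looks for strongly correlated metric pairs in healthy-run samples; the Pearson correlation measure, the 0.95 threshold, and all metric names are illustrative assumptions:

```python
# Hypothetical sketch of invariant mining: pairs of metrics whose healthy
# samples are strongly correlated become anomaly signals.
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy) if vx and vy else 0.0

def mine_invariants(metrics, threshold=0.95):
    """Return metric-name pairs whose healthy samples are strongly correlated."""
    names = sorted(metrics)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if abs(pearson(metrics[a], metrics[b])) >= threshold]

healthy = {
    "req_rate":  [10, 20, 30, 40, 50],
    "cpu_util":  [11, 19, 31, 41, 49],   # tracks req_rate -> invariant
    "disk_free": [90, 70, 85, 60, 95],   # uncorrelated
}
print(mine_invariants(healthy))  # -> [('cpu_util', 'req_rate')]
```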
Adapt the Monitoring
• Two dimensions in adapting the monitoring
  o Change the set of monitored metrics
  o Change the frequency of metrics retrieval
• How to choose the way of adapting the monitoring on the basis of the detected anomaly?
• Additional issue
  o The goal of the adaptation is discovering the root cause of the problem
  o Need to zoom in on specific portions of the system
    Very likely to increase the amount of data to transfer/analyze
    Risk of a negative impact on system performance
    Possible solution: keep the volume of monitored data limited by zooming out on other portions of the system
[4] M., R., J., A., W., "Adaptive Monitoring with Dynamic Differential Tracing-Based Diagnosis", 2008
[5] M., W., "Leveraging Many Simple Statistical Models to Adaptively Monitor Software Systems", 2014
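The zoom-in/zoom-out trade-off above can be sketched as a rate rebalancer: when one component is zoomed in, the others are scaled down so the total stays within a budget. The component names, budget, and zoom factor are all invented for illustration:

```python
# Illustrative sketch: keep the overall monitored-data rate bounded by
# zooming out the non-suspect components when one component is zoomed in.

BUDGET = 100  # max metric samples per second across the whole system

def rebalance(rates, suspect, zoom_factor=2):
    """Increase the suspect component's sampling rate, then scale the
    others down proportionally so the overall budget is respected."""
    new = dict(rates)
    new[suspect] = rates[suspect] * zoom_factor  # zoom in on the suspect
    spare = BUDGET - new[suspect]                # what is left for the rest
    others = [c for c in rates if c != suspect]
    total_others = sum(rates[c] for c in others)
    for c in others:
        new[c] = spare * rates[c] / total_others  # proportional zoom-out
    return new

rates = {"web": 25, "app": 25, "db": 25, "cache": 25}
adapted = rebalance(rates, "db")  # db doubled, others shrunk to compensate
```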
Fault Localization
• Goal: given a set of alerts, determine which fault occurred and which component originated it
• Problems
  o The same alert may be due to different faults (ambiguity)
  o A single fault may cause several alerts (domino effect)
  o Concurrent alerts may be generated by concurrent unrelated faults
  o Tradeoff: monitoring granularity vs. precision of fault identification
• Approaches
  o Probabilistic models (e.g., HMMs, Bayesian networks)
  o Machine learning techniques (e.g., neural networks, decision trees)
  o Model-based techniques (e.g., dependency graphs, causality graphs)
[6] S., S., "A survey of fault localization techniques in computer networks", 2004
[7] D., G., B., C., "Hidden Markov Models as a Support for Diagnosis: ...", 2006
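The ambiguity and domino-effect problems can be made concrete with a toy signature-matching localizer; this is not one of the cited approaches, just a minimal stand-in, and every fault name, alert name, and signature below is invented:

```python
# Toy illustration: several faults can explain overlapping alert sets, so a
# localizer ranks candidate faults by how well their (assumed) alert
# signatures match what was actually observed.

FAULT_SIGNATURES = {
    "disk_full":     {"io_error", "slow_response"},
    "memory_leak":   {"slow_response", "oom_kill"},
    "net_partition": {"timeout", "slow_response"},
}

def localize(alerts):
    """Return faults ranked by Jaccard overlap with the observed alerts."""
    def score(sig):
        return len(sig & alerts) / len(sig | alerts)
    return sorted(FAULT_SIGNATURES,
                  key=lambda f: score(FAULT_SIGNATURES[f]), reverse=True)

observed = {"slow_response", "oom_kill"}  # "slow_response" alone is ambiguous
print(localize(observed)[0])  # most plausible fault: memory_leak
```

Note how "slow_response" alone cannot discriminate between the three faults (ambiguity); only the extra "oom_kill" alert breaks the tie.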
Prototype - Work in Progress
[Figure: prototype architecture — monitoring of a JBoss cluster using Ganglia. Four hosts each run JBoss AS and a gmond daemon; a monitoring host runs gmetad and the Adaptive Monitoring component. Monitored metrics flow from the hosts to the monitoring host, which sends back monitoring adaptations.]
Prototype - Goals
• Identify a small set of metrics to monitor on a JBoss cluster to detect possible faults
  o Find existing correlations
  o Profile healthy behavior
• Inject faults on JBoss with Byteman (http://byteman.jboss.org/)
• For each fault, identify the set of additional metrics to monitor
• Implement the prototype in order to evaluate
  o The effectiveness of the approach
  o The reactivity of the adaptation
  o The overhead of the adaptation
OPERATING SYSTEMS AND APPLICATION SERVERS MONITORING
Data collection and processing
• Collects a selection of attributes from the OS and the application server, through probes installed on the machines
  – The current implementation observes Tomcat 7 and CentOS 6
• Executes the Statistical Prediction and Safety Margin algorithm on the data collected
• The Esper CEP engine is used to apply rules on events (it performs the detection of anomalies)
• Work partially done within the context of the Secure! Project (see later today)
High level view
INVARIANTS MINING
Why invariants?
• Invariants are properties of a program that are guaranteed to hold for all executions of the program
  – If those properties are broken at runtime, it is possible to raise an alarm for immediate action
• Invariants can be useful to
  – detect transient faults, silent errors and failures
  – report performance issues
  – avoid SLA violations
  – help operators to understand the runtime behavior of the app
• Pretty natural properties for apps performing batch work
An example of flow intensity invariant
• A platform for the batch processing of files: the processing time is proportional to the file size
• Measuring the file size and the time spent in a stage, I(x) and I(y) (the flow intensities), the equation

  I(y) = k · I(x)

  is an invariant relationship characterising the expected behaviour of the batch system.
  – If there is an execution problem (e.g., the file processing hangs), the equation no longer holds (broken invariant)
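A minimal sketch of how such a flow-intensity invariant could be profiled and checked; the least-squares fit, the relative tolerance, and the sample data are assumptions for illustration, not the paper's actual mining procedure:

```python
# Sketch of the flow-intensity invariant I(y) = k * I(x): estimate k from
# healthy executions, then flag samples whose residual exceeds a tolerance.

def fit_k(xs, ys):
    """Least-squares slope for y = k * x (regression through the origin)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def invariant_broken(x, y, k, tol=0.2):
    """True if y deviates from k*x by more than a relative tolerance tol."""
    return abs(y - k * x) > tol * k * x

# Healthy profile: processing time roughly proportional to file size.
sizes = [10, 20, 40, 80]          # I(x): file size (MB), invented samples
times = [1.1, 2.0, 3.9, 8.2]      # I(y): processing time (s)
k = fit_k(sizes, times)

invariant_broken(50, 5.1, k)      # consistent with the profile -> False
invariant_broken(50, 60.0, k)     # processing hang -> broken invariant
```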
Research questions
RQ1: How to discover invariants out of the hundreds of properties observable from an application log?
RQ2: How to detect broken invariants at runtime?
Our contribution
AUTOMATED MINING
A framework and a tool for mining invariants automatically from application logs
• tested on 9 months of logs collected from a real-world Infosys CPG SaaS application
• able to automatically select 12 invariants out of 528 possible relationships

IMPROVED DETECTION
An adaptive threshold scheme defined to significantly shrink the number of broken invariants
• from thousands to tens of broken invariants w.r.t. static thresholds on our dataset
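One plausible shape for an adaptive threshold is a bound that tracks the recent residual distribution instead of a fixed constant; the sliding-window mean + n·std rule, window size, and multiplier below are assumptions, not the scheme actually defined in this work:

```python
# Hypothetical adaptive threshold: flag an invariant as broken only when
# its residual is an outlier w.r.t. the recent residual window, instead of
# comparing against a static bound.
from collections import deque
from statistics import mean, stdev

class AdaptiveThreshold:
    def __init__(self, window=50, n_std=3.0):
        self.residuals = deque(maxlen=window)  # sliding residual history
        self.n_std = n_std

    def is_broken(self, residual):
        """True if the residual exceeds mean + n_std * std of the window."""
        if len(self.residuals) >= 2:
            bound = mean(self.residuals) + self.n_std * stdev(self.residuals)
            broken = abs(residual) > bound
        else:
            broken = False  # not enough history to judge yet
        self.residuals.append(residual)
        return broken

t = AdaptiveThreshold()
for r in [0.1, -0.2, 0.15, -0.05, 0.12]:
    t.is_broken(r)      # normal jitter: no alarms raised
t.is_broken(10.0)       # large deviation: alarm
```

Because the bound widens with noisy periods and tightens with quiet ones, far fewer residuals trip it than with a single static threshold, which matches the reduction the slide reports.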
BAYESIAN INFERENCE
Data-driven Bayesian Analysis
• Security monitors may produce a large number of false alerts
• A Bayesian network can be used to correlate alerts coming from different sources and to filter out false notifications
• This approach has been successfully used to detect credential stealing attacks
  – Raw alerts generated during the progression of an attack (e.g., user-profile violations and IDS notifications) are correlated
  – The approach was able to remove around 80% of false positives (i.e., non-compromised users being declared compromised) without missing any compromised user
Data-driven Bayesian Analysis
• Vector extraction starting from raw data:
  – each vector represents a security event, e.g., attack, compromised user, etc.
  – suitable for post-mortem forensics and runtime analysis
  – sources: event logs, network audit, environmental sensors
[Figure: vector extraction — each event is mapped to a vector (v1 … vN) of binary features (0/1). The Bayesian network has a hypothesis variable (the user is compromised) and information variables, i.e., alerts such as unknown address, multiple logins, suspicious download, …]
Bayesian network
• Allows estimating the probability of the hypothesis variable (attack event), given the evidence in the raw data:
[Figure: Bayesian network with hypothesis node C and alert nodes A1, A2, …]
Network parameters: the a-priori probability P(C) and a conditional probability table (CPT) for each alert Ai.
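Under the simplifying assumption that the alerts Ai are conditionally independent given C (a naive-Bayes structure), the inference reduces to multiplying CPT entries; the prior, CPT values, and alert names below are invented for illustration:

```python
# Sketch of the Bayesian inference: estimate P(C | observed alerts) from
# the prior P(C) and one CPT entry pair per alert. All numbers are invented.

P_C = 0.01  # a-priori probability that a user is compromised

# CPT per alert: (P(alert fires | C), P(alert fires | not C))
CPT = {
    "unknown_address":     (0.7, 0.05),
    "multiple_logins":     (0.6, 0.10),
    "suspicious_download": (0.5, 0.02),
}

def posterior(fired):
    """P(C | observed alerts), via Bayes' rule over all alerts."""
    like_c, like_not_c = P_C, 1.0 - P_C
    for alert, (p_given_c, p_given_not_c) in CPT.items():
        if alert in fired:
            like_c *= p_given_c
            like_not_c *= p_given_not_c
        else:  # an alert that did NOT fire is also evidence
            like_c *= 1.0 - p_given_c
            like_not_c *= 1.0 - p_given_not_c
    return like_c / (like_c + like_not_c)

posterior({"multiple_logins"})                  # one weak alert: still low
posterior(set(CPT))                             # all alerts: near certainty
```

A single noisy alert barely moves the posterior above the prior, which is exactly how correlating sources filters out false notifications.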
Incident analysis
• Estimate the probability that the vector represents an attack, given the features
[Example: for an observed vector of binary features, the network yields P(C) = 0.31]
Preliminary testbed
• A preliminary implementation with Apache Storm
• Tested with synthetic logs emulating the activity of 2.5 million users, generating 5 million log entries per day (IDS logs and user access logs)
Log lines    Time (ms)
4,300,000    140,886
4,400,000    143,960
4,500,000    150,448
4,600,000    147,024
4,700,000    153,551
4,800,000    159,567
4,900,000    162,642
[Figure: Storm topology — LogStreamer (spout) → FactorCompute (bolt) → AlertProcessor (bolt)]
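The spout/bolt pipeline can be mimicked in plain Python with generator stages; this is only a behavioral sketch of the topology shape, not the actual Storm implementation, and the tuple format and aggregation rule are invented:

```python
# Minimal model of the Storm topology: LogStreamer (spout) emits raw log
# lines, FactorCompute (bolt) extracts per-user evidence, AlertProcessor
# (bolt) aggregates it and flags suspicious users.

def log_streamer(lines):
    """Spout: emit one raw log line at a time."""
    yield from lines

def factor_compute(tuples):
    """Bolt: turn each 'user,alert' log line into a (user, alert) pair."""
    for line in tuples:
        user, alert = line.split(",")
        yield user, alert

def alert_processor(tuples):
    """Bolt: count evidence per user; flag users with 2+ alerts."""
    counts = {}
    for user, _alert in tuples:
        counts[user] = counts.get(user, 0) + 1
    return [u for u, n in counts.items() if n >= 2]

logs = ["alice,unknown_address", "bob,multiple_logins",
        "alice,suspicious_download"]
print(alert_processor(factor_compute(log_streamer(logs))))  # -> ['alice']
```

In the real topology each stage runs in parallel on Storm workers and tuples flow continuously; the generators here only capture the dataflow structure.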