Fault and Intrusion Tolerant (FIT) Event Broker & BFT-SMaRt
A. Casimiro, D. Kreutz, A. Bessani, J. Sousa, I. Antunes, P. Veríssimo
University of Lisboa, Portugal
Meeting PT, November 27, 2012
Cloud Infrastructures
[Figure: cloud infrastructure components (SAN, VPN, IP network, processing farm, storage farm, switching and routing) linked to monitoring tools and control engines by Events and Control flows]
Alert! Cloud infrastructures are one of the new hot targets of attacks!
Example scenario: Portugal Telecom Cloud Computing Infrastructure
  SmartCloud product
First and main problem:
  Centralized monitoring approach
  Diversity of monitoring tools
    ArcSight, Pulse, SCOM
[Figure: monitoring probes (agentless, agent-based, agent with ArcSight) send events to the ArcSight engine or another tool]
Problems: (a) faults and attacks; (b) diversity is hard to achieve in practice.
The TRONE approach
Fault and Intrusion Tolerant (FIT) Event Broker
Automated Failure Diagnosis
Multi-homing for fast reconfiguration
[Figure: cloud servers publish events through routers to the replicated FIT event brokers, and a console subscribes to them; other labels: SCTP, Resource Manager, Failure diagnosis]
FIT Event Broker Goals and challenges
Overarching goals:
  To provide support for trustworthy and resilient monitoring of cloud/datacenter infrastructures
  To achieve improved Quality of Protection without neglecting Quality of Service (performance) needs
Some specific challenges:
  Deal with large flows of information (events)
  Support different kinds of events (e.g. different criticality)
  Low intrusiveness and easy integration
FIT Event Broker Assumptions
System entities:
  Probes, event collectors/brokers, consoles
  Some event processing may be done by collectors
Fully connected network
  E.g., all the entities lie in the same monitoring VLAN
Partially synchronous system
  Clocks may be used to timestamp events
Faults
  Some FIT brokers may crash or fail in a Byzantine way
  We do not require/enforce clients (probes/consoles) to be correct
    If this is a problem for monitoring, then it must also be solved
FIT Event Broker Baseline design options
Topic-based Publish-Subscribe paradigm
  Good fit for the considered scenarios
State Machine Replication
  Active replication is better for Byzantine fault tolerance
  f out of n replicas of a FIT Broker may fail in a Byzantine way
Public-key cryptography
  Client authentication, avoid attacks from malicious probes
Event channels with support for QoP and QoS
  Differentiated fault-tolerance support (e.g. crash only or BFT)
FIT Event Broker High-level architectural view
FIT Event Broker Interface
Create event channel (In: TAG and CLASS)
Destroy event channel (In: TAG)
Register to channel (In: TAG)
Publish event (In: EVENT)
Subscribe to channel (In: TAG)
Receive event (Out: EVENT)
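The six operations above can be pictured as a small topic-based publish-subscribe API. The following is a minimal, non-replicated sketch only: the class name `FitBrokerSketch`, the method names, the explicit `publisher`/`subscriber` identifiers, the separate `tag` argument of `publish`, and the `"BFT"` class value are all illustrative assumptions, not the actual FIT broker interface.

```python
from collections import defaultdict, deque


class FitBrokerSketch:
    """Toy single-process stand-in for the broker interface (hypothetical names)."""

    def __init__(self):
        self.channels = {}                   # TAG -> CLASS
        self.publishers = defaultdict(set)   # TAG -> registered publisher ids
        self.subscribers = defaultdict(set)  # TAG -> subscriber ids
        self.queues = defaultdict(deque)     # subscriber id -> pending events

    def _require(self, tag):
        if tag not in self.channels:
            raise KeyError(f"unknown channel {tag!r}")

    def create_channel(self, tag, channel_class):
        self.channels[tag] = channel_class

    def destroy_channel(self, tag):
        self.channels.pop(tag, None)
        self.publishers.pop(tag, None)
        self.subscribers.pop(tag, None)

    def register(self, publisher, tag):
        self._require(tag)
        self.publishers[tag].add(publisher)

    def subscribe(self, subscriber, tag):
        self._require(tag)
        self.subscribers[tag].add(subscriber)

    def publish(self, publisher, tag, event):
        self._require(tag)
        # Only publishers registered to the channel may publish on it.
        if publisher not in self.publishers[tag]:
            raise PermissionError(f"{publisher!r} is not registered to {tag!r}")
        for subscriber in sorted(self.subscribers[tag]):
            self.queues[subscriber].append(event)

    def receive(self, subscriber):
        # Return the next pending event, or None when the queue is empty.
        queue = self.queues[subscriber]
        return queue.popleft() if queue else None


broker = FitBrokerSketch()
broker.create_channel("T1", "BFT")
broker.register("probe-1", "T1")
broker.subscribe("console-1", "T1")
broker.publish("probe-1", "T1", "cpu-high")
first_event = broker.receive("console-1")
```

In the replicated design, each of these calls would be submitted as an ordered command to all broker replicas rather than executed on a single object.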
FIT Event Broker Internal state
From the SMR perspective, it is important to identify the relevant state that needs to be kept consistent across replicas
  Data related to the broker configuration
    Existing channels and their CLASS
    Registered publishers and subscribers
  Data related to events
    Events that are ready to be delivered
[Figure: agreement protocol, TAG-based filter, subscription table, and output queues for subscribers S1, S2, S3]
Subscription Table
TAG   SUBSCRIBER   STATUS
T1    S1, S2       OK
T2    S3           OK
All client input that affects the state of the FIT broker (e.g. channel and subscription data, some events) must be handled as a state machine command
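Handling input as state machine commands means every replica applies the same commands in the same order and therefore ends up with the same channels, subscription table, and output queues. A minimal sketch of that idea follows; the command tuples, the dictionary layout, and the pre-ordered `ordered_log` (standing in for the agreement protocol) are assumptions made for illustration.

```python
def new_state():
    """Empty replica state: broker configuration plus per-subscriber queues."""
    return {"channels": {}, "subscriptions": {}, "queues": {}}


def apply_command(state, command):
    """Deterministically apply one totally ordered command to a replica."""
    kind = command[0]
    if kind == "create":
        _, tag, channel_class = command
        state["channels"][tag] = channel_class
        state["subscriptions"].setdefault(tag, [])
    elif kind == "subscribe":
        _, tag, subscriber = command
        if tag in state["channels"] and subscriber not in state["subscriptions"][tag]:
            state["subscriptions"][tag].append(subscriber)
            state["queues"].setdefault(subscriber, [])
    elif kind == "publish":
        _, tag, event = command
        # TAG-based filter: only subscribers of this TAG receive the event.
        for subscriber in state["subscriptions"].get(tag, []):
            state["queues"][subscriber].append(event)
    return state


# Order assumed to have been decided already by the agreement protocol.
ordered_log = [
    ("create", "T1", "BFT"),
    ("create", "T2", "CRASH"),
    ("subscribe", "T1", "S1"),
    ("subscribe", "T1", "S2"),
    ("subscribe", "T2", "S3"),
    ("publish", "T1", "e1"),
    ("publish", "T2", "e2"),
]

replicas = [new_state() for _ in range(4)]
for replica in replicas:
    for command in ordered_log:
        apply_command(replica, command)
```

The log reproduces the subscription table shown on the slide (T1 with S1 and S2, T2 with S3). Because `apply_command` uses no clocks, randomness, or other local input, all four replicas hold identical state after the loop.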
BFT-SMaRt Overview
Java-based platform for BFT SMR, available at http://code.google.com/p/bft-smart/
Actively being developed and improved in our group
BFT SMR “common” features
  State machine programming model
  n ≥ 3f+1 replicas required
A small step away from being a commercial product
Advanced features
  Replica recovery (state transfer)
  Reconfigurations
  Extensible API: e.g. custom voter
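The n ≥ 3f+1 requirement can be turned into a quick sizing calculation. The helper names below are hypothetical, and the 2f+1 figure for crash-only channels is the standard bound for crash-tolerant replication, not a number taken from the slides.

```python
def min_replicas(f, byzantine=True):
    """Smallest n that tolerates f faulty replicas."""
    if f < 0:
        raise ValueError("f must be non-negative")
    # Byzantine faults need n >= 3f + 1; crash-only faults need n >= 2f + 1.
    return 3 * f + 1 if byzantine else 2 * f + 1


def max_faults(n, byzantine=True):
    """Largest f that a group of n replicas can tolerate."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3 if byzantine else (n - 1) // 2
```

For example, tolerating one Byzantine broker requires four replicas, and adding a fifth or sixth replica does not raise the tolerated fault count until the group reaches seven.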
BFT-SMaRt Service invocation
[Figure: a probe invokes the service; agreement on order is performed by SMaRt before the FIT Broker state is updated]
BFT-SMaRt Execution and response
Commands are delivered to the FIT broker, which updates the state/queues and replies
Voting on client side
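Client-side voting can be sketched as follows: the client collects replies from the replicas and accepts a value once enough replicas agree on it. The function name `vote` and the f+1 matching-replies threshold are assumptions for illustration (f+1 matches guarantee that at least one correct replica reported the value); BFT-SMaRt's own voter may use a different quorum.

```python
from collections import Counter


def vote(replies, f):
    """Return the reply reported by at least f + 1 replicas, or None.

    replies maps a replica id to the (hashable) reply it sent.
    """
    counts = Counter(replies.values())
    for value, count in counts.most_common():
        if count >= f + 1:
            return value
    return None  # not enough matching replies yet; keep waiting


# One Byzantine replica (id 2) out of four sends a forged reply.
accepted = vote({0: "delivered", 1: "delivered", 2: "forged", 3: "delivered"}, f=1)
```

With at most f faulty replicas, a forged value can never collect f+1 votes, so the client is not misled by a malicious broker replica.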
BFT-SMaRt Implementation & Evaluation
The FIT Broker is currently being implemented…
…and integrated with BFT-SMaRt
Evaluation:
  Throughput
    Aim is to deal with 40K events/sec
  Resilience
    Measure performance under attack
    Verify recovery and reconfiguration capabilities
A simple demo is available
[Figure: Publisher and Subscriber clients, each using a ServiceProxy (ServiceProxyObject.invoke), connected through SMaRt to the FIT Broker Replica]
BFT-SMaRt Implementation & Evaluation
Preliminary results available [DAIS 2012]
Throughput for up to 100 channels
Summary
FIT Event Broker – Event dissemination support
  For easier deployment of multiple monitoring tools
  Manage which events are propagated, to which consoles, with which QoS
BFT-SMaRt – Byzantine fault tolerant replication
  First usable implementation of BFT replication
  Leading edge worldwide
  Resilience against malicious attacks with small overhead
Portugal Telecom’s cloud infrastructure is being used as a real use case for application and evaluation of the work