
Johns Hopkins & Purdue
Approved for Public Release, Distribution Unlimited

Scalability, Accountability and Instant Information Access for Network Centric Warfare

Department of Computer Science, Johns Hopkins University
Yair Amir (PI), Claudiu Danilov, John Lane, Jonathan Shapiro, Ciprian Tutu

Department of Computer Sciences, Purdue University
Cristina Nita Rotaru

http://www.cnds.jhu.edu

Network Centric Warfare Environments

• Wide area network settings.
  – C3I systems usually span large geographical distances.
  – Communication between sites is conducted over unreliable channels.

• Timely decisions based on available information.
  – Intermittent network connectivity.
    • Results in high latency for propagation and for consistent replication of updates.
  – Decisions may have to be made promptly.
    • Based on the best currently available information.

• Required update semantics are not general in many cases.
  – Weaker update semantics may suffice.
  – Common operational picture:
    • Commutative update semantics.
    • Timestamp resolution (most recent update wins).

• Critical information is often not large.
  – Compared with current hardware capabilities.
    • Location of friendly forces and enemy forces.
    • A few plans.
  – Allows storing all updates throughout the duration of an engagement (several months).

• Source uniqueness.
  – Every input (update) is initiated by one unique source.
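Commutative semantics with timestamp resolution, combined with source uniqueness, can be sketched as a last-writer-wins map. This is a minimal illustration under stated assumptions, not the project's implementation: each update is assumed to carry a timestamp assigned by its unique source, ties are assumed to be broken by source identifier, and the `Update` fields and key names are hypothetical. Under these assumptions every delivery order converges to the same state.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Update:
    key: str        # hypothetical data item, e.g. a unit's reported position
    value: str
    timestamp: int  # assigned by the update's unique source (assumption)
    source: str     # identifier of that unique source


def wins_over(a: Update, b: Update) -> bool:
    """Most recent update wins; the source id breaks timestamp ties (our assumption)."""
    return (a.timestamp, a.source) > (b.timestamp, b.source)


def apply_update(state: Dict[str, Update], u: Update) -> None:
    current = state.get(u.key)
    if current is None or wins_over(u, current):
        state[u.key] = u


def replay(updates: Iterable[Update]) -> Dict[str, str]:
    """Apply updates in the given (arbitrary) delivery order."""
    state: Dict[str, Update] = {}
    for u in updates:
        apply_update(state, u)
    return {key: u.value for key, u in state.items()}


def order_independent(updates: List[Update]) -> bool:
    """Check that every delivery order converges to the same state."""
    results = [replay(p) for p in itertools.permutations(updates)]
    return all(r == results[0] for r in results)
```

Because no replica has to agree on an order before applying an update, this style of update can be applied as soon as it arrives. The sketch relies on source uniqueness: a single source never issues two different values with the same timestamp for the same key.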


Malicious Insider Threats

• The insider attack has traditionally been a primary threat to computer systems (http://csrc.nist.gov).

• The explosion of the Internet made things worse: insiders commit about 80% of all computer and Internet related crime (www.intergov.org; CSI/FBI 2003 Computer Crime and Security Survey).

• Insiders: participants with legitimate access, or those that have bypassed the protection mechanisms, who exhibit arbitrary (malicious) behavior.


Dealing with Insider Threats

• Detection: use intrusion detection systems; however, they are not perfect (high false-positive rate).

• Prevention: use access control, firewalls, and proactive security; but vulnerabilities still exist (OS bugs, buffer overflows, covert channels, etc.).

• Mitigation (tolerate/cope): use mechanisms that provide service to correct participants while under attack, even if several participants are compromised.

• These approaches are not mutually exclusive.


Outline
• Network centric warfare environments.
• Peer Byzantine replication limitations.
• Research approach.
  – Scaling wide area intrusion-tolerant replication via hierarchy.
    • Local Byzantine replication within sites.
    • Fault-tolerant replication on the wide area.
  – Client accountability.
    • Accountability graph.
    • Snapshots for fast regeneration.
  – Exploiting application semantics.
• Next steps.
• Technology transitioning.
• Summary.


A Distributed Systems Service

• Message-passing system.

• Clients issue requests to servers, then wait for answers.

• Replicated servers process the request, then provide answers to clients.

[Figure: a site containing server replicas 1, 2, 3, ..., 3f+1 and the clients they serve.]


State Machine Replication

• Requests must be ordered in a consistent manner by all servers.

• Usually one server (the leader) manages the ordering process based on information from the other participants, then informs everybody about what was decided.

• If the leader dies, a new leader must be selected to ensure progress.

• Benign faults: Paxos [Lam98, Lam01] must contact f+1 out of 2f+1 servers and uses 2 rounds to allow consistent progress.

• Byzantine faults: BFT [CL99] must contact 2f+1 out of 3f+1 servers and uses 3 rounds to allow consistent progress.
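The replica and quorum counts above can be checked with a short Python sketch. The names `ReplicationConfig`, `config_for`, and the helper functions are hypothetical; the numbers come from the slide. Any two quorums of size q out of n servers share at least 2q - n servers: 1 for Paxos, so a decision is always carried forward, and f+1 for BFT, so the overlap always contains at least one correct server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationConfig:
    protocol: str
    f: int         # number of faults tolerated
    replicas: int  # total number of servers
    quorum: int    # servers that must be contacted to make progress
    rounds: int    # communication rounds per ordered request


def config_for(fault_model: str, f: int) -> ReplicationConfig:
    if f < 0:
        raise ValueError("f must be non-negative")
    if fault_model == "benign":
        return ReplicationConfig("Paxos", f, 2 * f + 1, f + 1, 2)
    if fault_model == "byzantine":
        return ReplicationConfig("BFT", f, 3 * f + 1, 2 * f + 1, 3)
    raise ValueError(f"unknown fault model: {fault_model}")


def quorum_overlap(c: ReplicationConfig) -> int:
    """Minimum number of servers shared by any two quorums: 2q - n."""
    return 2 * c.quorum - c.replicas


def live_with_f_failures(c: ReplicationConfig) -> bool:
    """A quorum can still be formed when f servers are faulty or unreachable."""
    return c.replicas - c.f >= c.quorum
```

For example, `config_for("byzantine", 1)` describes 4 replicas with quorums of 3, and any two such quorums overlap in at least 2 servers.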


A Replicated Server System

• Maintaining consistent servers [Sch90]:
  – To tolerate f benign faults, 2f+1 servers are needed.
  – To tolerate f malicious faults, 3f+1 servers are needed.

• Responding to read-only client requests [Sch90]:
  – If the servers can suffer only benign faults, 1 answer is enough.
  – If the servers can be malicious, the client must wait for f+1 identical answers, f being the number of malicious servers.
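The read-only rule can be sketched as a client-side vote in Python. Since at most f servers are malicious, f+1 identical answers must include at least one from a correct server. The function name and the representation of replies as a sequence of hashable values in arrival order are assumptions for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def accept_answer(replies: Iterable[Hashable], f: int, byzantine: bool = True) -> Optional[Hashable]:
    """Return the first answer backed by enough identical replies, or None to keep waiting.

    replies: answers in arrival order (hypothetical representation).
    f: maximum number of faulty servers.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    # Benign servers never lie, so one answer suffices; otherwise wait for f+1 matches.
    needed = f + 1 if byzantine else 1
    votes: Counter = Counter()
    for reply in replies:
        votes[reply] += 1
        if votes[reply] >= needed:
            return reply
    return None
```

With f = 1, the replies `["bad", "good", "good"]` yield `"good"`, while `["bad", "good"]` is not yet enough to decide.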

Peer Byzantine Replication Limitations

• Limited scalability due to multiple all-peer exchanges.
  – 3-round all-peer exchange.
    • Very costly on high-latency wide area links.
    • Not very scalable.

• Strong connectivity is required.
  – 2f+1 (out of 3f+1) servers are needed to allow progress, and f+1 to get an answer.
    • Partitions are a real issue.
    • Clients depend on remote information.
  – Bad news: this is provably optimal.
    • We need to pay something to get something else.

• Construct consistent total order.
  – Agreement is achieved on the order of updates before applying them.
    • Very useful: supports general update semantics.
    • May be sub-optimal for C3I applications that need only commutative semantics.

• Focus is solely on replica protection.
  – Compromised clients can inject wrong (though valid) input through authorized channels.
    • Wrong input will be consistently replicated to all servers.
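Why partitions are a real issue can be shown with a small Python check. It assumes, as the slide states, that a connected component of a 3f+1 server system needs 2f+1 members to make progress; the function names are hypothetical. At most one component can ever progress, because two components of 2f+1 would need 4f+2 > 3f+1 servers, and when the servers split roughly in half, none can.

```python
from typing import List


def can_make_progress(component_size: int, f: int) -> bool:
    """A connected component can order updates only if it holds 2f+1 of the 3f+1 servers."""
    n = 3 * f + 1
    if not 0 <= component_size <= n:
        raise ValueError("component size must be between 0 and 3f+1")
    return component_size >= 2 * f + 1


def progressing_components(sizes: List[int], f: int) -> List[int]:
    """Given the component sizes after a partition, return those that can progress."""
    if sum(sizes) != 3 * f + 1:
        raise ValueError("component sizes must add up to 3f+1")
    return [s for s in sizes if can_make_progress(s, f)]
```

With f = 2 (7 servers), a 5/2 split leaves the 5-server side able to progress, but a 4/3 split stops both sides.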


Local Byzantine Replication Within a Site

• No trust between participants in a site.
  – As long as the assumptions are met, a site acts as one unit that can only crash.

• How can we make sure that one server cannot manipulate the order?
  – Threshold cryptography seems a good direction.

• Use BFT-like [CL99, YMVAD03] protocols and threshold cryptography to guarantee that any valid message leaving the site is correct.
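The slide names threshold cryptography without fixing a scheme. To illustrate only the k-out-of-n property, the Python sketch below uses Shamir secret sharing, a simpler swapped-in technique and not a threshold signature scheme: any k shares recover the secret, while fewer reveal nothing about it. Real threshold signatures let servers produce signature shares without ever reconstructing the key. The choice k = 2f+1 out of n = 3f+1, which guarantees that any k shares include at least f+1 from correct servers, is our assumption, as are all function names.

```python
import random
from typing import List, Tuple

PRIME = 2**127 - 1  # Mersenne prime used as the field modulus

Share = Tuple[int, int]  # (x, polynomial value at x)


def make_shares(secret: int, k: int, n: int, rng: random.Random) -> List[Share]:
    """Split `secret` into n shares; any k of them recover it (Shamir secret sharing)."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be a field element")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]

    def poly(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, poly(x)) for x in range(1, n + 1)]


def recover(shares: List[Share]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * xj % PRIME
                den = den * (xj - xi) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

The seeded `random.Random` keeps the example reproducible; real key material would need a cryptographically secure source such as Python's `secrets` module. `pow(den, -1, PRIME)` requires Python 3.8 or later.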


Fault Tolerant Replication Engine

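[Figure: state diagram of the fault-tolerant replication engine [AT02]. States include RegPrim, TransPrim, ExchangeStates, ExchangeMessages, Construct, Un, No, and NonPrim. Transitions are triggered by regular membership (Reg Memb) and transitional membership (Trans Memb) notifications, by updates, and by conditions such as Last CPC, Last State, Possible Prim, No Prim, and Recover. Updates are marked Red, Yellow, or Green.]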


Fault Tolerant Experiments over Wide-Area Network

• A real experimental network (CAIRN).
• Also modeled in the Emulab facility.

[Figure: CAIRN testbed topology with nodes ISIPC, ISIPC4, TISWPC, ISEPC3, ISEPC, UDELPC, and MITPC located in Virginia, Delaware, Boston, San Jose, and Los Angeles. Wide-area links range from 1.4 ms to 38.8 ms latency and from 1.42 to 9.81 Mbits/sec; local links are 100 Mb/s with < 1 ms latency.]


Throughput Comparison (WAN)

[Figure: update transactions per second (0 to 400) versus number of clients (0 to 140), with 7 replicas on the wide area; curves for the FT Replication Engine, an upper bound, and 2PC [ADMST02].]


Hierarchical Architecture

• Each site acts as a logical unit that can crash.
• Fault-tolerant protocols between sites.

[Figure: each site contains server replicas 1, 2, 3, ..., 3f+1 and its clients.]


Hierarchical Architecture Details

[Figure: two sites connected by the wide area network. Within each site, server replicas 1 to 3f+1 share a local area network with the local clients. Each replica runs Byzantine Replication beneath Fault Tolerant Replication over Secure Spread. Replica 1 is the wide area representative and the remaining replicas are wide area standbys; a Monitor component also appears in the figure.]


Payment & Potential Gain

• Protects against f Byzantine faults in each site for the price of having 3f+1 replicas in every site.

• Box numbers / a total site compromise.

• Read queries are limited to the local site (no wide-area communication is needed).

• On a network with a diameter of 50 ms:
  – It takes at least 300 milliseconds to complete the 3 wide area round trips used by peer Byzantine replication methods.
  – The FT Replication Engine was shown to achieve 5 times the performance of 2PC.

• Goal:
  – More than a factor of 3 improvement compared with a peer system.
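The 300 ms figure follows from simple arithmetic: with a 50 ms one-way diameter, each wide-area round trip costs at least 100 ms, so 3 round trips cost at least 300 ms. A hedged Python helper (hypothetical name) makes the bound explicit:

```python
def min_wan_latency_ms(diameter_ms: float, wan_round_trips: int) -> float:
    """Lower bound on update latency when each wide-area round trip spans the network diameter."""
    if diameter_ms < 0 or wan_round_trips < 0:
        raise ValueError("inputs must be non-negative")
    # One round trip = message out + reply back, each taking up to the one-way diameter.
    return 2 * diameter_ms * wan_round_trips
```

`min_wan_latency_ms(50, 3)` gives the 300 ms quoted above. The number of wide-area round trips needed by the hierarchical design is not stated on this slide, so it is left as a parameter; a hypothetical protocol needing a single wide-area round trip would be bounded below by 100 ms.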


Alternative Scalable Architecture

• Use physical trusted nodes assumed to be working under a weaker adversary: they can crash and recover, but cannot be compromised.

• Take advantage of the trusted nodes to run an optimized Byzantine replication algorithm, potentially reducing the number of rounds.

• Use protocols where communication over the WAN takes place only between trusted nodes, thus avoiding high-latency all-peer exchanges over the WAN.

• Similar approaches: [CLNV02, Ver03, SurS03]


What About Corrupted Clients?

• We cannot detect corrupted clients without external information (we can take advantage of detection mechanisms).

• Can we bring the system to a “clean” state if we have external information about compromised clients?

• Proposed solution: accountability graph (A-DAG).


Client Accountability Graph

[Figure: client updates arranged along a time axis.]

• A directed acyclic graph of updates.

• Each update links to the previous updates that modified data it read (its causal predecessors).
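Building the graph can be sketched in a few lines of Python. This is a minimal illustration assuming that updates are added in their agreed delivery order and that an update's causal predecessors are the latest updates that wrote each data item it read; the `Update` fields and class names are hypothetical. Edges only point to earlier updates, so the graph stays acyclic.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class Update:
    uid: str                # hypothetical unique update identifier
    client: str             # client that issued the update
    reads: FrozenSet[str]   # data items the update read
    writes: FrozenSet[str]  # data items the update modified


@dataclass
class AccountabilityGraph:
    preds: Dict[str, Set[str]] = field(default_factory=dict)   # update -> causal predecessors
    client_of: Dict[str, str] = field(default_factory=dict)    # update -> issuing client
    last_writer: Dict[str, str] = field(default_factory=dict)  # data item -> latest update that wrote it

    def add(self, u: Update) -> Set[str]:
        """Append an update in delivery order and link it to its causal predecessors."""
        if u.uid in self.preds:
            raise ValueError(f"duplicate update id: {u.uid}")
        # Predecessors are computed before recording u's own writes, so no self-loops.
        preds = {self.last_writer[item] for item in u.reads if item in self.last_writer}
        self.preds[u.uid] = preds
        self.client_of[u.uid] = u.client
        for item in u.writes:
            self.last_writer[item] = u.uid
        return preds
```

Recording the issuing client next to each update is what makes the graph an accountability structure: every node can be traced back to the client responsible for it.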


Client Accountability Graph

[Figure: the accountability graph over time, marking clean updates, a corrupted update (X), and suspicious updates.]

Limits adversary power:
• The adversary can inject updates only as a compromised client.
• Once a compromised network avoids delivering an update, it cannot deliver causally following updates.

Useful for risk assessment.
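Once external information identifies compromised clients, the graph can be classified. The Python sketch below assumes (our reading of the figure) that an update is corrupted if a compromised client issued it and suspicious if it causally follows a corrupted update; the input format, a map from update id to predecessor ids plus a map from update id to client, is hypothetical. Suspicion is propagated with a breadth-first traversal along successor edges.

```python
from collections import deque
from typing import Dict, Iterable, Set


def classify(preds: Dict[str, Set[str]], client_of: Dict[str, str],
             compromised: Iterable[str]) -> Dict[str, str]:
    """Label every update 'clean', 'corrupted', or 'suspicious'."""
    bad_clients = set(compromised)
    # Invert predecessor edges so suspicion can flow forward in time.
    succs: Dict[str, Set[str]] = {u: set() for u in preds}
    for u, ps in preds.items():
        for p in ps:
            succs.setdefault(p, set()).add(u)
    corrupted = {u for u, c in client_of.items() if c in bad_clients}
    suspicious: Set[str] = set()
    queue = deque(corrupted)
    while queue:
        u = queue.popleft()
        for s in succs.get(u, ()):
            if s not in corrupted and s not in suspicious:
                suspicious.add(s)
                queue.append(s)
    return {u: "corrupted" if u in corrupted else "suspicious" if u in suspicious else "clean"
            for u in preds}
```

Updates with no causal path from a corrupted update stay clean, which is what lets the graph bound the damage for risk assessment.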


Enabling Fast Regeneration Using Snapshots

[Figure: the accountability graph over time, marking the most recent snapshot, clean updates, a corrupted update (X), and suspicious updates.]

Periodic snapshots limit the state regeneration computation.

For our application domain, it seems feasible to maintain continuous information over a long period of time.
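Regeneration from a snapshot can be sketched as replaying only the log suffix after the snapshot while skipping excluded (corrupted or suspicious) updates. This is a minimal Python illustration under assumptions of ours: the log is a list of `(update id, data item, new value)` writes in delivery order, the snapshot captures the state after the first `snapshot_pos` entries, and the snapshot must predate every excluded update.

```python
from typing import Dict, List, Set, Tuple

Write = Tuple[str, str, str]  # (update id, data item, new value); hypothetical log format


def regenerate(snapshot_state: Dict[str, str], snapshot_pos: int,
               log: List[Write], excluded: Set[str]) -> Dict[str, str]:
    """Rebuild state from the snapshot taken after log[:snapshot_pos], skipping excluded updates."""
    # The snapshot is only usable if it contains no update that must be removed.
    if any(uid in excluded for uid, _, _ in log[:snapshot_pos]):
        raise ValueError("snapshot includes an excluded update; use an earlier snapshot")
    state = dict(snapshot_state)  # copy so the stored snapshot stays untouched
    for uid, item, value in log[snapshot_pos:]:
        if uid not in excluded:
            state[item] = value
    return state
```

Only the entries after the snapshot are replayed, which is how periodic snapshots bound the regeneration work.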


Overall Architecture

[Figure: two sites connected by the wide area network. Within each site, server replicas 1 to 3f+1 share a local area network with the local clients. Each replica runs Byzantine Replication, Fault Tolerant Replication over Secure Spread, and the client accountability graph (A-DAG). Replica 1 is the wide area representative and the remaining replicas are wide area standbys; a Monitor component also appears in the figure.]


Risks and Challenges
• Interface the Byzantine-tolerant replication and fault-tolerant replication components.
• Investigate the impact of threshold digital signatures on performance and complexity.
• Interface Byzantine-tolerant replication with the client accountability graph.
• Use application semantics to optimize protocols.
• Design optimizations to make the cost of the architecture very small when no faults occur.
• Take confidentiality into account under the corrupted-servers model.


Scalability, Accountability and Instant Information Access for Network-Centric Warfare

New ideas
• First scalable wide-area intrusion-tolerant replication architecture.
• Providing accountability for authorized but malicious client updates.
• Exploiting update semantics to provide instant and consistent information access.

Impact
• Resulting systems with at least 3 times higher throughput, lower latency and high availability for updates over wide area networks.
• Clear path for technology transitions into military C3I systems.

Schedule (June 04, Dec 04, June 05, Dec 05)
• C3I model, baseline and demo.
• Component analysis & design.
• Component implementation.
• Component evaluation.
• System integration & evaluation.
• Final C3I demo and baseline evaluation.

http://www.cnds.jhu.edu/funding/srs/