Outline

Secure Semantic Information Grid for Network Centric Operations Dr. Bhavani Thuraisingham Principal Investigator The University of Texas at Dallas [email protected] June 2009


Page 1: Outline

Secure Semantic Information Grid for Network Centric Operations

Dr. Bhavani Thuraisingham

Principal Investigator

The University of Texas at Dallas

[email protected]

June 2009

Page 2: Outline

Outline

• Objectives
• Tasks/Team Members
• History
• Accomplishments
• Directions

Page 3: Outline

Objectives

• Develop technologies for a secure information grid to support the DoD’s Network Centric Operations
• Complement the current projects funded by AFOSR for handling large amounts of data
• Dependable information sharing
• Dual-use technologies

Page 4: Outline

Tasks/Team Members

• Policy based accountability for secure grids: Purdue University (Elisa Bertino, Lorenzo Martino)
• Secure pervasive infrastructures: University of Texas at Arlington (Sajal Das, Yonghe Liu)
• Secure distributed storage: University of Texas at Dallas (I-Ling Yen)
• Encrypted data storage: University of Texas at Dallas (Murat Kantarcioglu)
• Secure query processing in clouds: University of Texas at Dallas (Latifur Khan)
• Other
– Secure geosocial information grid: University of Texas at Dallas (Latifur Khan, Murat Kantarcioglu)
– Dependability issues: University of Texas at Dallas (Latifur Khan, Kevin Hamlen, Eric Wong)
• Research and Integrated Demo (FY10 and Beyond)
• Administrative Assistant: Ms Jamie McDonald, The University of Texas at Dallas

Page 5: Outline

Recent History

• Assured Information Sharing, AFOSR MURI (Tim Finin): UMBC/Purdue/UTD/UIUC/UTSA/UM
• Information Operations through Infospheres: Assured Information Sharing, AFOSR 2005-2008
• Semantic Framework with Blackbook, IARPA
• Secure Information Grid (Congressional Funds)
• Policy based Semantic Web, NSF
• Dependable Information Sharing, IARPA
• Open Science Grid, DOE/NSF (Planned)

Page 6: Outline

Accomplishments

UT Dallas

UT Arlington

Purdue

Page 7: Outline

Layered Architecture

• Secure DATA/INFORMATION/KNOWLEDGE GRID: UTD/Purdue
• GRID APPLICATIONS: Border security (UTA/UTD)
• GRID APPLICATIONS: NCES (UTD/Purdue)
• SECURE GRID MODELS: access control and accountability (Purdue/UTD)
• Secure INFRASTRUCTURE/MOBILE GRID: UTA/UTD
• Integration: UTD lead with others having a role

[Diagram labels: S-SOAG; CORE: Policies, SSE-SOA Security Infrastructure; Assured Information Management Services; Assured Federation Management Service; Assured Incentive Management Service; Application Services (e.g., NCW, Border Security); Security Services; Information Management Services; Storage Services; Infrastructure Services]

Page 8: Outline

Secure Semantic Framework

[Diagram labels: BLACKBOOK; Documents; Entity Extraction, Relationship Extraction; RDF Graph Store; RDF Graph Store Management: Storage, Query, Integration; Ontology based Heuristic Reasoning, Rule based reasoning, Data mining; Other Services, e.g., Security, Integrity]

Page 9: Outline

Storing RDF Data in Hadoop And Retrieval

Dr. Latifur Khan

[email protected]

Page 10: Outline

Objectives/Environment

• Objectives

– To build efficient storage using Hadoop for petabytes of data

– To build an efficient secure query mechanism

– Possible outcomes

• Open Source Framework for RDF

• Integration with Jena

• Environment

– 4 node cluster in Semantic Web Lab

– 10 node cluster in Cloud Computing Lab

• 4 GB main memory

• Intel Pentium IV 3.0 GHz processor

• 640 GB hard drive

– OpenCirrus HP labs test bed

– Collaboration with Andy Seaborne, HP Labs

Page 11: Outline

Preprocessing Steps

Page 12: Outline

Some Query Results

Horizontal axis: Number of Triples

Vertical axis: Time in milliseconds

Page 13: Outline

Design and Analysis of Querying Encrypted Data in Relational Databases

Dr. Murat Kantarcioglu

[email protected]

Page 14: Outline

Our Contributions

• Storage

– The performance of cipher modes is analyzed under different granularities and disk access patterns

– A CTR based page level encryption method is proposed

• Query

– Performance is compared under different query types

– We propose a vertical partitioning approach to avoid unnecessary cryptographic operations over non-sensitive attributes

– We first focus on single table partitioning, then generalize the problem to cover the entire schema and propose a heuristic that avoids an exhaustive search of the partitioning space
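
The vertical partitioning idea can be illustrated with a small sketch. Everything here is an assumption made for illustration: the class name `PartitionedTable`, the column names, and `toy_cipher`, which is an XOR placeholder and not real encryption. The sketch only shows that a query touching non-sensitive attributes alone triggers no cryptographic operations.

```python
from dataclasses import dataclass, field


def toy_cipher(value: str, key: int = 0x5A) -> str:
    # Placeholder for a real cipher (NOT secure): XOR each character with a key.
    # XOR is its own inverse, so the same call encrypts and decrypts.
    return "".join(chr(ord(c) ^ key) for c in value)


@dataclass
class PartitionedTable:
    """Hypothetical table split into a plain part and an encrypted part."""
    sensitive_cols: set
    plain: dict = field(default_factory=dict)    # row_id -> non-sensitive columns
    secret: dict = field(default_factory=dict)   # row_id -> encrypted columns
    crypto_ops: int = 0                          # counts cipher invocations

    def insert(self, row_id, row):
        self.plain[row_id] = {}
        self.secret[row_id] = {}
        for col, value in row.items():
            if col in self.sensitive_cols:
                self.secret[row_id][col] = toy_cipher(value)
                self.crypto_ops += 1
            else:
                self.plain[row_id][col] = value

    def select(self, row_id, cols):
        result = {}
        for col in cols:
            if col in self.sensitive_cols:
                # Only sensitive attributes pay the decryption cost.
                result[col] = toy_cipher(self.secret[row_id][col])
                self.crypto_ops += 1
            else:
                result[col] = self.plain[row_id][col]
        return result


table = PartitionedTable(sensitive_cols={"ssn"})
table.insert(1, {"name": "Ann", "city": "Dallas", "ssn": "123-45-6789"})
public = table.select(1, ["name", "city"])   # no cipher call needed
private = table.select(1, ["ssn"])           # one decryption
```

Choosing which attributes go into which partition across a whole schema is the search problem the slide refers to; the sketch fixes the partition by hand.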


Page 15: Outline

Threat Models

[Diagram labels: Client Application; Plain Query; Authentication and Query Transformation; Transformed Query; Query Engine; Query Results; Disk Access; Hard Disk. Regions marked: Trusted Components, Untrusted Component]

Page 16: Outline


Counter (CTR4)
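
The CTR4 variant is not specified in this transcript, so the following is only a minimal sketch of plain counter mode applied to one page. It assumes a hash-based stand-in for the block cipher (SHA-256 instead of AES, so it is not suitable for real use), and the helper names `keystream_block`, `ctr_xor`, and `page_nonce` are hypothetical. It shows the property that makes counter mode attractive for page level encryption: each keystream block depends only on the key, nonce, and counter, so any block in a page can be decrypted without touching the blocks before it.

```python
import hashlib

BLOCK = 16  # bytes per keystream block, matching the AES block size


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Stand-in for E_key(nonce || counter); a real system would use AES here.
    digest = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
    return digest[:BLOCK]


def ctr_xor(key: bytes, nonce: bytes, data: bytes, first_block: int = 0) -> bytes:
    """Encrypt or decrypt `data`; the two operations are identical in CTR mode.

    `first_block` is the index of the keystream block that lines up with the
    start of `data`, which allows decrypting a slice from the middle of a page.
    """
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        ks = keystream_block(key, nonce, first_block + offset // BLOCK)
        chunk = data[offset:offset + BLOCK]
        out.extend(b ^ k for b, k in zip(chunk, ks))
    return bytes(out)


def page_nonce(page_id: int) -> bytes:
    # Assumption: one nonce per page, derived from the page number.
    return page_id.to_bytes(8, "big")


key = b"k" * 16
page = bytes(range(256)) * 2                 # a 512-byte example page
cipher_page = ctr_xor(key, page_nonce(7), page)
# Random access: decrypt only block 3 (bytes 48..63) of the page.
block3 = ctr_xor(key, page_nonce(7), cipher_page[48:64], first_block=3)
```

A real design must also ensure that a nonce and counter pair is never reused under the same key, for example when a page is rewritten.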

Page 17: Outline

Decrypting 1 GB data

CTR4 is faster than CBC, CTR, OFB, CFB

Block size (bytes)   CBC    CTR    OFB    CFB    CTR4
16                   24.9   26.1   27.1   25.6   23.0
64                   24.3   25.7   26.5   25.0   22.8
128                  23.7   25.6   26.2   25.2   22.3
256                  23.5   25.5   26.3   25.4   22.1
512                  23.4   25.5   26.4   25.4   22.1
1024                 23.4   25.5   26.4   25.3   22.0
2048                 23.3   25.5   26.5   25.4   22.0
4096                 23.3   25.5   26.3   25.3   22.2
8192                 23.2   25.5   26.4   25.4   22.3

Page 18: Outline

Accountability Mechanisms for Grid Systems

Dr. Elisa Bertino

Research Director, CERIAS

Computer Science Department, Purdue University

[email protected]

Page 19: Outline

Contributions

What is accountability?

• Accountability is defined as “A is accountable to B when A is obliged to inform B about A’s past or future actions and decisions, to justify them, and to suffer punishment in the case of eventual misconduct”
• Accountability is an important aspect of any computer system for assuring that every action executed in the system can be traced back to some entity
• The dynamic and multi-organizational nature of grid systems requires an effective and efficient accountability system

Contributions

• We have developed a distributed mechanism to capture provenance information available during the distributed execution of jobs in a grid
• Our approach is based on the notion of accountability agents
• We have developed a simple yet effective language to specify the accountability data to collect
• We have implemented a prototype of the accountability system on an emulated grid testbed

Page 20: Outline

Overall Architecture of Accountable Grid Systems

Page 21: Outline

Two Approaches Combined

• Job-flow based approach
– Jobs flow across different organizational units
– Long computations are often divided into many sub-jobs to be run in parallel
– A possible approach is to employ point-to-point agents which collect data at each node that the job traverses
• Grid node based approach
– It focuses on a given location in the flow and at a given instant of time for all jobs
– Viewpoint is fixed
• The combination of the two approaches allows us to collect complementary information
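
As a rough illustration of why the two viewpoints are complementary, the sketch below queries one log of accountability records in both ways. The record layout `(job_id, node, timestamp, event)` and the function names are assumptions made for this example and are not taken from the prototype.

```python
# Hypothetical accountability records: (job_id, node, timestamp, event)
records = [
    ("job-1", "node-A", 1, "start"),
    ("job-1", "node-B", 2, "start"),
    ("job-2", "node-A", 2, "start"),
    ("job-1", "node-B", 5, "end"),
]


def job_flow_view(records, job_id):
    """Job-flow view: follow one job across every node it traverses."""
    return sorted((t, node, event) for j, node, t, event in records if j == job_id)


def node_view(records, node, at_time):
    """Grid-node view: fix one node and one instant, list all jobs seen there."""
    return sorted({j for j, n, t, _ in records if n == node and t == at_time})


trace = job_flow_view(records, "job-1")      # path of job-1 through the grid
snapshot = node_view(records, "node-A", 2)   # jobs observed at node-A at time 2
```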

Page 22: Outline

Accountability Policies

Example: A job is submitted to Purdue University SP and then assigned for execution to the RPs, A-state University and B-state University. Purdue agrees to send job relation data (handle, job-id, subjob-id, RP-id, timestamp) to A-state and B-state when the processed job enters the active state. Additionally, A-state locally collects resource data (memory consumption, cpu time, network bandwidth, disk bandwidth) every day during the week.

The policies for such a scenario are as follows:

[at Purdue University]
shared_policyPurdue := send_job_data (agent@Purdue, agents_in_job_relationPurdue, active, dataSetactive, job-id)
collect_job_data (agent@Purdue, active, dataSetactive, DBPurdue)
agents_in_job_relationPurdue := agent@A-state (AND) agent@B-state
dataSetactive := handle (AND) job-id (AND) subjob-id (AND) RP-id (AND) timestamp

[at A-state University]
local_policyA-state := collect_resource_data (agent@A-state, dataSetlocal, time_constraintsA-state, DBA-state)
dataSetlocal := memory consumption (AND) cpu time (AND) network bandwidth (AND) disk bandwidth
time_constraintsA-state := weekdays (AND) all.days
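
One way to read the shared policy is as a rule that fires when a job changes state. The sketch below models that reading in Python; the `SharedPolicy` class, its field names, and `on_state_change` are hypothetical and are not the policy language or the agent API of the prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedPolicy:
    """Hypothetical in-memory form of a send_job_data policy."""
    sender: str
    recipients: tuple      # agents in the job relation
    trigger_state: str     # job state that triggers sending
    data_set: tuple        # fields of the job record to share


def on_state_change(policy, job_record, new_state):
    """Return the (recipient, payload) messages the sender's agent should emit."""
    if new_state != policy.trigger_state:
        return []
    payload = {name: job_record[name] for name in policy.data_set}
    return [(recipient, payload) for recipient in policy.recipients]


purdue_policy = SharedPolicy(
    sender="agent@Purdue",
    recipients=("agent@A-state", "agent@B-state"),
    trigger_state="active",
    data_set=("handle", "job-id", "subjob-id", "RP-id", "timestamp"),
)

# Example job record with made-up values; "owner" is not part of the data set.
job = {"handle": "h42", "job-id": "j1", "subjob-id": "s1",
       "RP-id": "A-state", "timestamp": 1000, "owner": "alice"}
messages = on_state_change(purdue_policy, job, "active")
```

Only the fields named in the data set leave the sender, which mirrors the role of dataSetactive in the policy text.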

Page 23: Outline

Exp. 1 / Scalability with respect to the number of computing nodes

The response time is computed as the difference between the time at which the user receives the result and the time at which a user submits the job

Blue bars show the overhead introduced by accountability, which is negligible

[Figure: Response Time. Vertical axis: time (seconds), 0 to 300. Horizontal axis: number of nodes (4, 10, 20, 40, 80, 100). Series: with accountability, without accountability, overhead. Data labels: 2.83, 1.33, 4, 2.3, 2.5, 2.8]

[Diagram: Experimental Evaluations. Labels: Job Submission, SP, RP, HN, CN]

Page 24: Outline

A Framework for Pervasively Secure Grid Infrastructure

Sajal K. Das, Director

Center for Research in Wireless Mobility and Networking (CReWMaN)

Department of Computer Science and Engineering

The University of Texas at Arlington

[email protected]

http://crewman.uta.edu

Page 25: Outline

Mobile / Pervasive Grid: A New Paradigm

• Next-generation information / knowledge Grid.

• Huge resource pool of laptops, mobile devices, and wireless sensors.

• A pervasive computing infrastructure of smart devices connected to the Grid across heterogeneous wireless networks and service providers.

• Context awareness (e.g., activity, user / device / node mobility) is the key.

• Applications: e-Learning, e-Health, banking, power grid, security, border control, disaster / crisis management, emergency response and rescue, …

[Figure labels: Computational Grid, Data Grid, Grid Community, Wireless Access Points]

Page 26: Outline

Research Objectives and Challenges

• Objectives
– Dynamic resource management in (wireless) pervasive grids to handle multi-mission, often conflicting tasks
– Development of a multi-level security framework for high assurance information sharing in pervasive grids
– Context / situation-aware data collection, aggregation (fusion), and mining from heterogeneous sensors, surveillance, monitoring, and tracking devices
– Learning patterns via information fusion, leading to detection of anomalies and hence of potential security threats
– Intelligent decision making in an integrated, adaptive, autonomous and scalable manner for high information assurance, safety and security
• Security Challenges
– Limited resources in wireless mobile devices and sensors → limited defense capability
– Uncertain, often unattended or hostile environment
– Node compromises (insider attacks) → revealed secrets
– Post-deployment access and control is an issue
– Lack of centralized control → potential loss of integrity and confidentiality due to information fusion
– Multiple attack angles → single level defense mechanism highly vulnerable

Page 27: Outline

Research Contributions

Novel game-theoretic framework for pervasive grid infrastructure that tracks node mobility and optimizes resource usage (e.g., wireless bandwidth, response time) for single and multi-class tasks

[Diagram labels: Grid Community; WAP1, WAPp; Users under WAP1, Users under WAPp; Pool of (mobile) Users; Task Allocation; Resource assignment. WAP: Wireless Access Point]

Page 28: Outline

Research Contributions (cont’d)

• Mathematical models and framework for multi-level security in pervasive grids (wireless sensor networks)
- distributed key management (among a cluster of nodes)
- secure information gathering, fusion, and routing
- modeling smart adversaries
- detection of compromised and replicated nodes
- controlling propagation of internal attacks

[Diagram: Compromised Node. Labels: Report false data, Infect other nodes, False routing info, Forge command, Selective packet dump, Discredit normal nodes]

Page 29: Outline

Experimental Results

[Figure panels: High Bandwidth grid systems, Medium Bandwidth grid systems, Low Bandwidth grid systems]

PRIMOB: Pricing based Mobile algorithm (game)

OPTIM: Optimal algorithm

COOP: Cooperative algorithm

Page 30: Outline

Multi-Level Security Framework

[Diagram labels, as extracted:
– Uncertainty Characterized Resource Limited Pervasive Grid Environment
– Theoretical Foundations; Epidemic Theory; Information Theory; Statistical Learning and Classification; Digital Watermarking; Trust / Reputation Model
– Node Compromise; Compromise Process Modeling; Detect Compromise; Revoke Revealed Secrets; Contain Outbreak; Self-Correct Tampered Data; Purge Tampered Data
– Architectural Components; Key Management; Secure Aggregation; Secure Routing; DoS Defense; Topology Control; Intrusion Detection
– Highly Assured Grid Network Operation]

Page 31: Outline

Trust Model: Reputation Results

Attack Scenarios

Case No.   Misbehaving time (%)   False data type
1          0                      N/A
2          100                    Obvious
3          100                    Tricky
4          66                     Obvious
5          66                     Tricky

• No malicious nodes → all nodes’ reputation close to 1
• Reputation of malicious nodes is significantly lower than that of legitimate ones
• Reputation of malicious nodes is proportional to the amount of true data they send

Page 32: Outline

Secure, Highly Available, and High Performance Peer-to-Peer

Cloud Storage Systems

Dr. I-Ling Yen

The University of Texas at Dallas

Page 33: Outline

What is a good storage design for a cloud environment?

• The role of the storage system in assured information sharing
– Basis for availability, security, and access performance assurance
  • If the storage system is not available or does not offer good access performance, then the upper layer data applications cannot have high availability or good performance
  • If some nodes in the storage system are compromised and the system cannot tolerate it, then the secure data are compromised
• Various storage system designs
– Cluster based systems versus widely distributed cloud environment
– Replication and encryption based strategies and secret sharing based strategies
  • Secret sharing, erasure coding, short secret sharing (SSS)
– Directory management
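
For readers unfamiliar with threshold schemes, here is a minimal sketch of plain Shamir secret sharing: any t of the n shares rebuild the secret. It is not the short secret sharing (SSS) variant the slides evaluate, and the prime, function names, and integer encoding of the secret are choices made for this example only.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; all arithmetic is done modulo this value


def split(secret: int, n: int, t: int):
    """Split `secret` into n shares so that any t of them can rebuild it."""
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):      # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, n + 1)]


def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split(123456789, n=10, t=5)
recovered = reconstruct(shares[2:7])    # any 5 shares are enough
```

Note that a single altered share changes the reconstructed value, which is why share consistency matters so much for storage built on such schemes.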

Page 34: Outline

Good storage design for cloud?

[Figure, three panels:
– Security versus t (t = 3, 5, 7, 9, 11). Series: RWE_WD, RWE_CB, SSS_WD, SSS_CB
– Read access cost versus x (x = 0 to 3). Series: RWE_WD, RWE_CB, TSSLE_WD, TSSLE_CB, SSS_WD, SSS_CB
– Availability versus x (x = -4 to -1). Series: RWE_WD, RWE_CB, SSS_WD, SSS_CB
Parameter captions as extracted: "m=40, t=5, pnf=0.01, variant x=log10(pef). (=0.01, σ=2*)." and "x=log10(Data Size), m=40, t=5"]

Widely distributed solutions perform better
– In security and access costs: better
– In availability: almost the same

Page 35: Outline

Good storage design for cloud?

[Figure, three panels: security versus t, read access cost versus x, availability versus x. Series: RWE_WD, RWE_CB, TSSLE_WD, TSSLE_CB, SSS_WD, SSS_CB. Parameter captions as extracted: "m=40, t=5, pnf=0.01, variant x=log10(pef). (=0.01, σ=2*)." and "x=log10(Data Size), m=40, t=5"]

SSS based storage performs better
– In security: same as secret sharing, better than replication
– In availability: almost the same for all schemes
– In access cost: better than secret sharing, mostly better than replication
– In storage cost: SSS and replication are similar and better than secret sharing

Page 36: Outline

Major design issues

• Use widely distributed infrastructure and SSS
• Directory management
– Specifically maintain directories
  • Costly in widely distributed systems
  • Needs additional directory access cost
– Use P2P solution: design DHT for SSS
• Access protocols
– In SSS, the consistency issue is much more important than that in conventional storage systems
  • If a share is inconsistent, the reconstructed data is fully incorrect
– In SSS, the access protocol design is much more important than that in conventional storage systems
  • One may keep on getting inconsistent shares
  • If only one share is maintained at each server, then there may be data losses since these shares may not be consistent
– Need to: design efficient access protocols

Page 37: Outline

DHT for SSS

• DHT for SSS
– Hash algorithm for SSS
  • d.si.identifier = (d.identifier + i * 2^m / n) % 2^m
    – d: data object
    – d.si: the ith share of d
    – 2^m: the size of the identifier space for DHT
– Adopt One-Hop-Lookup
  • Reduced routing time
– Method to support accesses of shares from near-by servers
  • Each server stores the geographical locations and IPs of all other servers
– Use geographical distance to approximate the real latency when no other information is available
– Conducted experimental studies to understand the relationship between geographical distance and ping latency and the potential error rate
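
The share placement rule and the distance heuristic can be sketched as follows. Several details are assumptions not stated on the slide: shares are indexed from 0, the division is integer division, the great-circle (haversine) formula stands in for geographical distance, and the server names and coordinates are made up.

```python
import math


def share_identifiers(data_id: int, n: int, m: int) -> list:
    """Spread the n shares of one object evenly around a 2^m identifier ring."""
    space = 2 ** m
    # Assumption: i runs from 0 to n-1 and the division is integer division.
    return [(data_id + i * space // n) % space for i in range(n)]


def haversine_km(a, b) -> float:
    """Great-circle distance between two (latitude, longitude) pairs in km."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_servers(client, servers: dict, k: int) -> list:
    """Pick the k servers closest to the client, using distance as a latency proxy."""
    return sorted(servers, key=lambda name: haversine_km(client, servers[name]))[:k]


ids = share_identifiers(data_id=5, n=4, m=8)   # -> [5, 69, 133, 197]

# Hypothetical server locations (latitude, longitude).
servers = {"dallas": (32.78, -96.80), "chicago": (41.88, -87.63), "seattle": (47.61, -122.33)}
closest = nearest_servers((32.74, -97.11), servers, k=2)
```

Because the identifiers are evenly spaced on the ring, the shares of one object land on different servers whenever the servers partition the identifier space into at least n ranges.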

Page 38: Outline

Efficient Access Protocols

• Use version number to validate share consistency
– When updating, generate and attach a version number to shares
– During retrieval, validate the consistency of shares
– Problem: how to efficiently generate version numbers
  • Fully decentralized approach: the new updates may not have the largest version number
  • Centralized version server: bottleneck problem, single point of failure
– Solution: distributed version servers. For each data object, use the hash value of its first share to determine the version server
  • Very efficient, impact of failure is localized
• Maintain share history to avoid data losses
– Problem
  • E.g., (10, 5) sharing scheme, current version number = 10
  • Client x updated 3 shares and failed (version no = 11)
  • Client y updated 4 shares and failed (version no = 12)
  • 3 shares with vn = 10, 3 shares with vn = 11, 4 shares with vn = 12
– Solution: maintain multiple versions
  • Need to properly determine which share to retrieve
  • Need to have a share removal protocol
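
The (10, 5) failure scenario can be checked with a few lines of code. The sketch assumes each server is modeled as the list of version numbers it currently holds; the function name `newest_recoverable` is hypothetical.

```python
from collections import Counter


def newest_recoverable(server_versions, k):
    """Return the newest version held by at least k servers, or None.

    `server_versions` has one entry per server: the versions of its share
    that the server still stores.
    """
    counts = Counter(v for versions in server_versions for v in set(versions))
    recoverable = [v for v, c in counts.items() if c >= k]
    return max(recoverable) if recoverable else None


# Each server keeps only its latest share: 3 at vn=11, 4 at vn=12, 3 at vn=10.
latest_only = [[11]] * 3 + [[12]] * 4 + [[10]] * 3
# Each server also keeps the older share it had before the failed updates.
with_history = [[10, 11]] * 3 + [[10, 12]] * 4 + [[10]] * 3

lost = newest_recoverable(latest_only, k=5)     # None: no version has 5 shares
kept = newest_recoverable(with_history, k=5)    # 10: the last complete version
```

With history retained, readers can fall back to version 10, which is why the slide pairs share history with a removal protocol for shares that are no longer needed.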

Page 39: Outline

Compare with other access methods

[Figure panels: Read latency (versus k), Update latency (versus k), Space cost. Series: Get version number from all servers, Our approach, Fully decentralized approach]

Page 40: Outline

Directions

• Integrated demonstration
– E.g., Accountability into Hadoop Prototype
– Host prototype on secure infrastructure
– Integrate with Blackbook
• Feed results into other projects
– AFOSR-MURI, IARPA
• Participate in DOE/NSF Open Science Grid
• Transfer technology to DoD/NCES