Outline

Secure Semantic Information Grid for Network Centric Operations

Dr. Bhavani Thuraisingham

Principal Investigator

The University of Texas at Dallas

[email protected]

June 2009


Outline

• Objectives

• Tasks/Team Members

• History

• Accomplishments

• Directions

Objectives

• Develop technologies for a secure information grid to support the DoD’s Network Centric Operations

• Complement the current AFOSR-funded projects that handle large data sets

• Dependable information sharing

• Dual-use technologies

Tasks/Team Members

• Policy-based accountability for secure grids: Purdue University (Elisa Bertino, Lorenzo Martino)

• Secure pervasive infrastructures: University of Texas at Arlington (Sajal Das, Yonghe Liu)

• Secure distributed storage: University of Texas at Dallas (I-Ling Yen)

• Encrypted data storage: University of Texas at Dallas (Murat Kantarcioglu)

• Secure query processing in clouds: University of Texas at Dallas (Latifur Khan)

• Other

– Secure geosocial information grid: University of Texas at Dallas (Latifur Khan, Murat Kantarcioglu)

– Dependability issues: University of Texas at Dallas (Latifur Khan, Kevin Hamlen, Eric Wong)

– Research and integrated demo (FY10 and beyond)

• Administrative Assistant: Ms. Jamie McDonald, The University of Texas at Dallas

Recent History

• Assured Information Sharing, AFOSR MURI (Tim Finin): UMBC/Purdue/UTD/UIUC/UTSA/UM

• Information Operations through Infospheres: Assured Information Sharing, AFOSR 2005-2008

• Semantic Framework with Blackbook, IARPA

• Secure Information Grid (Congressional Funds)

• Policy-based Semantic Web, NSF

• Dependable Information Sharing, IARPA

• Open Science Grid, DOE/NSF (Planned)

Accomplishments

UT Dallas, UT Arlington, Purdue

Layered Architecture

• Secure DATA/INFORMATION/KNOWLEDGE GRID (UTD/Purdue)

• GRID APPLICATIONS: Border security (UTA/UTD)

• GRID APPLICATIONS: NCES (UTD/Purdue)

• SECURE GRID MODELS: access control and accountability (Purdue/UTD)

• Secure INFRASTRUCTURE/MOBILE GRID (UTA/UTD)

Integration: UTD lead, with others having a role

Figure: S-SOAG architecture.

CORE: Policies; SSE-SOA Security Infrastructure; Assured Information Management Services; Assured Federation Management Service; Assured Incentive Management Service

Layers: Application Services (e.g., NCW, Border Security); Security Services; Information Management Services; Storage Services; Infrastructure Services


Secure Semantic Framework

Figure components: Documents; Entity Extraction, Relationship Extraction; RDF Graph Store; RDF Graph Store Management (Storage, Query, Integration); Ontology-based Heuristic Reasoning; Rule-based Reasoning, Data Mining; BLACKBOOK; Other Services (e.g., Security, Integrity)

Storing RDF Data in Hadoop And Retrieval

Dr. Latifur Khan

[email protected]

Objectives/Environment

• Objectives

– Build efficient storage using Hadoop for petabytes of data

– Build an efficient, secure query mechanism

– Possible outcomes: an open-source framework for RDF; integration with Jena

• Environment

– 4-node cluster in the Semantic Web Lab

– 10-node cluster in the Cloud Computing Lab: 4 GB main memory, Intel Pentium IV 3.0 GHz processor, 640 GB hard drive

– OpenCirrus HP Labs testbed

– Collaboration with Andy Seaborne, HP Labs

Preprocessing Steps

Some Query Results

Figure: query time in milliseconds (vertical axis) versus number of triples (horizontal axis)
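The slides do not list the preprocessing steps. As an assumption, one common first step when storing RDF in HDFS is to split the triples by predicate, so that a query pattern with a bound predicate reads only one partition. The sketch below does that split in memory; `predicate_split` is a hypothetical helper, and a real Hadoop job would write each partition to its own HDFS file instead of returning a dict.

```python
from collections import defaultdict


def predicate_split(ntriples_lines):
    """Partition N-Triples lines by predicate (hypothetical preprocessing step).

    Each partition keeps only (subject, object) pairs, because the predicate
    is implied by the partition it lives in.
    """
    partitions = defaultdict(list)
    for raw in ntriples_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subjects and predicates contain no spaces, so two splits suffice;
        # the remainder is the object followed by the terminating " .".
        subject, predicate, rest = line.split(" ", 2)
        rest = rest.rstrip()
        if rest.endswith("."):
            rest = rest[:-1].rstrip()
        partitions[predicate].append((subject, rest))
    return dict(partitions)


triples = [
    '<http://ex.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .',
    '<http://ex.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://ex.org/bob> .',
    '<http://ex.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .',
]
parts = predicate_split(triples)
for predicate, pairs in parts.items():
    print(predicate, len(pairs))
```

Dropping the predicate column is the main space saving of this layout; finer splits (for example by predicate and object type) are possible but not shown.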

Design and Analysis of Querying Encrypted Data in Relational Databases

Dr. Murat Kantarcioglu

[email protected]

Our Contributions

• Storage

– The performance of cipher modes is analyzed under different granularities and disk access patterns

– A CTR-based page-level encryption method is proposed

• Query

– Performance is compared across different query types

– We propose a vertical partitioning approach to avoid unnecessary cryptographic operations on non-sensitive attributes

– We first focus on single-table partitioning, then generalize the problem to the entire schema and propose a heuristic that avoids an exhaustive search of the solution space
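The slides propose vertical partitioning without giving the algorithm. The sketch below illustrates only the single-table idea, assuming the set of sensitive attributes is already known: sensitive columns go to a partition that is encrypted, the rest stay in plaintext, and both keep a row id for re-joining. `vertically_partition`, `encrypt_partition`, and the placeholder cipher are hypothetical, and the schema-wide heuristic is not shown.

```python
def vertically_partition(rows, sensitive):
    """Split each row into a plaintext part and a to-be-encrypted part.

    Both parts keep the row id so they can be re-joined when a query needs
    attributes from both sides.
    """
    plain, secret = [], []
    for row_id, row in enumerate(rows):
        plain.append({"row_id": row_id, **{k: v for k, v in row.items() if k not in sensitive}})
        secret.append({"row_id": row_id, **{k: v for k, v in row.items() if k in sensitive}})
    return plain, secret


def encrypt_partition(partition, encrypt):
    """Apply a caller-supplied cipher to every value except the join key."""
    return [{k: (v if k == "row_id" else encrypt(v)) for k, v in r.items()} for r in partition]


employees = [
    {"name": "Ann", "dept": "R&D", "salary": 120000},
    {"name": "Bob", "dept": "Ops", "salary": 90000},
]
plain, secret = vertically_partition(employees, sensitive={"salary"})
# Placeholder "cipher" for illustration only; a real system would use e.g. AES.
secret_enc = encrypt_partition(secret, encrypt=lambda v: f"enc({v})")

# A query over non-sensitive attributes reads only the plaintext partition,
# so it performs no cryptographic operations at all.
rd_names = [r["name"] for r in plain if r["dept"] == "R&D"]
print(rd_names)
```

Queries that touch a sensitive attribute still pay for decryption, but only on that partition's columns rather than on whole rows.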


Threat Models

Figure: Client Application → (Plain Query) → Authentication and Query Transformation → (Transformed Query) → Query Engine → (Disk Access) → Hard Disk; Query Results flow back to the client. Legend: Trusted Components; Untrusted Component.


Counter Mode (CTR4)

Decrypting 1 GB of data

CTR4 is faster than CBC, CTR, OFB, and CFB at every block size tested.


Block size (bytes)   CBC    CTR    OFB    CFB    CTR4
16                   24.9   26.1   27.1   25.6   23.0
64                   24.3   25.7   26.5   25.0   22.8
128                  23.7   25.6   26.2   25.2   22.3
256                  23.5   25.5   26.3   25.4   22.1
512                  23.4   25.5   26.4   25.4   22.1
1024                 23.4   25.5   26.4   25.3   22.0
2048                 23.3   25.5   26.5   25.4   22.0
4096                 23.3   25.5   26.3   25.3   22.2
8192                 23.2   25.5   26.4   25.4   22.3
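Counter mode suits page-level encryption because each page's keystream depends only on the key, the page's position, and a counter, so any page can be decrypted without touching the others. Python's standard library has no AES, so this sketch swaps in HMAC-SHA256 as the pseudorandom function; it is not the authors' CTR4 implementation, and `keystream` and `crypt_page` are hypothetical names. The per-page `version` is an added assumption so that rewriting a page never reuses a keystream.

```python
import hashlib
import hmac

BLOCK = 32  # bytes of keystream per PRF call (SHA-256 output size)


def keystream(key: bytes, page_id: int, version: int, length: int) -> bytes:
    """CTR-style keystream: PRF(key, page_id || version || counter), counter = 0, 1, ..."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        msg = page_id.to_bytes(8, "big") + version.to_bytes(8, "big") + counter.to_bytes(8, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()  # one BLOCK per call
        counter += 1
    return bytes(out[:length])


def crypt_page(key: bytes, page_id: int, version: int, data: bytes) -> bytes:
    """Encrypt or decrypt one page; in counter mode both are the same XOR."""
    ks = keystream(key, page_id, version, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


key = b"\x01" * 32
pages = [b"page zero contents", b"page one contents!!"]
cipher_pages = [crypt_page(key, i, 1, p) for i, p in enumerate(pages)]

# Random access: page 1 is decrypted without touching page 0.
print(crypt_page(key, 1, 1, cipher_pages[1]))
```

Because the keystream for a page can be computed before the page is read from disk, this structure also lends itself to overlapping I/O with cryptographic work.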

Accountability Mechanisms for Grid Systems

Dr. Elisa Bertino, Research Director, CERIAS

Computer Science Department, Purdue University

[email protected]


What is accountability?

Accountability is defined as follows: “A is accountable to B when A is obliged to inform B about A’s past or future actions and decisions, to justify them, and to suffer punishment in the case of eventual misconduct.”

Accountability is an important aspect of any computer system because it ensures that every action executed in the system can be traced back to some entity.

The dynamic, multi-organizational nature of grid systems requires an effective and efficient accountability system.

Contributions

• We have developed a distributed mechanism to capture the provenance information available during the distributed execution of jobs in a grid

• Our approach is based on the notion of accountability agents

• We have developed a simple yet effective language to specify the accountability data to collect

• We have implemented a prototype of the accountability system on an emulated grid testbed

Overall Architecture of Accountable Grid Systems

Two Approaches Combined

• Job-flow based approach

– Jobs flow across different organizational units

– Long computations are often divided into many sub-jobs to be run in parallel

– A possible approach is to employ point-to-point agents that collect data at each node the job traverses

• Grid-node based approach

– It focuses on a given location in the flow, at a given instant of time, for all jobs

– The viewpoint is fixed

• Combining the two approaches allows us to collect complementary information

Accountability Policies: Example

A job is submitted to the Purdue University SP and then assigned for execution to the RPs, A-state University and B-state University. Purdue agrees to send job relation data (handle, job-id, subjob-id, RP-id, timestamp) to A-state and B-state when the processed job enters the active state. Additionally, A-state locally collects resource data (memory consumption, CPU time, network bandwidth, disk bandwidth) every day during the week.

The policies for this scenario are as follows:

[at Purdue University]

shared_policy_Purdue := send_job_data(agent@Purdue, agents_in_job_relation_Purdue, active, dataSet_active, job-id)
                        collect_job_data(agent@Purdue, active, dataSet_active, DB_Purdue)

agents_in_job_relation_Purdue := agent@A-state (AND) agent@B-state

dataSet_active := handle (AND) job-id (AND) subjob-id (AND) RP-id (AND) timestamp

[at A-state University]

local_policy_A-state := collect_resource_data(agent@A-state, dataSet_local, time_constraints_A-state, DB_A-state)

dataSet_local := memory consumption (AND) cpu time (AND) network bandwidth (AND) disk bandwidth

time_constraints_A-state := weekdays (AND) all.days
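To show how such declarative policies might drive accountability agents at run time, here is a minimal imperative sketch. The `Agent` class, its `on_job_state` and `collect_resources` methods, and the reading of `weekdays (AND) all.days` as "every Monday to Friday" are assumptions for illustration, not the prototype's actual API.

```python
from dataclasses import dataclass, field
from datetime import date

DATASET_ACTIVE = ("handle", "job-id", "subjob-id", "RP-id", "timestamp")
DATASET_LOCAL = ("memory consumption", "cpu time", "network bandwidth", "disk bandwidth")


@dataclass
class Agent:
    """Hypothetical accountability agent with a local store (DB_<site>)."""
    name: str
    db: list = field(default_factory=list)
    inbox: list = field(default_factory=list)

    def on_job_state(self, job, state, relation):
        # shared_policy: when the job enters the "active" state, record
        # dataSet_active locally (collect_job_data) and forward it to every
        # agent in the job relation (send_job_data).
        if state != "active":
            return
        record = {k: job[k] for k in DATASET_ACTIVE}
        self.db.append(record)
        for peer in relation:
            peer.inbox.append((self.name, record))

    def collect_resources(self, usage, day: date):
        # local_policy: collect dataSet_local only on weekdays (Mon=0 .. Fri=4).
        if day.weekday() < 5:
            self.db.append({k: usage[k] for k in DATASET_LOCAL})


purdue, a_state, b_state = Agent("agent@Purdue"), Agent("agent@A-state"), Agent("agent@B-state")
job = {"handle": "h1", "job-id": 7, "subjob-id": 1, "RP-id": "A-state", "timestamp": 1000}
purdue.on_job_state(job, "active", relation=[a_state, b_state])

usage = {"memory consumption": 512, "cpu time": 30, "network bandwidth": 10, "disk bandwidth": 5}
a_state.collect_resources(usage, date(2009, 6, 1))  # a Monday: collected
a_state.collect_resources(usage, date(2009, 6, 6))  # a Saturday: skipped
print(len(purdue.db), len(a_state.inbox), len(a_state.db))
```

Keeping the shared policy (cross-site) and the local policy (per-site) as separate methods mirrors the split between `shared_policy_Purdue` and `local_policy_A-state`.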

Exp. 1: Scalability with Respect to the Number of Computing Nodes

The response time is the difference between the time at which the user receives the result and the time at which the user submits the job.

Blue bars show the overhead introduced by accountability, which is negligible.

Figure: Response time in seconds versus number of nodes (4, 10, 20, 40, 80, 100), with and without accountability; the overhead values labelled on the chart are 2.83, 1.33, 4, 2.3, 2.5, and 2.8.

Experimental Evaluations

Figure: experimental setup (Job Submission; SP; two RPs; HNs; CNs)

A Framework for Pervasively Secure Grid Infrastructure

Sajal K. Das, Director

Center for Research in Wireless Mobility and Networking (CReWMaN)

Department of Computer Science and Engineering, The University of Texas at Arlington

[email protected]

http://crewman.uta.edu

Mobile / Pervasive Grid: A New Paradigm

• Next-generation information / knowledge Grid.

• Huge resource pool of laptops, mobile devices, and wireless sensors.

• A pervasive computing infrastructure of smart devices connected to the Grid across heterogeneous wireless networks and service providers.

• Context awareness (e.g., activity, user / device / node mobility) is the key.

• Applications: e-Learning, e-Health, banking, power grid, security, border control, disaster / crisis management, emergency response and rescue, …

Figure: Computational Grid, Data Grid, and Grid Community connected through Wireless Access Points

Research Objectives and Challenges

• Objectives

– Dynamic resource management in (wireless) pervasive grids to handle multi-mission, often conflicting tasks

– Development of a multi-level security framework for high-assurance information sharing in pervasive grids

– Context/situation-aware data collection, aggregation (fusion), and mining from heterogeneous sensors, surveillance, monitoring, and tracking devices

– Learning patterns via information fusion, leading to the detection of anomalies and hence of potential security threats

– Intelligent decision making in an integrated, adaptive, autonomous, and scalable manner for high information assurance, safety, and security

• Security Challenges

– Limited resources in wireless mobile devices and sensors → limited defense capability

– Uncertain, often unattended or hostile environments

– Node compromise (insider attacks) → revealed secrets

– Post-deployment access and control is an issue

– Lack of centralized control

– Potential loss of integrity and confidentiality due to information fusion

– Multiple attack angles → a single-level defense mechanism is highly vulnerable

Research Contributions

Figure: Grid Community; pools of (mobile) users under WAP1 through WAPp; task allocation and resource assignment (WAP: Wireless Access Point)

• Novel game-theoretic framework for a pervasive grid infrastructure that tracks node mobility and optimizes resource usage (e.g., wireless bandwidth, response time) for single-class and multi-class tasks
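The slides do not describe the pricing game behind PRIMOB. As a generic illustration of pricing-based allocation only (not PRIMOB itself), assume each user with weight w requests the bandwidth x that maximizes w·log(x) − p·x, i.e. x = w/p, and the access point raises or lowers the price p until total demand matches capacity. `best_response` and `clear_market` are hypothetical names, and the step size is an assumed value small enough to converge for this example.

```python
def best_response(weight: float, price: float) -> float:
    """Bandwidth a user requests when maximizing weight*log(x) - price*x."""
    return weight / price


def clear_market(weights, capacity, price=1.0, step=0.05, iters=200):
    """Raise the price when demand exceeds capacity, lower it otherwise.

    A small step keeps this price-adjustment loop stable; too large a step
    can make the price oscillate or diverge.
    """
    for _ in range(iters):
        demand = sum(best_response(w, price) for w in weights)
        price = max(1e-9, price + step * (demand - capacity))
    return price, [best_response(w, price) for w in weights]


price, alloc = clear_market(weights=[1.0, 2.0, 3.0], capacity=12.0)
print(round(price, 4), [round(x, 2) for x in alloc])
```

At the fixed point the price equals the total weight divided by capacity, so each user receives bandwidth in proportion to its weight.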

• Mathematical models and framework for multi-level security in pervasive grids (wireless sensor networks)

- distributed key management (among a cluster of nodes)

- secure information gathering, fusion, and routing

- modeling smart adversaries

- detection of compromised and replicated nodes

- controlling propagation of internal attacks

Figure: a compromised node may report false data, infect other nodes, spread false routing information, forge commands, selectively dump packets, and discredit normal nodes

Research Contributions (cont’d)

Experimental Results

Figure panels: high-, medium-, and low-bandwidth grid systems. Legend: PRIMOB: Pricing-based Mobile algorithm (game); OPTIM: Optimal algorithm; COOP: Cooperative algorithm

Multi-Level Security Framework

Figure labels:

• Highly Assured Grid Network Operation in an Uncertainty-Characterized, Resource-Limited Pervasive Grid Environment

• Architectural Components: Key Management, Secure Aggregation, Secure Routing, Topology Control, DoS Defense, Intrusion Detection

• Node Compromise: Compromise Process Modeling, Detect Compromise, Revoke Revealed Secrets, Contain Outbreak, Self-Correct Tampered Data, Purge Tampered Data

• Theoretical Foundations: Epidemic Theory, Information Theory, Statistical Learning and Classification, Digital Watermarking, Trust/Reputation Model

Trust Model: Reputation Results

Case No.   Misbehaving time (%)   False data type
1          0                      N/A
2          100                    Obvious
3          100                    Tricky
4          66                     Obvious
5          66                     Tricky

• With no malicious nodes, all nodes’ reputations are close to 1

• The reputation of malicious nodes is significantly lower than that of legitimate ones

• The reputation of malicious nodes is proportional to the amount of true data they send
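The slides do not give the reputation formula. The sketch below swaps in the well-known beta reputation model, in which a node's score is the mean of Beta(good + 1, bad + 1); it reproduces the qualitative pattern above (honest nodes near 1, a node that sends true data one third of the time near 1/3). How a single report is judged true (for example, agreement with neighbouring sensors) is an assumption left outside the sketch, and `BetaReputation` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class BetaReputation:
    """Beta-distribution reputation: expected probability a node sends true data."""
    good: int = 0  # reports judged true (judging method is assumed, not shown)
    bad: int = 0   # reports judged false

    def update(self, report_is_true: bool) -> None:
        if report_is_true:
            self.good += 1
        else:
            self.bad += 1

    @property
    def score(self) -> float:
        # Mean of Beta(good + 1, bad + 1); 0.5 when there is no evidence yet.
        return (self.good + 1) / (self.good + self.bad + 2)


honest, malicious, partial = BetaReputation(), BetaReputation(), BetaReputation()
for i in range(300):
    honest.update(True)
    malicious.update(False)
    partial.update(i % 3 == 0)  # true data one third of the time
print(round(honest.score, 3), round(malicious.score, 3), round(partial.score, 3))
```

With enough reports, the score approaches the fraction of true data a node sends, which matches the proportionality observed in the results.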

Attack Scenarios

Secure, Highly Available, and High-Performance Peer-to-Peer Cloud Storage Systems

Dr. I-Ling Yen

The University of Texas at Dallas

What is a good storage design for the cloud environment?

• The role of the storage system in assured information sharing

– It is the basis for assuring availability, security, and access performance

– If the storage system is not available or does not offer good access performance, the upper-layer data applications cannot achieve high availability or good performance

– If some nodes in the storage system are compromised and the system cannot tolerate this, the secure data are compromised

• Various storage system designs

– Cluster-based systems versus a widely distributed cloud environment

– Replication- and encryption-based strategies versus secret-sharing-based strategies (secret sharing, erasure coding, short secret sharing (SSS))

– Directory management
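Secret sharing is the building block the designs above compare against. The sketch below is plain Shamir (k, n) threshold sharing over a prime field, using the (10, 5) parameters that appear later in this section; it is not the authors' SSS scheme, which (as an assumption about short secret sharing generally) would share only a short key this way and store the encrypted data with erasure coding. `split` and `recover` are hypothetical names.

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime; secrets must be smaller than this


def split(secret: int, n: int, k: int, rng=random.SystemRandom()):
    """Shamir (k, n) sharing: any k shares recover the secret, fewer reveal nothing."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule for f(x) mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover(shares):
    """Lagrange interpolation of f(0) over GF(PRIME) from k distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split(123456789, n=10, k=5)
print(recover(shares[:5]), recover(shares[3:8]))
```

Every share is as large as the secret itself, which is why plain secret sharing has the highest storage cost among the schemes compared here.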

Good storage design for cloud?

Figure panels (WD = widely distributed, CB = cluster-based):

– Security versus t (t = 3 to 11); curves RWE_WD, RWE_CB, SSS_WD, SSS_CB

– Read access cost versus x = log10(Data Size), m = 40, t = 5; curves RWE_WD, RWE_CB, TSSLE_WD, TSSLE_CB, SSS_WD, SSS_CB

– Availability versus x = log10(pef) (x = -4 to -1), m = 40, t = 5, pnf = 0.01 (=0.01, σ=2*); curves RWE_WD, RWE_CB, SSS_WD, SSS_CB

Widely distributed solutions perform better: they are better in security and access cost, and almost the same in availability.

Good storage design for cloud?


SSS-based storage performs better:

– Security: the same as secret sharing, better than replication

– Availability: almost the same for all schemes

– Access cost: better than secret sharing, and mostly better than replication

– Storage cost: SSS and replication are similar, and both are better than secret sharing

Major design issues

• Use a widely distributed infrastructure and SSS

• Directory management

– Maintaining directories explicitly is costly in widely distributed systems and adds directory access cost

– Use a P2P solution: design a DHT for SSS

• Access protocols

– In SSS, consistency matters much more than in conventional storage systems: if a share is inconsistent, the reconstructed data is entirely incorrect

– In SSS, access protocol design is also much more important than in conventional storage systems: a client may keep retrieving inconsistent shares, and if only one share is maintained at each server, data may be lost because these shares may not be consistent

– Need: efficient access protocols

DHT for SSS

• Hash algorithm for SSS: $d.s_i.\mathrm{identifier} = (d.\mathrm{identifier} + i \cdot 2^m / n) \bmod 2^m$, where d is the data object, d.s_i is the i-th share of d, 2^m is the size of the DHT identifier space, and n is the number of shares

• Adopt one-hop lookup to reduce routing time

• Support access to shares from nearby servers

– Each server stores the geographical locations and IPs of all other servers

– Geographical distance is used to approximate real latency when no other information is available

– Experimental studies were conducted to understand the relationship between geographical distance and ping latency, and the potential error rate
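The placement formula and the nearby-server rule can be sketched together. This is a minimal illustration under stated assumptions: m = 32, SHA-1 as the identifier hash, each share stored on the successor of its identifier (Chord-style), and the haversine great-circle distance as the geographic latency proxy. `share_ids`, `successor`, and `nearest_share_holders` are hypothetical names.

```python
import bisect
import hashlib
import math

M = 32  # assumed: identifiers live in a 2^M space
RING = 2**M


def ident(name: str) -> int:
    """Hash a name into the identifier space (SHA-1 is an assumed choice)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING


def share_ids(data_id: int, n: int) -> list:
    """Slide formula: d.s_i.identifier = (d.identifier + i * 2^m / n) mod 2^m."""
    return [(data_id + i * RING // n) % RING for i in range(n)]


def successor(ring_ids: list, key: int) -> int:
    """One-hop lookup: each server holds the full sorted ring, so no routing hops."""
    idx = bisect.bisect_left(ring_ids, key)
    return ring_ids[idx % len(ring_ids)]


def haversine_km(a: tuple, b: tuple) -> float:
    """Great-circle distance between two (lat, lon) points, as a latency proxy."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_share_holders(client: tuple, data_name: str, servers: dict, n: int, k: int) -> list:
    """Return up to k servers holding shares of data_name, closest to the client first.

    servers maps server identifier -> (lat, lon); shares that land on the
    same server are counted once.
    """
    ring_ids = sorted(servers)
    holders = {successor(ring_ids, sid) for sid in share_ids(ident(data_name), n)}
    return sorted(holders, key=lambda s: haversine_km(client, servers[s]))[:k]


sites = {"Dallas": (32.78, -96.80), "Austin": (30.27, -97.74), "Chicago": (41.88, -87.63),
         "Seattle": (47.61, -122.33), "Boston": (42.36, -71.06), "Miami": (25.76, -80.19)}
servers = {ident(name): loc for name, loc in sites.items()}
print(nearest_share_holders((32.95, -96.73), "report.pdf", servers, n=6, k=3))
```

Spacing the share identifiers evenly around the ring, as the formula does, spreads a data object's shares across different servers without any directory lookup.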

Efficient Access Protocols

• Use version numbers to validate share consistency

– When updating, generate a version number and attach it to the shares

– During retrieval, validate the consistency of the shares

– Problem: how to generate version numbers efficiently. A fully decentralized approach may give a new update a version number that is not the largest; a centralized version server is a bottleneck and a single point of failure

– Solution: distributed version servers. For each data item, the hash value of its first share determines its version server. This is very efficient, and the impact of a failure is localized

• Maintain share history to avoid data loss

– Problem example: with a (10, 5) sharing scheme and current version number 10, client x updates 3 shares and fails (version number 11), then client y updates 4 shares and fails (version number 12). The result is 3 shares with vn = 10, 3 with vn = 11, and 4 with vn = 12, so no single version has the five shares needed for reconstruction

– Solution: maintain multiple versions. This requires properly determining which shares to retrieve, and a share removal protocol
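Both ideas above can be sketched briefly: hashing the first share's identifier to choose a version server, and, when servers keep share history, retrieving the newest version that still has at least k shares. SHA-256, the server list, and the names `version_server` and `pick_version` are assumptions for illustration; the share removal protocol is not shown.

```python
import hashlib


def version_server(first_share_id: str, servers: list):
    """Pick the version server for a data item from the hash of its first share."""
    h = int.from_bytes(hashlib.sha256(first_share_id.encode()).digest(), "big")
    return servers[h % len(servers)]


def pick_version(share_histories: dict, k: int):
    """Return the newest version that still has at least k shares, else None.

    share_histories maps server -> set of version numbers it still stores,
    i.e. a server keeps older shares instead of overwriting them.
    """
    counts = {}
    for versions in share_histories.values():
        for v in versions:
            counts[v] = counts.get(v, 0) + 1
    usable = [v for v, c in counts.items() if c >= k]
    return max(usable) if usable else None


# The (10, 5) example: every server starts at version 10.
history = {s: {10} for s in range(10)}
for s in range(3):
    history[s].add(11)     # client x wrote 3 shares at vn 11, then failed
for s in range(3, 7):
    history[s].add(12)     # client y wrote 4 shares at vn 12, then failed
latest_only = {s: {max(v)} for s, v in history.items()}  # no history kept

print(pick_version(history, k=5), pick_version(latest_only, k=5))
```

Without history no version reaches the threshold of five shares, so the data is lost; with history, all ten servers still hold version 10 and it can be reconstructed.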

Comparison with Other Access Methods

Figure panels: read latency versus k; update latency versus k; space cost. Curves: our approach; get version number from all servers; fully decentralized approach

Directions

• Integrated demonstration, e.g., incorporate accountability into the Hadoop prototype; host the prototype on a secure infrastructure; integrate with Blackbook

• Feed results into other projects: AFOSR MURI, IARPA

• Participate in the DOE/NSF Open Science Grid

• Transfer technology to DoD/NCES
