Secure Semantic Information Grid for Network Centric Operations
Dr. Bhavani Thuraisingham
Principal Investigator
The University of Texas at Dallas
June 2009
Outline
• Objectives
• Tasks/Team Members
• History
• Accomplishments
• Directions
Objectives
• Develop technologies for secure information grid to support the DoD’s Network Centric Operations
• Complement the current projects funded by AFOSR for handling large amounts of data sets
• Dependable information sharing
• Dual-use technologies
Tasks/Team Members
• Policy based accountability for secure grids: Purdue University (Elisa Bertino, Lorenzo Martino)
• Secure pervasive infrastructures: University of Texas at Arlington (Sajal Das, Yonghe Liu)
• Secure Distributed Storage: University of Texas at Dallas (I-Ling Yen)
• Encrypted Data Storage: University of Texas at Dallas (Murat Kantarcioglu)
• Secure Query Processing in Clouds: University of Texas at Dallas (Latifur Khan)
• Other
– Secure Geosocial Information Grid: University of Texas at Dallas (Latifur Khan, Murat Kantarcioglu)
– Dependability Issues: University of Texas at Dallas (Latifur Khan, Kevin Hamlen, Eric Wong)
• Research and Integrated Demo (FY10 and Beyond)
• Administrative Assistant: Ms Jamie McDonald, The University of Texas at Dallas
Recent History
• Assured Information Sharing, AFOSR MURI (Tim Finin): UMBC/Purdue/UTD/UIUC/UTSA/UM
• Information Operations through Infospheres: Assured Information Sharing, AFOSR 2005-2008
• Semantic Framework with Blackbook, IARPA
• Secure Information Grid (Congressional Funds)
• Policy based Semantic Web, NSF
• Dependable Information Sharing, IARPA
• Open Science Grid, DOE/NSF (Planned)
Accomplishments
UT Dallas, UT Arlington, Purdue
Layered Architecture
Secure DATA/INFORMATION/KNOWLEDGE
GRID UTD/Purdue
GRID APPLICATIONS: Border security UTA/UTD
GRID APPLICATIONS: NCES (UTD/Purdue)
SECURE GRID MODELS: access control and accountability Purdue/UTD
Secure INFRASTRUCTURE/MOBILE GRID UTA/UTD
Integration: UTD lead, with others having a role
[Figure labels: CORE: Policies; SSE-SOA Security Infrastructure; Assured Information Management Services; Assured Federation Management Service; Assured Incentive Management Service; Application Services. S-SOAG: Security Services; Information Management Services; Storage Services; Assured Information Management Services; Infrastructure Services; Application Services (e.g., NCW, Border Security)]
Secure Semantic Framework
[Figure labels: Documents; Entity Extraction, Relationship Extraction; BLACKBOOK; Other Services (e.g., Security, Integrity); RDF Graph Store; RDF Graph Store Management (Storage, Query, Integration); Ontology based Heuristic Reasoning, Rule based reasoning, Data mining]
Objectives/Environment
• Objectives
– To build efficient storage using Hadoop for petabytes of data
– To build an efficient secure query mechanism
– Possible outcomes
• Open Source Framework for RDF
• Integration with Jena
• Environment
– 4 node cluster in Semantic Web Lab
– 10 node cluster in Cloud Computing Lab
• 4 GB main memory
• Intel Pentium IV 3.0 GHz processor
• 640 GB hard drive
– OpenCirrus HP labs test bed
– Collaboration with Andy Seaborne, HP Labs
Preprocessing Steps
Some Query Results
[Figure: horizontal axis: Number of Triples; vertical axis: Time in milliseconds]
Our Contributions
• Storage
– The performance of cipher modes is analyzed under different granularities and disk access patterns
– A CTR based page level encryption method is proposed
• Query
– Performance is compared under different query types
– We propose a vertical partitioning approach to avoid unnecessary cryptographic operations over non-sensitive attributes
– We first focus on single table partitioning, then generalize the problem to cover the entire schema and propose a heuristic that avoids an exhaustive search of the partitioning space
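The vertical partitioning idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `PartitionedTable`, `partition`, and `select` are hypothetical, and `toy_encrypt` / `toy_decrypt` are placeholders that only stand in for a real cipher. The point it shows is that a query touching only non-sensitive columns performs no cryptographic operations.

```python
from dataclasses import dataclass


def toy_encrypt(value: str) -> bytes:
    # Placeholder only (NOT real encryption): stands in for a real cipher call.
    return value.encode()[::-1]


def toy_decrypt(blob: bytes) -> str:
    return blob[::-1].decode()


@dataclass
class PartitionedTable:
    plain_rows: dict        # row id -> non-sensitive columns, stored in the clear
    secret_rows: dict       # row id -> sensitive columns, stored encrypted
    decrypt_calls: int = 0  # counts cryptographic operations for illustration


def partition(rows: dict, sensitive: set) -> PartitionedTable:
    """Split every row into a plain part and an encrypted part."""
    plain, secret = {}, {}
    for rid, row in rows.items():
        plain[rid] = {c: v for c, v in row.items() if c not in sensitive}
        secret[rid] = {c: toy_encrypt(v) for c, v in row.items() if c in sensitive}
    return PartitionedTable(plain, secret)


def select(table: PartitionedTable, columns: list) -> dict:
    """Project the requested columns, decrypting only sensitive ones."""
    result = {}
    for rid, plain in table.plain_rows.items():
        record = {}
        for col in columns:
            if col in plain:
                record[col] = plain[col]
            else:
                record[col] = toy_decrypt(table.secret_rows[rid][col])
                table.decrypt_calls += 1
        result[rid] = record
    return result
```

A projection over `city` alone leaves `decrypt_calls` at zero, while adding a sensitive column costs one decryption per row.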
Threat Models
[Figure labels: Client Application; Plain Query; Authentication and Query Transformation; Transformed Query; Query Engine; Disk Access; Hard Disk; Query Results; Trusted Components; Untrusted Component]
Counter (CTR4)
Decrypting 1 GB data
CTR4 is faster than CBC, CTR, OFB, CFB

Block Size (byte)   CBC    CTR    OFB    CFB    CTR4
16                  24.9   26.1   27.1   25.6   23.0
64                  24.3   25.7   26.5   25.0   22.8
128                 23.7   25.6   26.2   25.2   22.3
256                 23.5   25.5   26.3   25.4   22.1
512                 23.4   25.5   26.4   25.4   22.1
1024                23.4   25.5   26.4   25.3   22.0
2048                23.3   25.5   26.5   25.4   22.0
4096                23.3   25.5   26.3   25.3   22.2
8192                23.2   25.5   26.4   25.4   22.3
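As a rough illustration of why counter mode suits page level storage, the sketch below derives a keystream from (key, page id, block counter), so any byte range of a page can be decrypted without touching the rest. This is plain CTR, not the CTR4 variant in the table, which the slides do not specify. SHA-256 is used as a stand-in keystream function only because the Python standard library has no AES; a real system would use AES-CTR. `keystream_block` and `ctr_xor` are hypothetical names.

```python
import hashlib

BLOCK = 32  # keystream block size in bytes (SHA-256 output); AES would use 16


def keystream_block(key: bytes, page_id: int, counter: int) -> bytes:
    # Stand-in PRF (assumption): SHA-256 over key || page id || block counter.
    material = key + page_id.to_bytes(8, "big") + counter.to_bytes(8, "big")
    return hashlib.sha256(material).digest()


def ctr_xor(key: bytes, page_id: int, data: bytes, offset: int = 0) -> bytes:
    """Encrypt or decrypt `data`, which starts at byte `offset` of the page.

    XOR with the keystream is its own inverse, so one function does both.
    """
    out = bytearray(len(data))
    block_no, block = -1, b""
    for i, byte in enumerate(data):
        pos = offset + i
        if pos // BLOCK != block_no:  # fetch a new keystream block on demand
            block_no = pos // BLOCK
            block = keystream_block(key, page_id, block_no)
        out[i] = byte ^ block[pos % BLOCK]
    return bytes(out)
```

For example, `ctr_xor(key, 7, ciphertext[100:150], offset=100)` recovers bytes 100 to 149 of page 7 directly. Note that rewriting a page under the same key and page id would reuse the keystream, so a practical design also needs a per-write nonce or version in the counter input.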
Accountability Mechanisms for Grid Systems
Dr. Elisa Bertino, Research Director, CERIAS
Computer Science Department, Purdue University
What is accountability?
• Accountability is defined as “A is accountable to B when A is obliged to inform B about A’s past or future actions and decisions, to justify them, and to suffer punishment in the case of eventual misconduct”
• Accountability is an important aspect of any computer system for assuring that every action executed in the system can be traced back to some entity
• The dynamic and multi-organizational nature of grid systems requires an effective and efficient accountability system
Contributions
• We have developed a distributed mechanism to capture provenance information available during the distributed execution of jobs in a grid
• Our approach is based on the notion of accountability agents
• We have developed a simple yet effective language to specify the accountability data to collect
• We have implemented a prototype of the accountability system on an emulated grid testbed
Overall Architecture of Accountable Grid Systems
Two approaches Combined
• Job-flow based approach
– Jobs flow across different organizational units
– Long computations are often divided into many sub-jobs to be run in parallel
– A possible approach is to employ point-to-point agents which collect data at each node that the job traverses
• Grid node based approach
– It focuses on a given location in the flow and at a given instant of time for all jobs
– Viewpoint is fixed
• The combination of the two approaches allows us to collect complementary information
Accountability Policies
Example: A job is submitted to Purdue University SP and then assigned for execution to the RPs, A-state University and B-state University. Purdue agrees to send job relation data (handle, job-id, subjob-id, RP-id, timestamp) to A-state and B-state when the processed job enters the active state. Additionally, A-state locally collects resource data (memory consumption, cpu time, network bandwidth, disk bandwidth) every day during the week.
The policies for such a scenario are as follows:
[at Purdue University]
shared_policy_Purdue := send_job_data (agent@Purdue, agents_in_job_relation_Purdue, active, dataSet_active, job-id)
                        collect_job_data (agent@Purdue, active, dataSet_active, DB_Purdue)
agents_in_job_relation_Purdue := agent@A-state (AND) agent@B-state
dataSet_active := handle (AND) job-id (AND) subjob-id (AND) RP-id (AND) timestamp
[at A-state University]
local_policy_A-state := collect_resource_data (agent@A-state, dataSet_local, time_constraints_A-state, DB_A-state)
dataSet_local := memory consumption (AND) cpu time (AND) network bandwidth (AND) disk bandwidth
time_constraints_A-state := weekdays (AND) all.days
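One way to read the shared policy is as data that an accountability agent evaluates on each job state change. The Python sketch below is an illustration under that assumption, not the project's policy language or prototype: `SharedPolicy`, `Agent`, and `on_state_change` are hypothetical names, and the in-memory lists stand in for the agents' databases and message channels.

```python
from dataclasses import dataclass, field

# Data set named in the example policy.
JOB_FIELDS = ("handle", "job-id", "subjob-id", "RP-id", "timestamp")


@dataclass
class SharedPolicy:
    owner: str            # agent that holds the policy, e.g. agent@Purdue
    recipients: tuple     # agents_in_job_relation
    trigger_state: str    # job state that activates the policy
    data_set: tuple       # fields to collect and send


@dataclass
class Agent:
    name: str
    db: list = field(default_factory=list)     # locally collected records
    inbox: list = field(default_factory=list)  # records received from peers


def on_state_change(policy: SharedPolicy, agents: dict, job: dict, new_state: str) -> int:
    """Apply collect_job_data and send_job_data when the trigger state is reached.

    Returns the number of recipients the record was sent to.
    """
    if new_state != policy.trigger_state:
        return 0
    record = {k: job[k] for k in policy.data_set}  # only the agreed fields
    agents[policy.owner].db.append(record)         # collect_job_data
    for recipient in policy.recipients:            # send_job_data
        agents[recipient].inbox.append(record)
    return len(policy.recipients)
```

Filtering the job down to `data_set` before sending keeps fields outside the agreement, such as anything else the job record happens to carry, from leaving the owning site.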
Exp. 1 / Scalability with respect to the number of computing nodes
The response time is computed as the difference between the time at which the user receives the result and the time at which a user submits the job
Blue bars show the overhead introduced by accountability, which is negligible
[Figure: Response Time in seconds (axis 0 to 300) versus Number of Nodes (4, 10, 20, 40, 80, 100), with accountability and without accountability; overhead labels in order: 2.83, 1.33, 4, 2.3, 2.5, 2.8]
Experimental Evaluations
[Figure: testbed layout with node labels Job Submission, SP, RP, HN, CN]
A Framework for Pervasively Secure Grid Infrastructure
Sajal K. Das, Director
Center for Research in Wireless Mobility and Networking (CReWMaN)
Department of Computer Science and Engineering
The University of Texas at Arlington
http://crewman.uta.edu
Mobile / Pervasive Grid: A New Paradigm
• Next-generation information / knowledge Grid.
• Huge resource pool of laptops, mobile devices, and wireless sensors.
• A pervasive computing infrastructure of smart devices connected to the Grid across heterogeneous wireless networks and service providers.
• Context awareness (e.g., activity, user / device / node mobility) is the key.
• Applications: e-Learning, e-Health, banking, power grid, security, border control, disaster / crisis management, emergency response and rescue, …
[Figure labels: Computational Grid; Data Grid; Grid Community; Wireless Access Points]
Research Objectives and Challenges
• Objectives
– Dynamic resource management in (wireless) pervasive grids to handle multi-mission, often conflicting tasks
– Development of multi-level security framework for high assurance information sharing in pervasive grids
– Context / situation-aware data collection, aggregation (fusion), and mining from heterogeneous sensors, surveillance, monitoring, and tracking devices
– Learning patterns via information fusion leading to anomaly detection, hence potential security threats
– Intelligent decision making in an integrated, adaptive, autonomous and scalable manner for high information assurance, safety and security
• Security Challenges
– Limited resource in wireless mobile devices and sensors → limited defense capability
– Uncertain, often unattended or hostile environment
– Node compromises (insider attacks) → revealed secrets
– Post-deployment access and control is an issue
– Lack of centralized control; potential loss of integrity and confidentiality due to information fusion
– Multiple attacking angles → single level defense mechanism highly vulnerable
Research Contributions
• Novel game-theoretic framework for pervasive grid infrastructure that tracks node mobility and optimizes resource usage (e.g., wireless bandwidth, response time) for single and multi-class tasks
[Figure labels: Grid Community; Users under WAP1 ... Users under WAPp; WAP1 ... WAPp; Task Allocation; Resource assignment; Pool of (mobile) Users. WAP: Wireless Access Point]
Research Contributions (cont’d)
• Mathematical models and framework for multi-level security in pervasive grids (wireless sensor networks)
– distributed key management (among a cluster of nodes)
– secure information gathering, fusion, and routing
– modeling smart adversaries
– detection of compromised and replicated nodes
– controlling propagation of internal attacks
[Figure: Compromised Node behaviors: Report false data; Infect other nodes; False routing info; Forge command; Selective packet dump; Discredit normal nodes]
Experimental Results
[Figures: High Bandwidth grid systems; Medium Bandwidth grid systems; Low Bandwidth grid systems]
PRIMOB: Pricing based Mobile algorithm (game)
OPTIM: Optimal algorithm
COOP: Cooperative algorithm
Multi-Level Security Framework
[Figure labels: Uncertainty Characterized, Resource Limited Pervasive Grid Environment; Highly Assured Grid Network Operation
Node Compromise: Compromise Process Modeling; Revoke Revealed Secrets; Detect Compromise; Self-Correct Tampered Data; Contain Outbreak; Purge Tampered Data
Theoretical Foundations: Epidemic Theory; Information Theory; Statistical Learning and Classification; Digital Watermarking; Trust / Reputation Model
Architectural Components: Key Management; Secure Aggregation; Secure Routing; DoS Defense; Topology Control; Intrusion Detection]
Trust Model: Reputation Results
Attack Scenarios

Case No.   Misbehaving time (%)   False data type
1          0                      N/A
2          100                    Obvious
3          100                    Tricky
4          66                     Obvious
5          66                     Tricky

• No malicious nodes: all nodes’ reputation close to 1
• Reputation of malicious nodes is significantly lower than legitimate ones
• Reputation of malicious nodes is proportional to amount of true data they send
Secure, Highly Available, and High Performance Peer-to-Peer
Cloud Storage Systems
Dr. I-Ling Yen
The University of Texas at Dallas
What is a good storage design for a cloud environment?
• The role of the storage system in assured information sharing
– Basis for availability, security, access performance assurance
  • If the storage system is not available or does not offer good access performance, then the upper layer data applications cannot have high availability or good performance
  • If some nodes in the storage system are compromised and the system cannot tolerate it, then the secure data are compromised
• Various storage system designs
– Cluster based systems versus widely distributed cloud environment
– Replication and encryption based strategies and secret sharing based strategies
  • Secret sharing, erasure coding, short secret sharing (SSS)
– Directory management
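To make the secret sharing strategy concrete, here is a minimal Python sketch of plain threshold (Shamir) secret sharing, in which any t of n shares reconstruct the secret. It is offered only as background: the slides' SSS (short secret sharing) scheme is not detailed in this transcript and is not what this code implements. The prime, the parameter names `n` and `t`, and the function names are illustrative assumptions. It needs Python 3.8 or later for modular inverses via `pow`.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def split(secret: int, n: int, t: int) -> list:
    """Return n shares (x, y); any t of them reconstruct the secret."""
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]

    def evaluate(x: int) -> int:
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        return y

    return [(x, evaluate(x)) for x in range(1, n + 1)]


def reconstruct(shares: list) -> int:
    """Lagrange interpolation of the polynomial at x = 0."""
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * -xm) % PRIME
                den = (den * (xj - xm)) % PRIME
        total = (total + yj * num * pow(den, -1, PRIME)) % PRIME
    return total
```

Each share here is as large as the secret itself, which is the storage cost that shorter schemes aim to reduce.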
Good storage design for cloud?
[Figure: three plots
Security versus t (t = 3, 5, 7, 9, 11; vertical axis 0.972 to 1): RWE_WD, RWE_CB, SSS_WD, SSS_CB
Read Access Cost versus x (x = 0 to 3; vertical axis 0 to 6000): RWE_WD, RWE_CB, TSSLE_WD, TSSLE_CB, SSS_WD, SSS_CB
Availability versus x (x = -4 to -1; vertical axis 0.6 to 1): RWE_WD, RWE_CB, SSS_WD, SSS_CB
Captions: m=40, t=5, pnf=0.01, variant x=log10(pef). (=0.01, σ=2*). x=log10(Data Size), m=40, t=5]
• Widely distributed solutions perform better
– In security and access costs: better
– In availability: almost the same
Good storage design for cloud?
[Figures repeated from the previous slide: Security versus t; Read Access Cost versus x; Availability versus x]
• SSS based storage performs better
– In security: same as secret sharing, better than replication
– In availability: almost the same for all schemes
– In access cost: better than secret sharing, mostly better than replication
– In storage cost: SSS and replication are similar and better than secret sharing
Major design issues
• Use widely distributed infrastructure and SSS
• Directory management
– Specifically maintain directories
  • Costly in widely distributed systems
  • Needs additional directory access cost
– Use P2P solution: Design DHT for SSS
• Access protocols
– In SSS, the consistency issue is much more important than that in conventional storage systems
  • If a share is inconsistent, the reconstructed data is fully incorrect
– In SSS, the access protocol design is much more important than that in conventional storage systems
  • One may keep on getting inconsistent shares
  • If only one share is maintained at each server, then there may be data losses since these shares may not be consistent
– Need to: Design efficient access protocols
DHT for SSS
• DHT for SSS
– Hash algorithm for SSS
  • d.s_i.identifier = (d.identifier + i * 2^m / n) % 2^m
    – d: data object
    – d.s_i: the i-th share of d
    – 2^m: the size of the identifier space for DHT
– Adopt One-Hop-Lookup
  • Reduced routing time
– Method to support accesses of shares from near-by servers
  • Each server stores the geographical locations and IPs of all other servers
– Use geographical distance to approximate the real latency when no other information is available
– Conducted experimental studies to understand the relationship between geographical distance and ping latency and the potential error rate
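The share placement rule and the distance-based server choice can be sketched in Python as follows. Two assumptions are made that the slide does not state: n is taken to be the number of shares per data object, and the division in the formula is treated as integer division. `haversine_km` and `nearest_servers` are hypothetical helpers using great-circle distance as the geographical distance, and all coordinates in the usage are made-up illustrative values.

```python
import math


def share_identifier(d_identifier: int, i: int, n: int, m: int) -> int:
    """Identifier of the i-th share: shares are spread evenly around the ring.

    Assumes n is the number of shares and 2**m is the identifier space size.
    """
    space = 2 ** m
    return (d_identifier + i * space // n) % space


def haversine_km(a: tuple, b: tuple) -> float:
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_servers(client: tuple, servers: dict, k: int) -> list:
    """Names of the k servers closest to the client by geographical distance."""
    return sorted(servers, key=lambda name: haversine_km(client, servers[name]))[:k]
```

With m = 8 and n = 4, a data object with identifier 200 gets share identifiers 200, 8, 72, and 136, that is, four points a quarter of the ring apart. Distance is only a proxy for latency, which is why the slide pairs it with a measurement study.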
Efficient Access Protocols
• Use version number to validate share consistency
– When updating, generate and attach a version number to shares
– During retrieval, validate the consistency of shares
– Problem: how to efficiently generate version numbers
  • Fully decentralized approach: the new updates may not have the largest version number
  • Centralized version server: bottleneck problem, single failure point
– Solution: distributed version servers: For each data, use the hash value of its first share to determine the version server
  • Very efficient, impact of failure is localized
• Maintain share history to avoid data losses
– Problem
  • E.g., (10, 5) sharing scheme, Current version number = 10
  • Client x updated 3 shares and failed (version no = 11)
  • Client y updated 4 shares and failed (version no = 12)
  • 3 shares with vn = 10, 3 shares with vn = 11, 4 shares with vn = 12
– Solution: Maintain multiple versions
  • Need to properly determine which share to retrieve
  • Need to have a share removal protocol
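The data loss example can be checked with a short Python sketch. It assumes that in the (10, 5) scheme there are 10 shares and at least 5 shares of the same version are needed to reconstruct; `newest_recoverable_version` is a hypothetical helper, not the protocol from the slides.

```python
from collections import Counter


def newest_recoverable_version(share_versions: list, k: int):
    """Return the newest version held by at least k servers, or None.

    share_versions has one entry per server: the version numbers of the
    shares that server currently stores.
    """
    counts = Counter(v for versions in share_versions for v in set(versions))
    recoverable = [v for v, c in counts.items() if c >= k]
    return max(recoverable) if recoverable else None


# Each server keeps only its latest share: 3 at vn=10, 3 at vn=11, 4 at vn=12.
latest_only = [[10]] * 3 + [[11]] * 3 + [[12]] * 4

# Each server also keeps the previous share (share history).
with_history = [[10]] * 3 + [[10, 11]] * 3 + [[10, 12]] * 4
```

With `latest_only` and k = 5 no version has enough consistent shares, so the data is lost; with `with_history` version 10 is still held by all 10 servers and can be reconstructed.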
Compare with other access methods
[Figures: Read latency (versus k); Update latency (versus k); Space cost]
Series: Get version number from all servers; Our approach; Fully decentralized approach
Directions
• Integrated demonstration
– E.g., Accountability into Hadoop Prototype
– Host prototype on secure infrastructure
– Integrate with Blackbook
• Feed results into other projects
– AFOSR-MURI, IARPA
• Participate in DOE/NSF Open Science Grid
• Transfer technology to DoD/NCES