Transcript of sep-26-03.ppt
New England Database Society (NEDS)
Friday, September 26, 2003
Volen 101, Brandeis University
Sponsored by Sun Microsystems
Data and Applications Security Developments and Directions
and XML Security
Bhavani Thuraisingham
The National Science Foundation
September 2003
Outline
Data and Applications Security (DAS)
- Developments and Directions; DAS at NSF
Secure Semantic Web
- XML Security; Other directions
Some Emerging Secure DAS Technologies
- Secure Information Integration; Secure Sensor Information Management; Secure Dependable Information Management
Some Directions for Privacy Research
- Data Mining for handling security problems; Privacy vs. National Security; Privacy Constraint Processing; Foundations of the Privacy Problem
What are the Challenges? Details of XML Security Research
Developments in Data and Applications Security: 1975 - Present
Access Control for System R and Ingres (mid 1970s)
Multilevel secure database systems (1980 – present)
- Relational database systems: research prototypes and products; Distributed database systems: research prototypes and some operational systems; Object data systems; Inference problem and deductive database system; Transactions
Recent developments in Secure Data Management (1996 – Present)
- Secure data warehousing, Role-based access control (RBAC); E-commerce; XML security and Secure Semantic Web; Data mining for intrusion detection and national security; Privacy; Dependable data management; Secure knowledge management and collaboration
Developments in Data and Applications Security: Multilevel Secure Databases - I
Air Force Summer Study in 1982
Early systems based on Integrity Lock approach
Systems in the mid to late 1980s, early 90s
- E.g., Seaview by SRI, Lock Data Views by Honeywell, ASD and ASD Views by TRW
- Prototypes and commercial products
- Trusted Database Interpretation and Evaluation of Commercial Products
Secure Distributed Databases (late 80s to mid 90s)
- Architectures; Algorithms and Prototype for distributed query processing; Simulation of distributed transaction management and concurrency control algorithms; Secure federated data management
Developments in Data and Applications Security: Multilevel Secure Databases - II
Inference Problem (mid 80s to mid 90s)
- Unsolvability of the inference problem; Security constraint processing during query, update and database design operations; Semantic models and conceptual structures
Secure Object Databases and Systems (late 80s to mid 90s)
- Secure object models; Distributed object systems security; Object modeling for designing secure applications; Secure multimedia data management
Secure Transactions (1990s)
- Single Level/ Multilevel Transactions; Secure recovery and commit protocols
Some Directions and Challenges for Data and Applications Security - I
Secure semantic web
- Single/multiple security models?
- Different application domains
Secure Information Integration
- How do you securely integrate numerous and heterogeneous data sources on the web and otherwise?
Secure Sensor Information Management
- Fusing and managing data/information from distributed and autonomous sensors
Secure Dependable Information Management
- Integrating Security, Real-time Processing and Fault Tolerance
Data Sharing vs. Privacy
- Federated database architectures?
Some Directions and Challenges for Data and Applications Security - II
Data mining and knowledge discovery for intrusion detection
- Need realistic models; real-time data mining
Secure knowledge management
- Protect the assets and intellectual rights of an organization
Information assurance, Infrastructure protection, Access Control
- Insider cyber-threat analysis, Protecting national databases, Role-based access control for emerging applications
Security for emerging applications
- Geospatial, Biomedical, E-Commerce, etc.
Other Directions
- Trust and Economics, Trust Management/Negotiation, Secure Peer-to-peer computing
NSF Efforts in Data and Applications Security (DAS)
Security for IIS (Information and Intelligent Systems) Technologies
- DAS focuses on security needs for IIS Division Technologies (e.g. Information and data management, digital libraries, collaboration and e-business, etc.)
- DAS related proposals have also been managed under ITR (Information Technology Research) and other initiatives (e.g., Sensor initiative) during FY2003
DAS is part of the CISE (Computer and Information Science and Engineering) Directorate-wide theme on Cyber Trust for FY04 and beyond
- Focus areas for Cyber Trust include: Trusted Computing, Network Security, Data and Applications Security, Embedded Systems Security
- Inaugural Cyber Trust PI Meeting in Baltimore, August 13-15, 2003
- Plans for FY2004 will be announced soon
Opportunities possibly also under ITR
Directions and Challenges for Securing the Semantic Web
The Semantic Web by Tim Berners-Lee
- Definition and Layers
Steps for Securing the Semantic Web
XML Security for Securing the Semantic Web
Related research and directions for secure semantic web
- Secure Information Integration
Secure Semantic Web
According to Tim Berners-Lee, the Semantic Web supports
- Machine readable and understandable web pages
Layers for the semantic web: Security cuts across all layers
- Layer 1: TCP/IP, Sockets, HTML, Agents
- Layer 2: XML, XML Schemas
- Layer 3: RDF
- Layer 4: Ontologies, Semantic Interoperability
- Layer 5: Logic, Proof, Trust
Challenge: Not only integrating the layers for the semantic web, but also ensuring secure interoperability
Steps to Securing the Semantic Web
Flexible Security Policy
- One that can adapt to changing situations and requirements
Security Model
- Access Control, Role-based security, Nonrepudiation, Authentication
Security Architecture and Design
- Examine architectures for semantic web and identify security critical components
Securing the Layers of the Semantic Web
- Secure agents, XML security, RDF security, secure semantic interoperability, security properties for ontologies, security issues for digital rights
Challenge: How do you integrate across the layers of the Semantic Web and preserve security?
Much of the research is focusing on XML security; Next step is securing RDF documents
XML Security
Some ideas have evolved from research in secure multimedia/object data management
Access control and authorization models
- Protecting entire documents, parts of documents, propagations of access control privileges; Protecting DTDs vs Document instances; Secure XML Schemas
Update Policies and Dissemination Policies
Secure publishing of XML documents
- How do you minimize trust for third party publication?
Use of Encryption
Inference problem for XML documents
- Portions of documents taken together could be sensitive, individually not sensitive
More details at the end
What are the Next Steps and Challenges for Secure Semantic Web? - I
We need to continue with XML security research as well as work with standards
- W3C standards are advancing rapidly; security research, prototypes and products must keep up with the developments
- Researchers, vendors and standards organizations must work together
Secure XML Database Systems (query, transactions, storage, - - -)
RDF Security
- When you bring in semantics, many challenges for security
- Need to develop security models for RDF documents
Secure Ontologies
- Two aspects; one is to develop protection models for Ontology databases; other is to use ontologies for ensuring security and privacy
What are the Next Steps and Challenges for Secure Semantic Web? - II
Secure semantic interoperability
- What can we learn from secure database interoperability and federated databases?
Trust and digital rights management
- How do you trust the contents of a document? How do you pass digital rights when documents are disseminated?
Security for domain specific semantic webs
- Do we need multiple security policies and models?
Secure interoperability across the layers of the semantic web
- This will be a major challenge even when security is not being considered
- Security has to be considered in the beginning
Secure Information Integration is a key component of securing the semantic web
Secure Information Integration
Integrate disparate, heterogeneous and autonomous information sources on the web or otherwise
- E.g., structured/unstructured data, data streams, geospatial data
Security must be considered together with the Information Integration technologies
IJCAI workshop on Information Integration
http://www.isi.edu/info-agents/workshops/ijcai03/iiweb.html
- Technologies include Information extraction and gathering; Wrapper learning and automatic wrapper generation; Source descriptions, source meta-data learning and source statistics learning; Web service composition; Record linkage/object consolidation and Ontology matching; Novel integration and Inter-schema mediation architectures; Answering queries using views; Web-based query planning, optimization and execution; Data mining for integration
Secure Information Integration: Directions for Research
Start research on security technologies for information integration
- E.g., Secure web services decomposition; Security architectures for integration; Security issues for ontology matching, Secure information extraction, etc.
Secure sensor information management is one aspect of secure information integration
- Data streams from disparate, autonomous and heterogeneous sensors have to be fused and managed securely
Secure Sensor Information Management
Sensor network consists of a collection of autonomous and interconnected sensors that continuously sense and store information about some local phenomena
- May be employed in battle fields, seismic zones, pavements
Data streams emanate from sensors; for geospatial applications these data streams could contain continuous data of maps, images, etc.
Data has to be fused and aggregated
Continuous queries are posed, responses analyzed possibly in real-time, some streams discarded while rest may be stored
Recent developments in sensor information management include sensor database systems, sensor data mining, distributed data management, layered architectures for sensor nets, storage methods, data fusion and aggregation
Secure sensor data/information management has received very little attention; need a research agenda
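The continuous-query idea on this slide can be illustrated with a small sketch. This is only an illustration under stated assumptions: the function name `continuous_average` and the window size are hypothetical, and a real sensor database would also have to address the security issues the slide calls for.

```python
from collections import deque

def continuous_average(stream, window=3):
    """Yield the running average over the last `window` readings.

    Older readings fall out of the buffer, which mirrors the idea that
    some stream data is discarded rather than stored.
    """
    buf = deque(maxlen=window)  # keeps only the most recent readings
    for reading in stream:
        buf.append(reading)
        yield sum(buf) / len(buf)

# Hypothetical readings from one sensor.
averages = list(continuous_average([1, 2, 3, 4], window=2))
```

The generator processes one reading at a time, so the query result is updated as data arrives instead of after the whole stream has been stored.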
Secure Sensor Information Management: Directions for Research
Individual sensors may be compromised and attacked; need techniques for detecting, managing and recovering from such attacks
Aggregated sensor data may be sensitive; need secure storage sites for aggregated data; variation of the inference and aggregation problem?
Security has to be incorporated into sensor database management
- Policies, models, architectures, queries, etc.
Evaluate costs for incorporating security especially when the sensor data has to be fused, aggregated and perhaps mined in real-time
Research on secure dependable information management for sensor data
Secure Dependable Information Management
Dependable information management includes
- Secure information management
- Fault tolerant information
- High integrity and high assurance computing
- Real-time computing
Conflicts between different features
- Security, Integrity, Fault Tolerance, Real-time Processing
- E.g., A process may miss real-time deadlines when access control checks are made
- Trade-offs between real-time processing and security
- Need flexible security policies; real-time processing may be critical during a mission while security may be critical during non-operational times
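One way to picture a flexible policy that trades security checks against deadlines is the sketch below. Everything in it is an assumption made for illustration: the cost figures, the three check levels, and the names `Request` and `choose_check` do not come from the slides.

```python
from dataclasses import dataclass

@dataclass
class Request:
    user: str
    resource: str
    deadline_ms: float  # time budget remaining for this request

# Hypothetical costs of the two kinds of access control check.
FULL_CHECK_COST_MS = 5.0
LIGHT_CHECK_COST_MS = 0.5

def choose_check(request: Request, mission_mode: bool) -> str:
    """Pick a check level: security first off-mission, deadlines first on-mission."""
    if not mission_mode:
        return "full"          # non-operational time: security is critical
    if request.deadline_ms >= FULL_CHECK_COST_MS:
        return "full"          # enough slack to do the complete check
    if request.deadline_ms >= LIGHT_CHECK_COST_MS:
        return "light"         # reduced check that still fits the deadline
    return "defer"             # meet the deadline, record for later audit

level = choose_check(Request("op1", "track-data", 1.0), mission_mode=True)
```

The point of the sketch is only that the policy itself takes the operating mode and the timing budget as inputs, rather than applying one fixed rule.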
Secure Dependable Information Management Example: Next Generation AWACS
[Figure: Next Generation AWACS architecture. Labeled components: Hardware; Real-time Operating System; Infrastructure Services; Data Mgmt. and Data Xchg.; MSI App and Future Apps; Display Processor & Refresh Channels; Consoles (14); Navigation; Sensors; Data Links; Data Analysis Programming Group (DAPG); Sensor Detections; Multi-Sensor Tracks. Some components are marked "Technology provided by the project".]
- Security being considered after the system has been designed and prototypes implemented
- Challenge: Integrating real-time processing, security and fault tolerance
Secure Dependable Information Management: Directions for Research
Challenge: How does a system ensure integrity, security, fault tolerant processing, and still meet timing constraints?
- Develop flexible security policies; when is it more important to ensure real-time processing, and when is it more important to ensure security?
- Security models and architectures for the policies; Examine real-time algorithms, e.g., query and transaction processing
- Research for databases as well as for applications; what assumptions do we need to make about operating systems, networks and middleware?
Data may be emanating from sensors and other devices at multiple locations
- Data may pertain to individuals (e.g. video information, images, surveillance information, etc.)
- Data may be mined to extract useful information
- Need to maintain privacy
Research Directions for Privacy
Why this interest now on privacy?
- Data Mining for National Security
- Data Mining is a threat to privacy
- Balance between data sharing/mining and privacy
Is federated data management a solution?
Privacy Preserving Data Mining
Inference Problem as a Privacy Problem
- Handling privacy constraints; Foundations
Web/Semantic Web will have to address privacy
Federated Architectures for Data Sharing?
Data Mining to Handle Security Problems
Data mining tools could be used to examine audit data and flag abnormal behavior
Much recent work in Intrusion detection
- E.g., neural networks to detect abnormal patterns
Tools are being examined to determine abnormal patterns for national security
- Classification techniques, Link analysis
Fraud detection
- Credit cards, calling cards, identity theft etc.
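As a minimal sketch of flagging abnormal behavior in audit data, the example below uses a simple z-score test. This is a stand-in technique chosen for brevity, not the neural network or classification methods named on the slide; the function name, the threshold, and the sample counts are all hypothetical.

```python
from statistics import mean, stdev

def flag_abnormal(counts, threshold=2.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the audit counts."""
    mu = mean(counts)
    sigma = stdev(counts)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]

# Hypothetical audit data: failed logins per hour for one account.
failed_logins = [3, 4, 3, 5, 4, 3, 4, 50]
suspicious_hours = flag_abnormal(failed_logins)
```

Here only the last hour is flagged, since its count is far outside the spread of the other values.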
Data Mining as a Threat to Privacy
Data mining gives us “facts” that are not obvious to human analysts of the data
Enables inspection and analysis of huge amounts of data
Possible threats:
- Predict information about classified work from correlation with unclassified work
- Mining “Open Source” data to determine predictive events (e.g., Pizza deliveries to the Pentagon)
It isn’t the data we want to protect, but correlations among data items
Initial ideas presented at the IFIP 11.3 Database Security Conference, July 1996 in Como, Italy
Data Sharing/Mining vs. Privacy: Federated Data Management Architecture for the Department of Homeland Security?
What can we do?: Privacy Preserving Data Mining
Prevent useful results from mining
- Limit data access to ensure low confidence and support
- Extra data (“cover stories”) to give “false” results
- Providing only samples of data can lower confidence in mining results
Idea: If adversary is unable to learn a good classifier from the data, then adversary will be unable to learn good rules, predictive functions
Approach: Only make a sample of data available
- Limits ability to learn good classifier
Several recent research efforts have been reported
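The terms support and confidence used on this slide are the standard association-rule measures. The sketch below shows how they are computed for a rule "antecedent implies consequent"; the function name and the sample transactions are hypothetical.

```python
def support_confidence(transactions, antecedent, consequent):
    """Compute (support, confidence) of the rule antecedent -> consequent."""
    a = set(antecedent)
    both = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))        # rows containing the antecedent
    n_both = sum(1 for t in transactions if both <= set(t))  # rows containing both sides
    support = n_both / len(transactions)
    confidence = n_both / n_a if n_a else 0.0
    return support, confidence

# Hypothetical released data set of four records.
records = [{"a", "b"}, {"a"}, {"a", "b"}, {"c"}]
support, confidence = support_confidence(records, {"a"}, {"b"})
```

A data owner could evaluate such measures on the portion of data to be released to judge how much an adversary might learn from it.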
Privacy Problem as a form of the Inference Problem
Privacy constraints
- Content-based constraints; association-based constraints
Privacy controller
- Augment a database system with a privacy controller for constraint processing and examine the releasability of data/information (e.g., release constraints)
Use of conceptual structures to design applications with privacy in mind (e.g., privacy preserving database and application design)
The web makes the problem much more challenging than the inference problem we examined in the 1990s!
Is the General Privacy Problem Unsolvable?
Privacy Constraints
Simple Constraints: An attribute of a document is private
Content-based Constraints: If document contains information about XXX, then it is private
Association-based Constraints: Two or more documents together are private; individually they are public
Dynamic Constraints: After some event, the document is private or becomes public
Several challenges: Specification and consistency of constraints is a challenge; How do you take into consideration external knowledge? Managing history information
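A minimal sketch of how content-based and association-based constraints might be checked at release time is given below. The class name `PrivacyController`, the keyword matching, and the sample documents are assumptions for illustration; the slides do not specify an implementation.

```python
from dataclasses import dataclass, field

@dataclass
class PrivacyController:
    private_keywords: set                        # content-based constraints
    private_groups: list                         # association-based: frozensets of doc ids
    released: set = field(default_factory=set)   # history of released documents

    def may_release(self, doc_id, text):
        # Content-based: the document mentions a private topic.
        if any(k.lower() in text.lower() for k in self.private_keywords):
            return False
        # Association-based: releasing this document would complete a
        # group of documents that is private when taken together.
        after = self.released | {doc_id}
        if any(group <= after for group in self.private_groups):
            return False
        self.released.add(doc_id)
        return True

pc = PrivacyController({"diagnosis"}, [frozenset({"d1", "d2"})])
results = [
    pc.may_release("d1", "list of names"),
    pc.may_release("d2", "list of addresses"),
    pc.may_release("d3", "diagnosis report"),
]
```

The `released` set is the history information mentioned on the slide: without it the controller could not detect that d2 completes a private association with the already released d1.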
Architecture for Privacy Constraint Processing
[Figure: architecture diagram with the following components]
- User Interface Manager
- Constraint Manager: Privacy Constraints
- Query Processor: Constraints during query and release operations
- Update Processor: Constraints during update operation
- Database Design Tool: Constraints during database design operation
- DBMS and Database
Foundations of the Privacy Problem
Privacy Problem: Given a database and a set of privacy constraints, can you decide ahead of time that privacy will be violated; that is, through querying can one extract information that is private?
Is the General Privacy Problem unsolvable?
- Yes.
- To what extent?
Research result: For every recursively enumerable degree one can find a privacy problem that is one-one equivalent to the degree (paper in preparation)
What is the Computational Complexity of the Privacy Problem?
- Can one develop varying degrees of privacy classes? What is the space-time complexity?
Privacy for the Web/Semantic Web
Privacy for the web is getting a lot of attention; especially after the publicity with the DARPA program (total information awareness - TIA)
We need to start looking at privacy for the semantic web also; that is what are the additional privacy concerns due to the semantic web?
Is privacy a technical problem? What roles do lawyers, policy makers and sociologists have to play?
- How can scientists and technologists, lawyers, policy makers and sociologists work together?
Should we limit privacy research to the context of national security or extend it beyond, e.g., to the medical community, banking, IRS?
We must follow up with recent IBM workshop on Privacy; Discussions at NSF involving multiple programs
Secure Federated Database Management for Data Sharing: Schema Integration
Adapted from Sheth and Larson, ACM Computing Surveys, September 1990
[Figure: schema integration diagram showing the following schemas]
- Local Schema 1, Local Schema 2
- Component Schemas for Components A, B and C
- Generic Schemas for Components A, B and C
- Export Schemas for Component A, Component B (Export Schema I and Export Schema II) and Component C
- Federated Schemas for FDS - 1 and FDS - 2
- External Schemas 1.1, 1.2, 2.1 and 2.2
Secure Federated Database Management for Data Sharing: Policy Integration
Adapted from Computers and Security, Thuraisingham, December 1994
- Layer 1: Policies at the component level, e.g., component policies for components A, B, and C
- Layer 2: Generic policies for the components, e.g., generic policies for components A, B, and C
- Layer 3: Export policies for the components, e.g., export policies for components A, B, and C (note: component may export different policies to different federations)
- Layer 4: Federated policies: integrate export policies of the components of the federation
- Layer 5: External policies: policies for the various classes of users
What are our challenges?
If semantic web is to become viable, we need to understand how the different layers may interoperate; we cannot ignore security and privacy
Data Mining, National Security and Privacy will dominate research because of the times we are living in
We don’t have a good handle on secure dependable data/information management
- How do we handle conflicting requirements? e.g., integrating security, real-time processing, and fault tolerant computing
- Building dependable semantic webs?
Secure sensor nets, Secure e-commerce systems, Secure knowledge management will continue to have many challenging research problems
We need to build systems based on solid theoretical foundations; composable systems (ensure interfaces are secure)
Interdisciplinary research is the way of the future; within CS as well as between CS and other areas (e.g., secure sensors)
Some Key Directions
Transfer security technology to operational systems; need to develop systems that are flexible, usable and secure
- Bring human computer interaction and people aspects into system design
Security for emerging applications
- E.g., medical informatics, bioinformatics, scientific and engineering informatics, and other areas
Data mining for security (e.g., intrusion detection, insider cyber threat); cannot forget about Privacy
Interdisciplinary research in information security
Emerging areas include Secure semantic web, Secure Information Integration, Secure Sensors, Trust Management/Negotiation, Economics, - - - - -
Other Ideas and Directions?
Please contact
- Dr. Bhavani Thuraisingham The National Science Foundation Suite 1115 4201 Wilson Blvd Arlington, VA 22230 Phone: 703-292-8930 Fax 703-292-9037 email: [email protected]
XML Security
Collaborating with University of Milan; Paper to appear in TKDE
Access Control
- Pull model: User queries XML documents; results are computed by applying the access control rules in the policy base and user credentials
- Push model: Periodically portions of XML documents are pushed to the user depending on the credentials and access control rules
Secure publishing of XML documents
- With a set of digital signatures generated by the owner and no trust required of the publisher, a subject can verify the authenticity of the query response
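Example XML Document
[Figure: tree view of an example XML document, an asset report (Year: 2002, Name: U. of X). Node labels include: Asset report, Assets, Patents, Patent, Dept, title, Author, ID, Funds, Grants, Contracts, Expenses, Name: CS, Equipment, Other assets, news.]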
Subject Credentials and Protection Objects
Subjects are given access to XML documents or portions of documents depending on user ID and/or Credentials
Credential specification is based on credential types; credential type is a pair <credential name, credential properties>
- Example of credential types for the XML document are: Professor, Secretary (depending on the roles)
Protection objects are objects to which access is controlled
- Entire XML documents or portions of XML documents
- A protection object is a pair <target, path>
- Target is the file name of the XML document
- Path is an XPath expression on the target
Credential Base
<Professor credID="9" subID="16" CIssuer="2">
  <name> Alice Brown </name>
  <university> University of X </university>
  <department> CS </department>
  <research-group> Security </research-group>
</Professor>
<Secretary credID="12" subID="4" CIssuer="2">
  <name> John James </name>
  <university> University of X </university>
  <department> CS </department>
  <level> Senior </level>
</Secretary>
Policy Base
Policy base stores security policies for protecting the XML source contents
Policy base is an XML document with a subelement policyspec for each security policy defined for XML source
Policyspec has the following
- Subject consisting of userID and/or credentials
- Object (with target and path)
- Access modes: Read, Navigate, Append, Write
- Propagation option: No propagation, One level, Cascade
Security officer manages the policy base
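The three propagation options can be sketched as follows. The reading used here is an assumption: no propagation covers only the target element, one level also covers its direct subelements, and cascade covers all descendants. The function name and option strings are hypothetical.

```python
import xml.etree.ElementTree as ET

def covered_elements(node, propagation):
    """Return the elements a policy applies to under a propagation option."""
    if propagation == "none":
        return [node]                    # the target element only
    if propagation == "one-level":
        return [node] + list(node)       # target plus direct subelements
    if propagation == "cascade":
        return list(node.iter())         # target plus all descendants
    raise ValueError("unknown propagation option: " + propagation)

doc = ET.fromstring("<a><b><c/></b><d/></a>")
tags = {opt: [e.tag for e in covered_elements(doc, opt)]
        for opt in ("none", "one-level", "cascade")}
```

With this reading, a policy on element a reaches the nested element c only under the cascade option.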
Policy Base Example
<?xml version="1.0" encoding="utf-8"?>
<Policy-base>
<policy-spec cred-expr="//Professor[department='CS']" target="annual_report.xml" path="//Patent[@Dept='CS']//node()" priv="VIEW"/>
<policy-spec cred-expr="//Professor[department='CS']" target="annual_report.xml" path="//Patent[@Dept='EE']/Short-descr/node() and //Patent[@Dept='EE']/authors" priv="VIEW"/>
<policy-spec cred-expr = - - - -
<policy-spec cred-expr = - - - -
</Policy-base>
Explanation: CS professors are entitled to access all the patents of their department. They are entitled to see only the short descriptions and authors of patents of the EE department
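The pull-model evaluation of such a policy base can be sketched in Python with the standard library's limited XPath support. This is an illustration under assumptions, not the system described in the slides: the wrapper element `credential-base`, the simplified document, the dictionary form of the policy specs, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

cred_root = ET.fromstring("""
<credential-base>
  <Professor credID="9" subID="16" CIssuer="2">
    <name>Alice Brown</name><department>CS</department>
  </Professor>
  <Secretary credID="12" subID="4" CIssuer="2">
    <name>John James</name><department>CS</department>
  </Secretary>
</credential-base>
""".strip())

doc_root = ET.fromstring("""
<annual-report>
  <Patent Dept="CS"><title>P1</title></Patent>
  <Patent Dept="EE"><title>P2</title></Patent>
</annual-report>
""".strip())

# One policy spec: CS professors may view the CS patents.
policies = [
    {"cred_expr": ".//Professor[department='CS']",
     "path": ".//Patent[@Dept='CS']",
     "priv": "VIEW"},
]

def subjects_matching(cred_expr):
    """Subject ids whose credentials satisfy the credential expression."""
    return {el.get("subID") for el in cred_root.findall(cred_expr)}

def visible_nodes(sub_id):
    """Document nodes the subject may view under the policy base."""
    nodes = []
    for spec in policies:
        if sub_id in subjects_matching(spec["cred_expr"]):
            nodes.extend(doc_root.findall(spec["path"]))
    return nodes

professor_view = [n.get("Dept") for n in visible_nodes("16")]
secretary_view = visible_nodes("4")
```

The professor's credential matches the credential expression, so the CS patent is included in her view; the secretary matches no policy spec and sees nothing.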
Access Control Strategy
Subjects request access to XML documents under two modes: Browsing and authoring
- With browsing access subject can read/navigate documents
- Authoring access is needed to modify, delete, append documents
Access control module checks the policy base and applies policy specs
Views of the document are created based on credentials and policy specs
In case of conflict, least access privilege rule is enforced
System Architecture for Access Control
[Figure: system architecture. The User interacts through Pull/Query and Push/result; components shown are X-Access, X-Admin, Admin Tools, XML Documents, Policy base and Credential base.]
Secure Publishing of XML Documents
Distinguish between owner, publisher and user (subject)
Owner specifies access control policies based on user credentials; policy specified in policy base
Publisher computes view of document and sends reply document to subject; no trust placed on the publisher by using signatures
[Figure: interactions among Owner, Publisher and Subject. Labeled flows: Subscribe; Policy; Security Enhanced Document, Secure Structure; Query, Policy; Reply document, Secure structure.]
Subject Owner Interaction
Subjects register with Owner during subscription phase; during this phase the owner assigns the subject credentials, which are stored at the owner site
Owner returns to the subject the Subject Policy Configuration (policy identifiers) that apply to the subject signed with the private key of the owner
Example: If policies P1 and P2 apply to John and policy P6 applies to Jane, owner Joe sends John P1 and P2 and sends Jane P6, each signed with Joe’s private key
Owner Publisher Interaction
For each document the owner sends the publisher information on which subjects can access which portions of the document according to the policy base (i.e., access control policies)
- Also for each element e, based on the policies applied to e, the owner inserts a policy configuration (binary string) converted to hexadecimal representation; this is called the Policy configuration attribute (PCattribute)
- Policy element which describes the policies for the document is also inserted
Owner also sends publisher the Merkle Signature of each document
- It is the Merkle hash signed by owner’s private key
The document together with the security information is called the “Security Enhanced Document” (SE-XML)
Information in the security enhanced document enables the subject to verify the authenticity of the document returned by publisher
Additional information encoded in the document, called Secure Structure, is used by the subject to verify completeness of the result (for certain queries)
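The policy configuration attribute described above (a binary string converted to hexadecimal) can be sketched as follows. The bit ordering, with policy 1 as the leftmost bit, and the function names `pc_attribute` and `applies` are assumptions for illustration; the slides do not specify the encoding.

```python
def pc_attribute(applicable, num_policies):
    """Encode which policies apply to an element as a hex string.

    Bit i (counting from 1 at the left) is 1 if policy i applies.
    """
    bits = "".join("1" if i in applicable else "0"
                   for i in range(1, num_policies + 1))
    width = (num_policies + 3) // 4          # hex digits needed
    return format(int(bits, 2), "0{}x".format(width))

def applies(pc_hex, policy, num_policies):
    """Decode the hex string and test whether a given policy applies."""
    bits = format(int(pc_hex, 16), "0{}b".format(num_policies))
    return bits[policy - 1] == "1"

# Policies 1, 2 and 6 out of six apply to some element e.
pc = pc_attribute({1, 2, 6}, 6)
```

Note that decoding needs the total number of policies, which is one reason a policy element describing the policies is carried in the document as well.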
Subject Publisher Interaction
The subject submits queries to publisher; it also sends its subject policy configuration
Publisher computes a view of the requested documents based on access control policies for the subject set by the owner
To verify the authenticity of the answer, subject must recompute the same bottom up hash value signed by owner (i.e. Merkle signature) and compare it with the Merkle signature generated by the owner and inserted by the publisher
Subject may not get the entire document; therefore publisher sends to the subject additional hash values that refer to the missing portions of the document
- Hash value of parent is computed from hash values of children as well as hash values of tag names/values; publisher sends enough information for subject to compute hash value of the document
Subject verifies the authenticity of the document
Merkle Signature: Example
[Figure: tree of an example Newspaper document. Node labels include: Newspaper, Frontpage, Leading, Politic_page, Literary_page, Sport_page, Politic, Article, news, Paragraphs, paragraph, title, Author, topic, date.]
MhX(Author) = h(h(Author) || h(Author.value))
MhX(title) = h(h(title) || h(title.value))
MhX(paragraph) = h(h(paragraph) || h(paragraph.content) || MhX(Author) || MhX(title))
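The bottom-up Merkle hash and the subject-side check can be sketched in Python. This is a simplified illustration under assumptions: SHA-256 stands in for the hash function h, attributes are ignored, a hypothetical `pruned` placeholder element carries the hash of a subtree the subject may not see, and the owner's signature over the root hash is not shown.

```python
import hashlib
import xml.etree.ElementTree as ET

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_hash(el: ET.Element) -> bytes:
    """MhX(e) = h(h(name) || h(content) || MhX(child1) || ...)."""
    if el.tag == "pruned":
        # Subtree withheld by the publisher: use the supplied hash value.
        return bytes.fromhex(el.get("hash"))
    content = (el.text or "").strip().encode("utf-8")
    data = h(el.tag.encode("utf-8")) + h(content)
    for child in el:
        data += merkle_hash(child)
    return h(data)

# Owner side: hash of the full document (this value would be signed).
original = ET.fromstring(
    "<paragraph>Some text<Author>Ann</Author><title>News</title></paragraph>")
root_hash = merkle_hash(original)

# Publisher side: the Author element is withheld, its hash is sent instead.
author_hash = merkle_hash(original.find("Author")).hex()
reply = ET.fromstring(
    '<paragraph>Some text<pruned hash="' + author_hash + '"/>'
    "<title>News</title></paragraph>")

# Subject side: recompute the root hash from the partial reply.
authentic = merkle_hash(reply) == root_hash
```

Because the parent hash depends on every child hash, changing any visible value in the reply changes the recomputed root hash, while the withheld subtree stays hidden behind its hash value.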
Some Results
Notation: A document d = (Vd, vd, Ed, FEd) is defined as follows: Vd is the set of all element nodes and attribute nodes in d, vd is the node representing the document element (called the document root), Ed is the set of edges in d, and FEd is the edge labeling function. Vr,e is the set of element nodes in the reply document r.
Theorem 1: Let g = (Vg, vg, Eg, FEg) be the SE-XML version of an XML document d and r = (Vr, vr, Er, FEr) be the reply document corresponding to a query submitted on d by subject s. Each node in Vr,e is authenticable by s.
Subject Verification Algorithm: Input: Reply document r = (Vr, vr, Er, FEr). Output: True if all nodes in r are authentic, False otherwise.
Theorem 2: Let s be a subject, q be a query submitted by s, and r be the reply document received by s as an answer to q. The subject verification algorithm returns True iff each v in Vr,e is authentic.
Note on Completeness
Owner sends the structure of the XML document, called the secure structure, to the publisher; it contains names of tags and attributes and not the data content
Publisher sends the secure structure together with reply document to subject
Subject locally executes on the structure all queries whose conditions are against the document structure of the original document; the results are compared with the reply document
Key points
- Secure structure of the document is generated by hashing each tag and attribute name; it has the hashed attribute values of the XML document
- Secure structure also has policy element and policy configuration attributes of elements (not hashed)
- Completeness for queries on structure and equality on attribute values
Challenge: Extensions for more general queries
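The completeness check for a purely structural query can be sketched as below. The assumptions are substantial: SHA-256 is used for hashing, the secure structure is represented as nested tuples, the query is simply "all elements with a given tag name", and filtering by access control policy is ignored. All function names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

def hname(name: str) -> str:
    return hashlib.sha256(name.encode("utf-8")).hexdigest()

def secure_structure(el):
    """Tree of hashed tag names and hashed attribute names/values, no text."""
    attrs = {hname(k): hname(v) for k, v in el.attrib.items()}
    return (hname(el.tag), attrs, [secure_structure(c) for c in el])

def count_tag(structure, tag):
    """Run the structural query locally on the secure structure."""
    hashed_tag, _attrs, children = structure
    own = 1 if hashed_tag == hname(tag) else 0
    return own + sum(count_tag(c, tag) for c in children)

def reply_is_complete(structure, reply_elements, tag):
    """Compare the publisher's reply with the local result on the structure."""
    return len(reply_elements) == count_tag(structure, tag)

doc = ET.fromstring(
    "<Newspaper><Article><title>a</title></Article>"
    "<Article><title>b</title></Article></Newspaper>")
structure = secure_structure(doc)

full_reply = doc.findall(".//title")
partial_reply = full_reply[:1]   # a publisher that drops one result
```

The subject never sees the data content in the structure, only hashed names, yet it can still detect that the partial reply is missing an element.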