Secure query processing over encrypted Big Data in public cloud
Mohammad Ahmadian, www.cs.ucf.edu/~ahmadian
Ph.D. candidate in Computer Science, University of Central Florida
February 2016, Fayetteville State University
Is it possible to delegate the processing of your data without revealing your private information?
Introduction Big Data Cloud computing Security SecureNoSQL Information Leakage Conclusion References
Outline
Big Data helps us to: feed people, supply energy, and provide medical care, in an efficient way.
[Figure: "Storing data in disk, 2000 BC", with "progress" labels]
Introduction
Why do we have this much data? Datafication, atomization.
Big Data
Why do traditional database systems fail to support “big data”? It demands faster response times and scalability.
Properties of Big Data: variety, volume, velocity.
Key-value store: a dictionary data structure in which a key uniquely identifies the value.
Column-family: data are stored in rows; each row has a unique key and a set of columns.
Document store: data are stored in an internal structure (a document) to offer a higher level of granularity. Each document is identified by a unique key.
Graph database: this model is based on graphs and can be used to represent complex structures and highly connected data.
Big Data – Data models
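As a rough illustration of the four data models, the same kind of record can be laid out with plain Python dictionaries. Every name and value below is invented for the example and does not come from any particular NoSQL product.

```python
# Key-value store: an opaque value looked up by a unique key.
key_value = {"user:42": b'{"name": "Alice", "city": "Orlando"}'}

# Column-family: each row has a unique key and its own set of columns.
column_family = {
    "user:42": {"name": "Alice", "city": "Orlando"},
    "user:43": {"name": "Bob"},  # rows need not share the same columns
}

# Document store: each document has a unique key and a nested structure.
documents = {
    "user:42": {"name": "Alice", "address": {"city": "Orlando", "zip": "32816"}},
}

# Graph database: nodes plus labeled edges for highly connected data.
nodes = {
    "alice": {"type": "person"},
    "bob": {"type": "person"},
    "ucf": {"type": "university"},
}
edges = [("alice", "knows", "bob"), ("alice", "studies_at", "ucf")]


def neighbors(node, relation):
    """Return the nodes reachable from `node` over edges labeled `relation`."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]
```

The document store can address a nested field directly (`documents["user:42"]["address"]["city"]`), while the key-value store only returns the whole value.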
Compute resources as a utility: customers buy the computing resources they need from a central utility rather than generating them themselves.
Self-service provisioning: end users can spin up computing resources for almost any type of workload on demand.
Elasticity: scale up as computing needs increase, then scale down as they decrease.
Pay per use: computing resources are measured at a granular level, allowing users to pay only for the resources and workloads they use.
Cloud computing
Cloud computing services:
Cloud computing
Community: shared by several organizations; typically externally hosted, but may be internally hosted by one of the organizations.
Private: used by a single organization; can be internally or externally hosted.
Hybrid: a composition of two or more clouds (private or public) that remain unique entities but are bound together, offering the benefits of multiple deployment models; hosted both internally and externally.
Public: provisioned for open use by the public by a particular organization that also hosts the service.
Big Data, big security: according to “2015 Global Megatrends in Cybersecurity” [1], security and privacy threats are the leading reason for not adopting and using cloud services.
Legal issue: storing certain types of unencrypted data, such as medical records, off-site is illegal.
Encryption: data encryption nullifies the benefits of cloud computing (sacrificing convenience) unless the secret key is given to the cloud for decryption (sacrificing privacy).
Modern crypto-systems: with traditional crypto-systems it is impossible to outsource encrypted data to the cloud for processing. Thus, we need a new type of crypto-system.
Security
Unchanged DBMS and client side: encrypted documents are stored in the DBMS in accordance with its standard semantics.
Security-level-proportional overhead: a higher level of security incurs more overhead.
Open-ended: users can add new cryptographic modules to the system.
Multi-key, multi-level security: each data element can have a different security mechanism.
Descriptive language: key-value pairs describe the security parameters.
Data integrity: detect unauthorized modification by malicious parties.
SecureNoSQL – Objectives
Architecture
SecureNoSQL
Random (RND): encrypting the same message with the same key yields different ciphertexts.
Deterministic (DET): encrypting the same message with the same key always yields the same ciphertext, which allows equality checks over encrypted data.
Order-preserving encryption (OPE): the order of the plaintexts is preserved in the ciphertexts.
Additive homomorphic encryption (HOM): allows limited operations (additions) over encrypted data.
SecureNoSQL – Crypto-systems
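To make the HOM property concrete, here is a minimal sketch of an additively homomorphic scheme. It uses the Paillier construction as an assumed stand-in, since the slides do not say which HOM scheme SecureNoSQL uses, and the tiny primes are for illustration only and offer no security.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes (g = n + 1)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, Python 3.8+
    return n, (lam, mu)


def encrypt(n, m):
    """Randomized encryption: c = (n+1)^m * r^n mod n^2."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, with L(u) = (u - 1) // n."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


n, priv = keygen(1009, 1013)  # toy primes, far too small for real use
c_sum = add(n, encrypt(n, 20), encrypt(n, 22))
print(decrypt(n, priv, c_sum))  # prints 42
```

The party calling `add` needs only `n` and the two ciphertexts, so a server could compute the sum without seeing 20 or 22. Because `encrypt` draws a fresh `r` each time, this toy scheme is also randomized in the RND sense.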
JSON schema: shows how the query data should be interpreted and how to extract and apply the security mechanisms for data items.
Sections of the schema: collection; cryptographic modules; data elements; mapping of cryptographic modules to the fields.
SecureNoSQL – Security Schema
Collection: a group of NoSQL documents, equivalent to an RDBMS table.
SecureNoSQL – Security Schema
Cryptographic modules: each entry holds the pointer to an item, the encryption key, and the initialization vector.
SecureNoSQL – Security Schema
Data elements: describes all data elements in JSON notation.
SecureNoSQL – Security Schema
Mapping cryptographic modules to the fields: assigns cryptographic modules to the fields.
SecureNoSQL – Security Schema
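The four schema sections might be laid out roughly as follows. This is only a sketch: the transcript does not show the actual SecureNoSQL schema, so every key name, module identifier, and field below is hypothetical, and the key and IV values are placeholders.

```python
import json

# Hypothetical security schema with the four sections named on the slides.
security_schema = {
    "collection": "patients",
    "crypto_modules": [
        {"id": "det_1", "type": "DET", "key": "<hex key>", "iv": "<hex iv>"},
        {"id": "ope_1", "type": "OPE", "key": "<hex key>", "iv": None},
        {"id": "rnd_1", "type": "RND", "key": "<hex key>", "iv": "<hex iv>"},
    ],
    "data_elements": {"name": "string", "age": "integer", "diagnosis": "string"},
    "mapping": {"name": "det_1", "age": "ope_1", "diagnosis": "rnd_1"},
}


def module_for(schema, field):
    """Look up the cryptographic module assigned to a field by the mapping."""
    module_id = schema["mapping"][field]
    for module in schema["crypto_modules"]:
        if module["id"] == module_id:
            return module
    raise KeyError(f"no cryptographic module with id {module_id!r}")


# The schema is plain key-value data, so it serializes directly to JSON.
print(json.dumps(security_schema, indent=2))
```

Keeping the mapping separate from the module definitions lets several fields share one module, or each field use its own, which matches the multi-key, multi-level objective.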
Query encryption: parses the query elements and applies cryptographic modules based on the security schema.
SecureNoSQL
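A minimal sketch of this step, under several assumptions: the query is a MongoDB-style dictionary, only values (not field names) are encrypted, an HMAC tag stands in for a DET cipher, and a keyed monotone function stands in for OPE. None of these choices is taken from the SecureNoSQL implementation.

```python
import hashlib
import hmac

KEY = b"demo-key-not-for-production"  # hypothetical key for the example


def det_token(value):
    """Keyed deterministic tag: equal plaintexts give equal tokens.
    A stand-in for DET; unlike a real cipher it cannot be decrypted."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()


def _gap(i):
    """Positive pseudo-random step derived from the key and the index."""
    digest = hmac.new(KEY, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return 1 + int.from_bytes(digest[:2], "big")


def ope_encrypt(m):
    """Toy order-preserving map: strictly increasing in m for m >= 0."""
    if m < 0:
        raise ValueError("this toy map only supports non-negative integers")
    return sum(_gap(i) for i in range(m + 1))


# Hypothetical mapping of fields to modules, as a security schema would give.
FIELD_MODULES = {"name": det_token, "age": ope_encrypt}


def encrypt_query(query):
    """Replace every constant in the query with its encrypted form."""
    encrypted = {}
    for field, condition in query.items():
        module = FIELD_MODULES[field]
        if isinstance(condition, dict):  # operator form, e.g. {"$gt": 30}
            encrypted[field] = {op: module(v) for op, v in condition.items()}
        else:  # plain equality match
            encrypted[field] = module(condition)
    return encrypted


encrypted = encrypt_query({"name": "Alice", "age": {"$gt": 30}})
```

The operators stay in the clear, so the server can evaluate `$gt` directly on OPE ciphertexts and equality directly on deterministic tokens without holding any key.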
Some sample NoSQL queries: the query elements are parsed and cryptographic modules applied based on the security schema.
SecureNoSQL
Data integrity: using HMAC, the client generates hash values for all encrypted documents.
SecureNoSQL
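The integrity check can be sketched with Python's standard `hmac` module. The choice of SHA-256 and the function names are assumptions for illustration; the slides only state that HMAC is used.

```python
import hashlib
import hmac


def tag_document(mac_key, ciphertext):
    """Client side: compute an HMAC tag over an encrypted document."""
    return hmac.new(mac_key, ciphertext, hashlib.sha256).hexdigest()


def verify_document(mac_key, ciphertext, tag):
    """Client side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(tag_document(mac_key, ciphertext), tag)


mac_key = b"client-only-secret"  # hypothetical key, never sent to the cloud
doc = b"\x8f\x02encrypted-document-bytes"
tag = tag_document(mac_key, doc)
```

A document returned by the cloud is accepted only if `verify_document` returns `True`; any modification made without the key changes the expected tag.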
SecureNoSQL – Data-flow
Information leakage from crypto-systems: weaknesses and strengths of crypto-systems.
Poor parameter selection for a crypto-system: short primes for RSA or a bad key for AES.
Intrinsic weakness of a crypto-system: OPE, DES, RC2, MD5.
Information leakage from statistical sampling: Zipf, Gaussian, power law.
Information leakage from access patterns: given a query q and a dataset D, the server easily finds out the set of documents that q touched.
Information leakage
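One way to read the statistical leakage: if equal plaintexts produce equal ciphertexts, the ciphertext histogram mirrors the plaintext histogram. The sketch below assumes an attacker who knows the public frequency ranking of the values; the data, the key, and the HMAC stand-in for a deterministic cipher are all invented for the example.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"demo-key"  # hypothetical; unknown to the attacker


def det_token(value):
    """Deterministic keyed tag standing in for a DET ciphertext."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()


# Skewed (Zipf-like) column of values as stored in the cloud.
plaintexts = ["flu"] * 6 + ["cold"] * 3 + ["rare"] * 1
stored_tokens = [det_token(p) for p in plaintexts]

# The attacker sees only the tokens but knows the public ranking.
public_rank = ["flu", "cold", "rare"]
token_rank = [tok for tok, _ in Counter(stored_tokens).most_common()]
guess = dict(zip(token_rank, public_rank))
```

Here `guess` maps every token to the right plaintext without the key, purely by matching frequency ranks.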
Solution 1: adding fake queries alongside each valid query.
Information leakage
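One possible reading of this idea is sketched below: the client hides the real query among decoys drawn from a pool and keeps only the matching result. The slides do not describe the actual mechanism, so the function names, the decoy pool, and the batching strategy are assumptions.

```python
import random


def with_decoys(real_query, decoy_pool, n_decoys, rng=None):
    """Return the real query mixed with n_decoys fake ones, in random order."""
    rng = rng or random.SystemRandom()
    candidates = [q for q in decoy_pool if q != real_query]
    batch = rng.sample(candidates, n_decoys) + [real_query]
    rng.shuffle(batch)
    return batch


def keep_real(batch, results, real_query):
    """Client side: discard decoy results and keep the one that matters."""
    return results[batch.index(real_query)]
```

The server sees `n_decoys + 1` queries and cannot tell from the batch alone which one the client cares about, at the cost of proportionally more work and traffic.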
Solution 2: hiding the statistical model of the encrypted data.
Information leakage
In this investigation, we found that modern cryptographic support is readily available for processing queries on very large-scale encrypted data stores. Furthermore, we designed a system that provides security and integrity, with an overhead proportional to the required level of security.
Conclusion
The overhead of the crypto primitives in modern cryptography is proportional to the desired security level or to the operations we need to carry out on the ciphertext. Here is a simple comparison of the overhead of different security configurations.
Conclusion
DB         Plain  OPE64  OPE128  OPE256  OPE512  HOM
Size (MB)  170    430    508     662     1000    3400
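The sizes in the table can be turned into expansion factors relative to the plaintext database with a few lines of arithmetic. The sizes are copied from the table; the variable names are ours.

```python
# Database size in MB for each security configuration (from the table).
sizes_mb = {
    "Plain": 170,
    "OPE64": 430,
    "OPE128": 508,
    "OPE256": 662,
    "OPE512": 1000,
    "HOM": 3400,
}

# Expansion factor relative to the unencrypted database.
expansion = {name: size / sizes_mb["Plain"] for name, size in sizes_mb.items()}

for name, factor in expansion.items():
    print(f"{name:7s} {factor:5.2f}x")
```

The OPE configurations range from roughly 2.5x to 5.9x the plaintext size, while HOM reaches 20x.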
[1] Ponemon Institute Research Report, “2015 Global Megatrends in Cybersecurity,” Feb. 2015.
[2] M. Ahmadian, F. Plochan, Z. Roessler, and D. C. Marinescu, “SecureNoSQL: An approach for secure search of encrypted NoSQL databases in the public cloud,” International Journal of Information Management, vol. 37, no. 2, pp. 63–74, 2017. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0268401216302262
[3] R. A. Popa, C. M. S. Redfield, N. Zeldovich, and H. Balakrishnan, “CryptDB: Protecting confidentiality with encrypted query processing,” Proc. of the Twenty-Third ACM Symposium on Operating Systems Principles, pp. 85–100, 2011.
[4] H. Hu, J. Xu, C. Ren, and B. Choi, “Processing private queries over untrusted data cloud through privacy homomorphism,” in Data Engineering (ICDE), 2011 IEEE 27th International Conference on. IEEE, 2011, pp. 601–612.
[5] A. Boldyreva, N. Chenette, Y. Lee, and A. O'Neill, “Order-preserving symmetric encryption,” Advances in Cryptology-EUROCRYPT, pp. 224–241, 2009.
[6] M. Ahmadian, A. Paya, and D. Marinescu, “Security of applications involving multiple organizations and order preserving encryption in hybrid cloud environments,” IEEE International conf. on Parallel Distributed Processing Symposium Workshops (IPDPSW), pp. 894–903, May 2014.
[7] C. Gentry et al., “Fully homomorphic encryption using ideal lattices,” in STOC, vol. 9, 2009, pp. 169–178.
[8] M. Ahmadian, “Secure query processing in cloud NoSQL,” in IEEE International Conference on Consumer Electronics (ICCE) (2017 ICCE), Las Vegas, USA, Jan. 2017.
[9] M. Ahmadian, J. Khodabandehloo, and D. Marinescu, “A security scheme for geographic information databases in location based systems,” IEEE SoutheastCon, pp. 1–7, 2015. doi:10.1109/SECON.2015.7132941.
References
Question & Discussion