Presented By Amarjit Datta
Privacy-Preserving Multi-Keyword Ranked Search over Encrypted Cloud Data

Authors and Publication Information
Ning Cao: PhD in ECE from the Worcester Polytechnic Institute
Cong Wang: PhD in ECE from Illinois Institute of Technology
Ming Li: PhD in ECE from the Worcester Polytechnic Institute
Kui Ren: PhD in ECE from the Worcester Polytechnic Institute
Wenjing Lou: PhD in ECE from the University of Florida

Table of Contents
Introduction
Problem Domain
Some Important Definitions
MRSE framework and versions
MRSE scheme analysis
MRSE scheme improvements

Introduction
Cloud computing is becoming more and more popular. Why is the cloud so popular?
Minimum startup cost
Pay-as-you-go
Easily scalable
No server administration overhead

Uploading content to the cloud raises many security issues, for example:
Man-in-the-middle attacks
Packet sniffing
IP address spoofing
and many more

In this research paper, the authors analyze the privacy issues that can arise after the content has been uploaded to the cloud.

The cloud server is assumed to be honest-but-curious:
Honest: it follows the designated protocol.
Curious: it wants to infer and analyze the data in its storage.
So we have to search encrypted data hosted in the cloud environment without sharing private information with the cloud.

So what can data owners do about it?
Data owners can encrypt their files before uploading them to the cloud. But how can they then search the encrypted files in the cloud? Traditional plaintext keyword search won't work.

We also need the search results in ranked order (example: most relevant first).
Coordinate matching: search for as many keyword matches as possible in the document.
Privacy must be preserved.

Problem Definition
Single-keyword search over encrypted data is already widely researched. This paper explores 2 new use cases:
Multi-keyword search over encrypted cloud data.
Ranked search (sort results based on relevance). Here the paper uses coordinate matching for rank analysis.
Privacy must be preserved.
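Coordinate matching as described above can be illustrated on plaintext data before any encryption is involved. The sketch below is only an illustration: the keyword dictionary, document ids and function names are hypothetical and do not come from the paper. It counts how many query keywords each document contains and returns the k best matches.

```python
from typing import Dict, List, Sequence, Tuple


def coordinate_match_score(doc_bits: Sequence[int], query_bits: Sequence[int]) -> int:
    """Number of query keywords present in the document.

    For 0/1 vectors this is the inner product of the two vectors.
    """
    return sum(d * q for d, q in zip(doc_bits, query_bits))


def rank_top_k(docs: Dict[str, List[int]], query_bits: Sequence[int], k: int) -> List[Tuple[str, int]]:
    """Return the k (doc_id, score) pairs with the highest score, ties broken by id."""
    scored = [(doc_id, coordinate_match_score(bits, query_bits)) for doc_id, bits in docs.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical dictionary: position j of every vector refers to KEYWORDS[j].
KEYWORDS = ["cloud", "privacy", "search", "ranking"]
DOCS = {
    "F1": [1, 1, 0, 0],
    "F2": [1, 1, 1, 0],
    "F3": [0, 0, 1, 1],
}
QUERY = [1, 1, 1, 0]  # the user searches for "cloud", "privacy" and "search"

TOP2 = rank_top_k(DOCS, QUERY, 2)
print(TOP2)
```

The difficulty the paper addresses is producing the same ranking when the server sees neither the document vectors nor the query vector in the clear.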

Problem Model

Problem Formulation
The data owner has a collection of data documents and their encrypted forms.
The data owner creates an encrypted searchable index.
Both the encrypted files and the encrypted searchable index are copied to the cloud server.
To search, data users need the corresponding trapdoor T.

Two models are considered, based on the amount of information the cloud server knows:
Known ciphertext model - The cloud server knows only the encrypted dataset and the searchable index.
Known background model - The cloud server knows the encrypted dataset + searchable index + additional information (example: correlation of search queries).

This is what we want!
Data privacy: encrypt the data files and the searchable index file.
Keyword privacy: hide what users are searching for.
Trapdoor unlinkability: the trapdoor generation function should be randomized instead of deterministic.

Let's Check the MRSE Scheme!
The main idea is to confuse the cloud server so that it cannot detect the search keywords and the document type. We can do that by using randomization in different steps.

Notations

MRSE Basic Framework
[Figure: MRSE framework. Labels: Data owner, Data user, Setup, Build Index, Trapdoor, Query, Key, Upload encrypted files and indexes]

How to Do Ranking? - Similarity Calculation
Di is a binary data vector for document Fi, where each bit Di[j] is either 0 or 1 and represents the existence of the corresponding keyword Wj in that document.
Q is a binary query vector indicating the keywords of interest, where each bit Q[j] represents the existence of the corresponding keyword Wj in the query.
The similarity score of document Fi to the query is therefore expressed as the inner product of their binary column vectors, i.e., Di · Q.

MRSE_I Scheme
Setup: The data owner randomly generates an (n+2)-bit vector S and two (n+2) x (n+2) invertible matrices M1 and M2. The secret key SK is the 3-tuple {S, M1, M2}. Here n is the number of fields for each record (here, the keywords in the dictionary); the n + 2 dimensions are these n fields plus extra dimensions that include the dummy random keyword.
Build-Index: The data owner generates a binary data vector Di for every document Fi, where each binary bit Di[j] represents whether the corresponding keyword Wj appears in the document Fi.
Trapdoor: With t keywords of interest, one binary vector Q is generated, where each bit Q[j] indicates whether Wj belongs to the query keyword set W. Based on this vector, the trapdoor is generated.
Query: With the trapdoor, the cloud server computes the similarity score of each document Fi. After sorting all scores, the cloud server returns the top-k ranked id list.

MRSE_I - Analysis
Functionality: The random dummy keyword introduced can follow a normal distribution, where the standard deviation functions as a flexible tradeoff parameter between search accuracy and security.
Data privacy: preserved by the encryption of the data.
Index privacy: the index stays secret as long as the secret key is protected.
With the randomness introduced by the splitting process and the random numbers r and t, the basic scheme can generate two totally different trapdoors for the same query.

Improvement of MRSE_I
MRSE_I is secure enough for the known ciphertext model, but it is not sufficient for the known background model. For example, the cloud server may learn the document frequency, which can be further combined with background information to identify the keyword in a query with high probability.

Improvement of MRSE_I - Scale Analysis Attack
Given two correlated trapdoors T1 and T2 for the query keyword sets {K1, K2} and {K1, K2, K3}, respectively, and three documents, the cloud server could deduce whether all three documents contain K3 or none of them contains K3. From this, the cloud server can find out the document frequency.

MRSE_2 Scheme
U is the number of dummy keywords inserted.
In MRSE_I, only 1 dummy keyword was used per document. In MRSE_2, both Build-Index and Query take U into account.

More Improvement
So far, ranking has used only the number of query keywords that appear in the document. But other factors can matter too. For example, when a keyword appears in all documents, it is less important. So considering keyword weight while ranking documents can be an improvement.

The MRSE_I_TF scheme is the improved version of MRSE_I that considers the weight of each keyword during similarity calculation.
The MRSE_2_TF scheme incorporates both ideas: MRSE_I_TF (weighted keywords) and MRSE_2 (a list of random dummy keywords).

Future Possible Work
For future work, the authors will explore checking the integrity of the rank order in the search result, assuming the cloud server is untrusted.
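The keyword-weighting idea from the "More Improvement" slides can be illustrated with a small plaintext sketch. The exact weighting used by MRSE_I_TF is not given in this transcript, so the logarithmic inverse-document-frequency weight below is an assumption, chosen only because it makes a keyword that appears in every document count for nothing. All names and data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def keyword_weights(docs: Dict[str, List[int]]) -> List[float]:
    """Assumed weight per keyword: log(N / df), where df is the number of
    documents containing the keyword. A keyword found in all N documents
    gets weight 0; an unused keyword also gets 0."""
    n_docs = len(docs)
    n_keywords = len(next(iter(docs.values())))
    weights = []
    for j in range(n_keywords):
        df = sum(bits[j] for bits in docs.values())
        weights.append(math.log(n_docs / df) if df else 0.0)
    return weights


def weighted_score(doc_bits: Sequence[int], query_bits: Sequence[int], weights: Sequence[float]) -> float:
    """Inner product in which every matching keyword contributes its weight."""
    return sum(d * q * w for d, q, w in zip(doc_bits, query_bits, weights))


# Hypothetical collection: keyword 0 appears in every document.
DOCS = {
    "F1": [1, 1, 0],
    "F2": [1, 0, 1],
    "F3": [1, 0, 0],
}
QUERY = [1, 1, 0]

WEIGHTS = keyword_weights(DOCS)
for doc_id, bits in DOCS.items():
    print(doc_id, weighted_score(bits, QUERY, WEIGHTS))
```

With plain match counting, F2 and F3 would each score 1 for matching keyword 0; with the weights, only the match on the rarer keyword 1 separates the documents.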