Detecting Unobtrusive Data Leakage
Presented By
Shruti Meshram
TP4F1314015
Under the guidance of
Prof. H. K. Chavan
Introduction
Data Leakage.
Data Leakage Detection.
Traditional ways of Data Leakage Detection.
Proposed System.
Problem Entities
Entity | Dataset
Distributor | T: the set of all valuable data
Agents U1, …, Un | R1, …, Rn, where Ri is the subset of records from T received by agent Ui
Leaker | S: the set of leaked data
Agent’s Data Requests
• Sample
– Ri = SAMPLE(T, mi), i.e. any subset of mi records from T can be given to Ui.
• Explicit
– Ri = EXPLICIT(T, condi), i.e. Ui receives all records of T that satisfy the condition condi.
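The two request types can be sketched in a few lines of Python. This is only an illustration: the function names `sample_request` and `explicit_request`, the uniform random choice, and the example condition are assumptions, not part of the presentation.

```python
import random

def sample_request(T, m, rng=None):
    """SAMPLE(T, m): any m records of T may be returned.
    Assumption: records are picked uniformly at random."""
    rng = rng or random.Random()
    return set(rng.sample(sorted(T), m))

def explicit_request(T, condition):
    """EXPLICIT(T, condition): every record of T that satisfies the condition."""
    return {t for t in T if condition(t)}

# Hypothetical data: the four profiles used later in the slides.
T = {"Kiran", "John", "Sarah", "Mark"}
R1 = sample_request(T, 2, random.Random(0))                  # any 2 profiles
R2 = explicit_request(T, lambda name: name.startswith("S"))  # all matching profiles
```

With a sample request the distributor is free to choose which records to hand over; with an explicit request the returned set is fixed by the condition.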
Guilt Models (1/3)
[Figure: other sources of a leaked profile, e.g. Sarah's network, labeled with probability p]
• p: posterior probability that a leaked profile comes from other sources
• Guilty agent: an agent who leaks at least one profile
• Pr{Gi|S}: probability that agent Ui is guilty, given the leaked set of profiles S
Guilt Models (2/3)
• Independent leaks: agents leak each of their data items independently
• All-or-nothing leaks: agents leak all their data items or nothing
[Figure: leak scenarios labeled with probabilities (1-p)^2, (1-p)p, p(1-p) and p^2]
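One reading of the four probabilities on this slide is that they cover the possible origins of two leaked items under the independent model, where each item comes from other sources with probability p and from the agent with probability 1 - p. The sketch below enumerates those outcomes under that reading. It assumes a single agent holds both items; the function names are hypothetical.

```python
from itertools import product

def outcome_probabilities(p, n_items=2):
    """Map each origin outcome to its probability.
    An outcome has one entry per leaked item:
    "agent" (probability 1 - p) or "other" (probability p)."""
    outcomes = {}
    for combo in product(("agent", "other"), repeat=n_items):
        prob = 1.0
        for source in combo:
            prob *= p if source == "other" else (1 - p)
        outcomes[combo] = prob
    return outcomes

def guilt_probability_single_agent(p, n_items=2):
    """Probability that at least one leaked item came from the agent
    (assumption: this agent is the only one holding the items)."""
    probs = outcome_probabilities(p, n_items)
    return sum(prob for combo, prob in probs.items() if "agent" in combo)

probs = outcome_probabilities(0.2)
guilt = guilt_probability_single_agent(0.2)
```

Under these assumptions the four outcome probabilities sum to 1, and the agent is innocent only in the outcome with probability p^2.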
The Distributor’s Objective (1/2)
[Figure: agents U1, U2, U3 and U4 with their data sets R1, R2, R3 and R4, and the leaked set S]
• Pr{G1|S} >> Pr{G2|S}
• Pr{G1|S} >> Pr{G4|S}
The Distributor’s Objective (2/2)
• To achieve his objective, the distributor has to distribute sets R1, …, Rn that minimize
\[ \sum_{i} \frac{1}{|R_i|} \sum_{j \neq i} |R_i \cap R_j|, \quad i, j = 1, \dots, n \]
• Intuition: minimizing data sharing among agents makes the leaked data reveal the guilty agents
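The quantity to minimize can be computed directly. The sketch below assumes the objective is the sum, over agents, of the number of records each agent shares with every other agent, divided by the size of that agent's own set. The function name `overlap_objective` is hypothetical, and every Ri is assumed to be non-empty.

```python
def overlap_objective(allocation):
    """Sum over agents i of (1 / |Ri|) * sum over j != i of |Ri & Rj|.
    `allocation` is a list of non-empty sets, one per agent."""
    total = 0.0
    for i, Ri in enumerate(allocation):
        shared = sum(len(Ri & Rj) for j, Rj in enumerate(allocation) if j != i)
        total += shared / len(Ri)
    return total

disjoint = [{"Kiran", "John"}, {"Sarah", "Mark"}]
identical = [{"Kiran", "John"}, {"Kiran", "John"}]

disjoint_value = overlap_objective(disjoint)    # no shared records
identical_value = overlap_objective(identical)  # both agents hold the same records
```

Disjoint sets give the lowest possible value, zero, which matches the intuition that a leaked record then points to exactly one agent.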
Distribution Strategies – Sample (1/4)
• Set T has four profiles:
– Kiran, John, Sarah and Mark
• There are 4 agents:
– U1, U2, U3 and U4
• Each agent requests a sample of any 2 profiles
of T for a market survey
Distribution Strategies – Sample (3/4)
• Optimal Distribution
• Avoid full overlaps and minimize
\[ \sum_{i} \frac{1}{|R_i|} \sum_{j \neq i} |R_i \cap R_j| \]
[Figure: allocation of profiles to agents U1, U2, U3 and U4]
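For the four-profile, four-agent example, the search space is small enough to check exhaustively. The sketch below is a brute-force illustration, not the method used in the presentation: it assumes the objective is the sum over agents of shared records divided by |Ri|, and it treats "full overlap" as two agents receiving identical sets. All function names are hypothetical.

```python
from itertools import combinations, product

def overlap_objective(allocation):
    """Sum over agents i of (1 / |Ri|) * sum over j != i of |Ri & Rj|."""
    total = 0.0
    for i, Ri in enumerate(allocation):
        shared = sum(len(Ri & Rj) for j, Rj in enumerate(allocation) if j != i)
        total += shared / len(Ri)
    return total

def has_full_overlap(allocation):
    """True if two agents would receive exactly the same set."""
    return len(set(allocation)) < len(allocation)

profiles = ["Kiran", "John", "Sarah", "Mark"]
# Every possible answer to "a sample of any 2 profiles".
candidate_sets = [frozenset(c) for c in combinations(profiles, 2)]

# Try every assignment of one candidate set to each of the 4 agents.
best = min(
    (a for a in product(candidate_sets, repeat=4) if not has_full_overlap(a)),
    key=overlap_objective,
)
best_value = overlap_objective(best)
```

Because the four agents request eight records in total from only four profiles, some sharing is unavoidable. In this instance the minimum objective value is 4.0, reached when every profile is given to exactly two agents.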
Distribution Strategies
Sample Data Requests
• The distributor has the freedom
to select the data items to
provide the agents with
• General Idea:
– Provide agents with data sets that are as disjoint as possible
• Problem: There are cases where the distributed data must overlap, e.g. when |R1| + … + |Rn| > |T|
Explicit Data Requests
• The distributor must provide
agents with the data they request
• General Idea:
– Add fake data to the distributed sets to minimize their relative overlap
• Problem: Agents can collude and
identify fake data
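The effect of fake data can be illustrated on two agents whose explicit requests return the same records. The sketch below is an assumption-laden example: the `fake-U…` naming scheme, the helper `add_fake_records`, and the normalized overlap measure (shared records divided by |Ri|, summed over agents) are all hypothetical choices made for the illustration.

```python
def overlap_objective(allocation):
    """Sum over agents i of (1 / |Ri|) * sum over j != i of |Ri & Rj|."""
    total = 0.0
    for i, Ri in enumerate(allocation):
        shared = sum(len(Ri & Rj) for j, Rj in enumerate(allocation) if j != i)
        total += shared / len(Ri)
    return total

def add_fake_records(allocation, fakes_per_agent=1):
    """Return a copy of the allocation where each agent also holds
    fake records that no other agent receives."""
    padded = []
    for i, Ri in enumerate(allocation):
        fakes = {f"fake-U{i + 1}-{k}" for k in range(fakes_per_agent)}
        padded.append(set(Ri) | fakes)
    return padded

# Both agents explicitly request the same two records.
requested = [{"Kiran", "John"}, {"Kiran", "John"}]
padded = add_fake_records(requested)

before = overlap_objective(requested)  # identical sets
after = overlap_objective(padded)      # same real records plus one unique fake each
```

A fake record that turns up in the leaked set points to the one agent who received it. As the slide notes, this only helps as long as colluding agents cannot compare their sets and spot the records that differ.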
Conclusions
• Data leakage detection modeled as a maximum likelihood problem
• Data distribution strategies that help identify
the guilty agents