Unobtrusive Data Leakage Detecting Presented By Shruti Meshram TP4F1314015 Under the guidance of Prof. H. K. Chavan

Outline

• Introduction

• Problem Description

• Guilt Model

• Distribution Strategies

Introduction

• Data Leakage

• Data Leakage Detection

• Traditional ways of Data Leakage Detection

• Proposed System

Problem Entities

Entity: Dataset

• Distributor: T, the set of all valuable data

• Agents U1, …, Un: R1, …, Rn, where Ri is the subset of records from T received by agent Ui

• Leaker: S, the set of leaked data

Agent’s Data Requests

• Sample

– Ri = SAMPLE(T, mi), i.e. any subset of mi records from T can be given to Ui.

• Explicit

– Ri = EXPLICIT(T, conditioni), i.e. Ui receives all T records that satisfy some condition.
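The two request types can be sketched in a few lines of Python. This is only an illustration: the function names `sample_request` and `explicit_request`, the use of plain strings as records, and the example condition are assumptions, not part of the presentation.

```python
import random

def sample_request(T, m, rng=None):
    """SAMPLE(T, m): return any subset of m records from T."""
    rng = rng or random.Random()
    # sorted() gives a reproducible order; assumes records are comparable.
    return set(rng.sample(sorted(T), m))

def explicit_request(T, condition):
    """EXPLICIT(T, condition): return all records of T that satisfy condition."""
    return {t for t in T if condition(t)}

T = {"Kiran", "John", "Sarah", "Mark"}

# The distributor is free to choose which 2 records agent U1 receives.
R1 = sample_request(T, 2, random.Random(0))

# Agent U2 must receive every record that matches its (hypothetical) condition.
R2 = explicit_request(T, lambda name: name.startswith("S"))
```

The difference that matters later is visible here: for a sample request the distributor chooses the subset, while for an explicit request the condition fixes it.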

Guilt Models (1/3)

[Figure: a leaked profile may come from other sources (e.g. Sarah’s network) with probability p]

• p: posterior probability that a leaked profile comes from other sources

• Guilty agent: an agent who leaks at least one profile

• Pr{Gi|S}: probability that agent Ui is guilty, given the leaked set of profiles S

Guilt Models (2/3)

• Agents leak each of their data items independently

or

• Agents leak all their data items OR nothing

[Figure: leak scenarios labeled with probabilities (1-p)^2, (1-p)p, p(1-p) and p^2]

Guilt Models (3/3)

[Figure: two panels, "Independently" and "NOT Independently", each plotting Pr{G1} against Pr{G2}]
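As a rough illustration of the independent-leak model, the sketch below estimates Pr{Gi|S}. It assumes that each leaked item either came from other sources (with probability p) or was leaked by exactly one of the agents holding it, each equally likely, and that items leak independently. The function name `guilt_probability` and the dictionary layout are hypothetical, and the slides themselves do not state this formula.

```python
from math import prod

def guilt_probability(agent, allocations, leaked, p):
    """Estimate Pr{G_i | S} under the independent-leak assumption.

    allocations: dict mapping agent name -> set R_i of records it received
    leaked:      the leaked set S
    p:           probability that a leaked record came from other sources
    """
    R_i = allocations[agent]
    factors = []
    for t in leaked & R_i:
        # Number of agents that hold record t.
        holders = sum(1 for R in allocations.values() if t in R)
        # Probability that this agent did NOT leak t.
        factors.append(1 - (1 - p) / holders)
    # The agent is guilty unless it leaked none of the records it holds in S.
    return 1 - prod(factors)

allocations = {"U1": {"a", "b"}, "U2": {"b", "c"}}
leaked = {"a", "b"}
g1 = guilt_probability("U1", allocations, leaked, p=0.2)
g2 = guilt_probability("U2", allocations, leaked, p=0.2)
```

With these inputs U1 scores about 0.88 and U2 about 0.4: U1 holds both leaked records, and one of them was given to nobody else.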

The Distributor’s Objective (1/2)

[Figure: agents U1, U2, U3 and U4 with their received sets R1, R2, R3 and R4, and the leaked set S]

• Pr{G1|S} >> Pr{G2|S}

• Pr{G1|S} >> Pr{G4|S}

The Distributor’s Objective (2/2)

• To achieve his objective the distributor has to distribute sets R1, …, Rn that minimize

$$\sum_{i=1}^{n} \frac{1}{|R_i|} \sum_{j \ne i} |R_i \cap R_j|, \quad i, j = 1, \ldots, n$$

• Intuition: Minimized data sharing among agents makes leaked data reveal the guilty agents
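The quantity being minimized is the overlap between each set Ri and every other set Rj, scaled by 1/|Ri| and summed over all agents. It can be evaluated directly for any candidate distribution. The sketch below is a minimal version that assumes every Ri is non-empty; the function name `sum_objective` is hypothetical.

```python
def sum_objective(allocation):
    """Return sum_i (1/|R_i|) * sum_{j != i} |R_i ∩ R_j| for a list of sets."""
    total = 0.0
    for i, R_i in enumerate(allocation):
        # Total overlap between R_i and every other agent's set.
        overlap = sum(len(R_i & R_j)
                      for j, R_j in enumerate(allocation) if j != i)
        total += overlap / len(R_i)  # assumes R_i is non-empty
    return total

disjoint = [{"a", "b"}, {"c", "d"}]   # no shared records
shared = [{"a", "b"}, {"a", "b"}]     # identical sets

score_disjoint = sum_objective(disjoint)
score_shared = sum_objective(shared)
```

Disjoint sets score 0, the best possible value, while two identical sets score 2. Identical sets give the distributor no way to tell the two agents apart.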

Distribution Strategies – Sample (1/4)

• Set T has four profiles:

– Kiran, John, Sarah and Mark

• There are 4 agents:

– U1, U2, U3 and U4

• Each agent requests a sample of any 2 profiles of T for a market survey

Distribution Strategies – Sample (2/4)

• Poor: Minimize $\sum_{i \ne j} |R_i \cap R_j|$

[Figure: two example distributions among agents U1, U2, U3 and U4]

Distribution Strategies – Sample (3/4)

• Optimal Distribution

• Avoid full overlaps and minimize $\sum_{i=1}^{n} \frac{1}{|R_i|} \sum_{j \ne i} |R_i \cap R_j|$

[Figure: a distribution among agents U1, U2, U3 and U4]
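For the four-profile, four-agent example, the search space is small enough to enumerate. The sketch below is a brute-force illustration, not the presentation's algorithm. It treats "avoid full overlaps" as "no two agents receive the same set", which is an assumption, and the names `sum_objective` and `best_allocations` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

def sum_objective(allocation):
    """sum_i (1/|R_i|) * sum_{j != i} |R_i ∩ R_j|, computed exactly."""
    total = Fraction(0)
    for i, R_i in enumerate(allocation):
        overlap = sum(len(R_i & R_j)
                      for j, R_j in enumerate(allocation) if j != i)
        total += Fraction(overlap, len(R_i))
    return total

def best_allocations(T, n_agents, m):
    """Enumerate all ways to give each agent m records, skipping full overlaps."""
    choices = [frozenset(c) for c in combinations(sorted(T), m)]
    # Keep only distributions in which no two agents get identical sets.
    # Assumes there are at least n_agents distinct subsets of size m.
    candidates = [a for a in product(choices, repeat=n_agents)
                  if len(set(a)) == n_agents]
    best = min(sum_objective(a) for a in candidates)
    winners = [a for a in candidates if sum_objective(a) == best]
    return best, winners

T = {"Kiran", "John", "Sarah", "Mark"}
best, winners = best_allocations(T, n_agents=4, m=2)
```

The minimum found is 4, and every winning distribution hands each profile to exactly two agents, for example {Kiran, John}, {John, Sarah}, {Sarah, Mark}, {Mark, Kiran}. Exhaustive search only works for toy sizes like this one.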

Distribution Strategies

Sample Data Requests

• The distributor has the freedom to select the data items to provide the agents with

• General Idea:

– Provide agents with sets of data that are as disjoint as possible

• Problem: There are cases where the distributed data must overlap, e.g. |R1| + … + |Rn| > |T|

Explicit Data Requests

• The distributor must provide agents with the data they request

• General Idea:

– Add fake data to the distributed data to minimize overlap of distributed data

• Problem: Agents can collude and identify fake data
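The fake-data idea for explicit requests can be sketched as follows. The slides do not say how many fake records are added or how they are generated, so giving each agent exactly one unique fake record is an assumption made for illustration, and the names `add_fake_records`, `agents_exposed_by_fakes` and `make_fake` are hypothetical.

```python
def add_fake_records(requests, make_fake):
    """Give each agent its requested records plus one fake record unique to it."""
    allocations, fake_owner = {}, {}
    for agent, records in requests.items():
        fake = make_fake(agent)
        allocations[agent] = set(records) | {fake}
        fake_owner[fake] = agent  # remember which agent received which fake
    return allocations, fake_owner

def agents_exposed_by_fakes(leaked, fake_owner):
    """Return the agents whose fake record appears in the leaked set."""
    return {fake_owner[t] for t in leaked if t in fake_owner}

# Both agents asked for the same records, so the real data overlaps fully.
requests = {"U1": {"Kiran", "Sarah"}, "U2": {"Kiran", "Sarah"}}
allocations, fake_owner = add_fake_records(requests, lambda a: f"fake-{a}")

leaked = {"Kiran", "fake-U2"}
exposed = agents_exposed_by_fakes(leaked, fake_owner)
```

Here the leaked fake record points to U2 even though the real records are identical for both agents. If U1 and U2 compare their sets, the records that differ stand out as fakes, which is the collusion problem noted above.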

Conclusions

• Data leakage detection modeled as a maximum likelihood problem

• Data distribution strategies that help identify the guilty agents

Thank You!