Presentation1

Post on 21-Jul-2015


Detecting Unobtrusive Data Leakage

Presented By

Shruti Meshram

TP4F1314015

Under the guidance of

Prof. H. K. Chavan

Outline

• Introduction

• Problem Description

• Guilt Model

• Distribution Strategies


Introduction

• Data leakage

• Data leakage detection

• Traditional ways of data leakage detection

• Proposed system


Problem Entities

Each entity and its dataset:

• Distributor, dataset T: set of all valuable data

• Agents U1, …, Un, datasets R1, …, Rn: Ri is the subset of records from T received by agent Ui

• Leaker, dataset S: set of leaked data

Agent’s Data Requests

• Sample

– Ri = SAMPLE(T, mi), i.e., any subset of mi records from T can be given to Ui.

• Explicit

– Ri = EXPLICIT(T, condition_i), i.e., Ui receives all records of T that satisfy condition_i.
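The two request types can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function names `sample_request` and `explicit_request`, the use of Python sets for T and Ri, and the example condition are all assumptions made for this sketch.

```python
import random

def sample_request(T, m, rng=None):
    """SAMPLE(T, m): return any subset of m records from T (hypothetical helper)."""
    rng = rng or random.Random()
    # random.sample needs a sequence, so the set is sorted into a list first
    return set(rng.sample(sorted(T), m))

def explicit_request(T, condition):
    """EXPLICIT(T, condition): return every record of T that satisfies condition."""
    return {t for t in T if condition(t)}

T = {"Kiran", "John", "Sarah", "Mark"}
R1 = sample_request(T, 2, random.Random(0))                # any 2 profiles
R2 = explicit_request(T, lambda name: name.startswith("K"))  # illustrative condition
```

With a sample request the distributor chooses which records to hand out; with an explicit request the condition fixes the result completely.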


Guilt Models (1/3)

[Figure: a leaked profile may come from other sources, e.g. Sarah's network, with probability p]

• p: posterior probability that a leaked profile comes from other sources

• Guilty agent: an agent who leaks at least one profile

• Pr{Gi|S}: probability that agent Ui is guilty, given the leaked set of profiles S

Guilt Models (2/3)

• Agents leak each of their data items independently

or

• Agents leak all their data items OR nothing

[Figure: outcome probabilities (1-p)^2, (1-p)p, p(1-p), and p^2]
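To make Pr{Gi|S} concrete for the independent-leak case, here is a minimal Python sketch. The slides do not state a formula; the sketch assumes the form used in Papadimitriou and Garcia-Molina's data leakage detection model, Pr{Gi|S} = 1 − ∏ over t in S ∩ Ri of (1 − (1 − p)/|Vt|), where Vt is the set of agents that received record t. The function name and the example data are hypothetical.

```python
def guilt_probability(agent, R, S, p):
    """Pr{G_agent | S}, assuming agents leak each record independently.

    R: dict mapping agent name -> set of records given to that agent
    S: set of leaked records
    p: probability that a leaked record came from other sources
    """
    prob_innocent = 1.0
    for t in S & R[agent]:
        holders = sum(1 for records in R.values() if t in records)  # |V_t|
        # chance that this agent did NOT leak t, given that t was leaked
        prob_innocent *= 1.0 - (1.0 - p) / holders
    return 1.0 - prob_innocent

R = {"U1": {"Kiran", "John"}, "U2": {"John", "Sarah"}}
S = {"Kiran", "John"}
g1 = guilt_probability("U1", R, S, p=0.2)
g2 = guilt_probability("U2", R, S, p=0.2)
```

In this example U1 holds both leaked profiles, one of them exclusively, so its guilt probability is noticeably higher than that of U2, who shares only one leaked profile.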

Guilt Models (3/3)

[Figure: two plots with axes Pr{G1} and Pr{G2}, one for agents that leak items independently and one for agents that do NOT leak independently]


The Distributor’s Objective (1/2)

[Figure: the distributor gives sets R1, …, R4 to agents U1, …, U4; S is the leaked set]

• Pr{G1|S} >> Pr{G2|S}

• Pr{G1|S} >> Pr{G4|S}

The Distributor’s Objective (2/2)

• To achieve his objective, the distributor has to distribute sets R1, …, Rn that minimize

$$\sum_{i} \frac{1}{|R_i|} \sum_{j \neq i} |R_i \cap R_j|, \quad i, j = 1, \ldots, n$$

• Intuition: minimizing data sharing among agents makes the leaked data reveal the guilty agents
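The objective above can be evaluated directly for any candidate distribution. The following is a small Python sketch under the assumption that each Ri is a non-empty Python set; the function name `sum_objective` and the two toy distributions are hypothetical.

```python
def sum_objective(R):
    """Sum over i of (1/|R_i|) * (sum over j != i of |R_i ∩ R_j|).

    R is a list of non-empty sets, one per agent.
    """
    total = 0.0
    for i, Ri in enumerate(R):
        overlap = sum(len(Ri & Rj) for j, Rj in enumerate(R) if j != i)
        total += overlap / len(Ri)
    return total

disjoint = [{"a", "b"}, {"c", "d"}]   # no shared records
shared = [{"a", "b"}, {"b", "c"}]     # one shared record
score_disjoint = sum_objective(disjoint)
score_shared = sum_objective(shared)
```

Disjoint sets give a score of zero, and every record shared between two agents raises it, which matches the intuition that less sharing makes the leaker easier to single out.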

Distribution Strategies – Sample (1/4)

• Set T has four profiles:

– Kiran, John, Sarah and Mark

• There are 4 agents:

– U1, U2, U3 and U4

• Each agent requests a sample of any 2 profiles of T for a market survey

Distribution Strategies – Sample (2/4)

[Figure: two candidate distributions of the profiles to agents U1, …, U4; one is labeled "Poor"]

• Minimize

$$\sum_{i \neq j} |R_i \cap R_j|$$

Distribution Strategies – Sample (3/4)

• Optimal Distribution

• Avoid full overlaps and minimize

$$\sum_{i} \frac{1}{|R_i|} \sum_{j \neq i} |R_i \cap R_j|$$

[Figure: distribution of the profiles to agents U1, …, U4]
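A worked check on the four-profile example shows why the slide asks for both conditions. The three allocations in this Python sketch are hypothetical (the slide figures did not survive extraction), and "full overlap" is interpreted here as one agent's set being contained in another's.

```python
from itertools import combinations

def sum_objective(R):
    """Sum over i of (1/|R_i|) * (sum over j != i of |R_i ∩ R_j|)."""
    return sum(
        sum(len(Ri & Rj) for j, Rj in enumerate(R) if j != i) / len(Ri)
        for i, Ri in enumerate(R)
    )

def has_full_overlap(R):
    """True if some agent's set is entirely contained in another agent's set."""
    return any(Ri <= Rj or Rj <= Ri for Ri, Rj in combinations(R, 2))

# Hypothetical allocations of 2 profiles to each of 4 agents
identical = [{"Kiran", "John"}] * 4
paired = [{"Kiran", "John"}, {"Kiran", "John"}, {"Sarah", "Mark"}, {"Sarah", "Mark"}]
ring = [{"Kiran", "John"}, {"John", "Sarah"}, {"Sarah", "Mark"}, {"Mark", "Kiran"}]

for name, R in [("identical", identical), ("paired", paired), ("ring", ring)]:
    print(name, sum_objective(R), has_full_overlap(R))
```

The `paired` and `ring` allocations reach the same objective value (4.0, against 12.0 for `identical`), but only `ring` avoids full overlaps. In `paired`, two agents hold exactly the same profiles, so a leak of those profiles cannot distinguish between them.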

Distribution Strategies

Sample Data Requests

• The distributor has the freedom to select the data items to provide the agents with

• General Idea:

– Provide agents with sets of data that are as disjoint as possible

• Problem: There are cases where the distributed data must overlap, e.g., |R1| + … + |Rn| > |T|
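One way to approximate "as disjoint as possible" is a greedy allocation. The Python sketch below is an illustrative heuristic, not the algorithm from the presentation: for each agent it repeatedly picks the record that keeps the largest overlap with any earlier agent smallest, preferring records held by fewer agents. The function name, the tie-breaking by record name, and the assumption that each request size is at most |T| are choices made for this sketch.

```python
def allocate_samples(T, requests):
    """Greedy sketch for sample requests.

    T: set of records; requests: list of sample sizes m_i (each assumed <= |T|).
    Returns a list of sets R_1, ..., R_n.
    """
    counts = {t: 0 for t in T}   # how many agents already hold each record
    allocation = []
    for m in requests:
        Ri = set()
        for _ in range(m):
            def cost(t):
                # worst overlap with any earlier agent if t were added to Ri
                worst = max((len((Ri | {t}) & Rj) for Rj in allocation), default=0)
                return (worst, counts[t], t)
            best = min((t for t in T if t not in Ri), key=cost)
            Ri.add(best)
            counts[best] += 1
        allocation.append(Ri)
    return allocation

T = {"Kiran", "John", "Sarah", "Mark"}
allocation = allocate_samples(T, [2, 2, 2, 2])
```

Here the requests total 8 records against |T| = 4, so overlap is unavoidable. The heuristic spreads it evenly, with every profile going to exactly two agents and no two agents receiving the same pair.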

Explicit Data Requests

• The distributor must provide agents with the data they request

• General Idea:

– Add fake data to the distributed data to minimize the overlap of distributed data

• Problem: Agents can collude and identify fake data
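The fake-data idea can be sketched as follows. This Python example is a simplified illustration under stated assumptions: each agent receives exactly one unique fake record, fake records are recognizable to the distributor by a `fake-` prefix, and all function names are hypothetical. It does not address the collusion problem noted above.

```python
def add_fake_records(R, make_fake):
    """Return a copy of each agent's set extended with one unique fake record."""
    return {agent: records | {make_fake(agent)} for agent, records in R.items()}

def agents_exposed_by_fakes(S, R_with_fakes, is_fake):
    """Agents whose fake record appears in the leaked set S."""
    return sorted(
        agent
        for agent, records in R_with_fakes.items()
        if any(is_fake(t) and t in S for t in records)
    )

# Two explicit requests that happen to return the same records
R = {"U1": {"Kiran", "John"}, "U2": {"Kiran", "John"}}
R_fake = add_fake_records(R, lambda agent: f"fake-{agent}")

S = {"Kiran", "John", "fake-U2"}   # leaked set containing U2's fake record
exposed = agents_exposed_by_fakes(S, R_fake, lambda t: t.startswith("fake-"))
```

Without fake records the two agents are indistinguishable, because their sets overlap fully. With one unique fake record each, the sets differ, and a leaked fake record points to the agent that received it.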


Conclusions

• Modeled as a maximum likelihood problem

• Data distribution strategies that help identify the guilty agents


Thank You!