Privacy Framework for RDF Data Mining

Post on 12-Jan-2016


Master’s Thesis Project Proposal
By: Yotam Aron

Overview
• Motivation and Goal
• Background
• Proposed Solution and Design
• Example
• Conclusion

Motivation
• Data mining continues to become more widespread.
  ◦ Useful for research, public policy, etc.
• Want to maintain privacy of participants in the database.
• Little work has been done on privacy for semantic web data.

Previous Work
• Anonymization
  ◦ K-Anonymity [1]
• Differential Privacy systems: PINQ [2], AIRAVAT [3].
• Drawbacks:
  ◦ Do not apply to semantic web data.
  ◦ Do not support SPARQL.

Goal
• Develop a system to protect dataset participants’ personal data in SPARQL.
• Integrates well with existing SPARQL endpoints.
• Relatively easy for the user and the administrator to use.

Background
• Rule-based Privacy Policies in AIR
• Differential Privacy

Rule-based Privacy Policies in AIR [4]
• Rules define patterns in a SPARQL query.
• If a pattern is matched, the rule infers compliance or non-compliance of the incoming SPARQL query.

AIR Example [5]

AIR Policy (extract):

    air:if {
        :W s:TriplePattern :T .
        :T log:includes { :X type:F :V } .
    };
    air:then [
        air:description ("type:F was selected in " q:QUERY) ;
        air:assert { q:QUERY air:non-compliant-with q:Policy4 . }
    ] .

Query:

    SELECT ?s WHERE { ?s type:F ?p }

AIR will show that the query is non-compliant with Policy4.
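The effect of the policy extract can be imitated in a few lines of Python. This is only a sketch and not the AIR reasoner: the helper names `triple_patterns` and `check_compliance` and the `FORBIDDEN_PREDICATES` set are hypothetical, and the tokenizer assumes a simple, whitespace-separated WHERE clause made only of subject-predicate-object patterns.

```python
import re

# Hypothetical stand-in for the policy: predicates a query may not select.
FORBIDDEN_PREDICATES = {"type:F"}


def triple_patterns(query):
    """Naively split the braces of a simple SPARQL query into triples.

    Assumes whitespace-separated terms and plain triple patterns only
    (no FILTER, OPTIONAL, nested groups, or literals containing spaces).
    """
    match = re.search(r"\{(.*)\}", query, re.DOTALL)
    if match is None:
        return []
    # Drop the "." separators, whether standalone or glued to a term.
    tokens = [t.rstrip(".") for t in match.group(1).split()]
    tokens = [t for t in tokens if t]
    return [tuple(tokens[i:i + 3]) for i in range(0, len(tokens) - 2, 3)]


def check_compliance(query, forbidden=FORBIDDEN_PREDICATES):
    """Report which forbidden predicates appear in the query, if any."""
    hits = [p for (_, p, _) in triple_patterns(query) if p in forbidden]
    return {"compliant": not hits, "matched": hits}


if __name__ == "__main__":
    print(check_compliance("SELECT ?s WHERE {?s type:F ?p}"))
```

A real rule engine matches patterns on a parsed representation of the query; the string handling here only illustrates the match-then-assert idea.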

Differential Privacy Overview
• Minimize probability of privacy breach.
• Maximize statistical accuracy.
• Definition requires that, given two similar datasets, a query function on those two datasets gives similar results with high probability.
• Makes no assumptions on the underlying dataset.

Differential Privacy
Definition: We say a randomized computation M provides ɛ-differential privacy if for any two data sets A and B, and any set of possible outputs S ⊆ Range(M),

    Pr[M(A) ∈ S] ≤ Pr[M(B) ∈ S] × exp(ɛ × |A ⊕ B|).

Differential Privacy in Practice
• Each user is given an ɛ value that cannot be exceeded.
• Each query qi has some noise value ɛi. In total, the user’s queries must satisfy the property Σi ɛi ≤ ɛ.
• Noise (usually Laplace), which depends on the aggregate function, is added with a variance set by ɛi and the sensitivity of that function.
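The budget rule and the noise step can be sketched as follows. The formulas on the original slide did not survive extraction, so this block relies on the standard Laplace mechanism rather than the author's exact statement: per-query costs add up against the user's ɛ, noise is drawn from Laplace with scale sensitivity/ɛi, and the resulting variance is 2·(sensitivity/ɛi)². The names `PrivacyBudget`, `noisy_answer`, and `laplace_variance` are hypothetical.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_variance(sensitivity, eps_i):
    """Variance of Laplace noise with scale sensitivity / eps_i."""
    return 2.0 * (sensitivity / eps_i) ** 2


class PrivacyBudget:
    """Tracks how much of a user's total epsilon has been spent."""

    def __init__(self, epsilon):
        self.total = epsilon
        self.spent = 0.0

    def charge(self, eps_i):
        if eps_i <= 0:
            raise ValueError("per-query epsilon must be positive")
        if self.spent + eps_i > self.total:
            raise PermissionError("privacy budget exceeded")
        self.spent += eps_i


def noisy_answer(true_value, sensitivity, eps_i, budget, rng=random):
    """Charge the budget, then return the true value plus Laplace noise."""
    budget.charge(eps_i)
    return true_value + laplace_noise(sensitivity / eps_i, rng)


if __name__ == "__main__":
    budget = PrivacyBudget(2.0)
    print(noisy_answer(10, sensitivity=1.0, eps_i=1.0, budget=budget))
    print("spent:", budget.spent, "variance:", laplace_variance(1.0, 1.0))
```

Smaller ɛi values cost less budget but give noisier answers, which is the trade-off the user faces when choosing a per-query epsilon.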

Limitations of Differential Privacy
• Only statistical data protected.
• High variance in data yields poor query results.
• Theory not always perfect in practice.
  ◦ Assume no collusion among users.
  ◦ Covert channel attacks. [6]
  ◦ What value of ɛ to choose?

Example, No DP

Name     Salary
Alice    31,000
Bob      47,000
Charlie  20,000
David    21,000

SELECT COUNT(Name) WHERE (Salary < 25000)

Result: 2

Example, No DP (David removed)

Name     Salary
Alice    31,000
Bob      47,000
Charlie  20,000

SELECT COUNT(Name) WHERE (Salary < 25000)

Result: 1
Big difference in answers!

Example, With DP

Name     Salary
Alice    31,000
Bob      47,000
Charlie  20,000
David    21,000

SELECT COUNT(Name) WHERE (Salary < 25000)

Result: 2 + noise ≈ 2 (with high probability)

Example, With DP (David removed)

Name     Salary
Alice    31,000
Bob      47,000
Charlie  20,000

SELECT COUNT(Name) WHERE (Salary < 25000)

Result: 1 + noise ≈ 2 (with high probability)

With high probability, records are indistinguishable!
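The indistinguishability claim can be checked numerically. The sketch below assumes the filter applies to the Salary column with a threshold of 25,000 (the slides show only Name and Salary), and that the count is released with Laplace noise of scale 1/ɛ. It compares the output densities for the table with and without David; `worst_ratio` is a hypothetical helper, and the grid of outputs is arbitrary.

```python
import math

WITH_DAVID = {"Alice": 31000, "Bob": 47000, "Charlie": 20000, "David": 21000}
WITHOUT_DAVID = {k: v for k, v in WITH_DAVID.items() if k != "David"}


def count_below(table, threshold):
    """Number of people whose salary is strictly below the threshold."""
    return sum(1 for salary in table.values() if salary < threshold)


def laplace_pdf(x, mean, scale):
    """Density of the Laplace distribution centred on mean."""
    return math.exp(-abs(x - mean) / scale) / (2.0 * scale)


def worst_ratio(eps, threshold=25000):
    """Largest density ratio between the two tables over a grid of outputs."""
    scale = 1.0 / eps  # a count changes by at most 1 when one row is removed
    a = count_below(WITH_DAVID, threshold)
    b = count_below(WITHOUT_DAVID, threshold)
    worst = 0.0
    for step in range(-50, 101):
        x = step / 10.0
        pa, pb = laplace_pdf(x, a, scale), laplace_pdf(x, b, scale)
        worst = max(worst, pa / pb, pb / pa)
    return worst


if __name__ == "__main__":
    eps = 0.5
    print(worst_ratio(eps), "<=", math.exp(eps))
```

For every output on the grid the ratio stays within exp(ɛ), which matches the definition for two datasets that differ in one record.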

Practical Consequences of DP
• An individual’s inclusion in the dataset is not likely a privacy risk.
• The answers to the queries can still be useful.

Achieving Differential Privacy in RDF
• Current techniques for differential privacy are developed for relational databases.
• As a first approximation, reduce triple-store to a relational database.
• Improved mechanism as project progresses.

Example of RDF-RDBS Reduction

    :Person1 foaf:name "Alice" ;
        foaf:member :DIG ;
        foaf:age "21" ;
        foaf:knows :Person2, :Person3 .

    :Person2 foaf:name "Bob" ;
        foaf:member :DIG ;
        foaf:knows :Person3 .

    :Person3 foaf:name "Charlie" ;
        foaf:age "22" .

ID       foaf:name  foaf:member  foaf:knows          foaf:age
Person1  "Alice"    DIG          [Person2, Person3]  "21"
Person2  "Bob"      DIG          [Person3]           None
Person3  "Charlie"  None         None                "22"
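A minimal sketch of this reduction, assuming the triples are already available as plain (subject, predicate, object) tuples: each subject becomes a row, each predicate a column, missing values become `None`, and predicates declared multi-valued are collected into lists. The function name `triples_to_rows` and the `multi_valued` parameter are hypothetical.

```python
from collections import defaultdict

TRIPLES = [
    ("Person1", "foaf:name", "Alice"),
    ("Person1", "foaf:member", "DIG"),
    ("Person1", "foaf:age", "21"),
    ("Person1", "foaf:knows", "Person2"),
    ("Person1", "foaf:knows", "Person3"),
    ("Person2", "foaf:name", "Bob"),
    ("Person2", "foaf:member", "DIG"),
    ("Person2", "foaf:knows", "Person3"),
    ("Person3", "foaf:name", "Charlie"),
    ("Person3", "foaf:age", "22"),
]


def triples_to_rows(triples, multi_valued=("foaf:knows",)):
    """Group triples by subject into one relational-style row per subject."""
    # Columns are the predicates, in order of first appearance.
    columns = list(dict.fromkeys(p for _, p, _ in triples))
    grouped = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        grouped[s][p].append(o)

    rows = {}
    for subject, props in grouped.items():
        row = {}
        for col in columns:
            values = props.get(col)
            if not values:
                row[col] = None            # predicate absent for this subject
            elif col in multi_valued:
                row[col] = list(values)    # keep every object as a list
            else:
                row[col] = values[0]       # single-valued column
        rows[subject] = row
    return rows


if __name__ == "__main__":
    for subject, row in triples_to_rows(TRIPLES).items():
        print(subject, row)
```

Once the data is in this shape, relational differential privacy techniques can treat each row as one participant's record.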

Proposed Solution
• SPARQL Privacy Insurance Module (SPIM)
• Build layer between user and endpoint.
• Integrate both AIR and differential privacy.
• Integrate credential-checking system.
• Modify existing differential privacy framework for use with triple-stores.

Contributions
• Complete privacy protection for triplestores.
• Differential Privacy sensitivity for SPARQL 1.1 aggregate functions including count, sum, avg, min, and max.
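The slides do not say how sensitivity is computed for each aggregate, so the following is only an illustration using textbook bounds. It assumes every value is known to lie in a range [lower, upper] and that neighbouring datasets differ by adding or removing one record. The function name `sensitivity` is hypothetical.

```python
def sensitivity(aggregate, lower, upper):
    """Worst-case change of an aggregate when one record is added or removed.

    Assumes every value lies in [lower, upper]; these are textbook bounds,
    not the ones derived in the thesis.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    name = aggregate.lower()
    if name == "count":
        return 1.0                              # one record changes a count by 1
    if name == "sum":
        return float(max(abs(lower), abs(upper)))  # largest single contribution
    if name in ("min", "max"):
        return float(upper - lower)             # extreme can move across the range
    raise ValueError("no bound defined here for " + aggregate)


if __name__ == "__main__":
    for agg in ("count", "sum", "min", "max"):
        print(agg, sensitivity(agg, 0, 100))
```

avg is left out on purpose: one common approach is to divide a noisy sum by a noisy count, charging part of the budget to each, but other constructions exist.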

System Overview
• SPIM Privacy Module
• TAAC Credential Checking
• AIR Rule Based Privacy
• Differential Privacy Module
• SPARQL Endpoint
• User Interface
• Policy Files
• User Data
• Service Description

• TAAC will:
  ◦ Verify that the user has access permission.
  ◦ Send the central module data about the user.

• SPIM:
  ◦ Controls order of privacy operations.
  ◦ Interfaces with the SPARQL endpoint.

• AIR:
  ◦ Reasoner that uses rule-based policies to check queries for privacy hazards.
  ◦ Extracts information for differential privacy.

• Policy Files:
  ◦ Contain the rules for AIR.

• Differential Privacy Module:
  ◦ Checks query limits (based on ɛ use).
  ◦ Applies noise to statistical data.

• User Data:
  ◦ Contains user ɛ data.


• Service Description:
  ◦ Contains information to be used for the addition of noise.

• Miscellaneous:
  ◦ Interface to SPARQL Endpoint
  ◦ Transaction File
  ◦ Improved Differential Privacy Output
  ◦ Service Description Generator

• Potential Extensions:
  ◦ Robustness against attacks
  ◦ Concurrency
  ◦ Optimization for large systems
  ◦ Customizable UI
  ◦ Accountability

Sample Scenario
• Triple-store data mining in biotechnological applications.
• Biofirm provides data about hospitals in the US.
• Alice is a PhD student at MIT.
• Alice would like to query Biofirm’s database for research purposes. She just got permissions yesterday and is logging in for the first time.

Preprocessing
• Biofirm installs SPIM and runs the service description generation code.
  ◦ May need to create the correct interface.
• Makes sure the UI is accessible online.

Sample Compliant Query
• Alice would like to know the total number of visits that Boston hospitals received.

    SELECT (SUM(?s) AS ?people) WHERE {
        ?h a biofirm:Hospital .
        ?h biofirm:visits ?s .
        ?h biofirm:location geo:Boston .
    }

Epsilon value: 1.0

• Alice enters query into the provided user interface.

• TAAC ensures that Biofirm has given Alice access to its triple-store.

• Query request arrives at SPIM central module.

• Policyrunner is called upon to check query for triple patterns that are in violation.
• No violations found.
• Since this is Alice’s first time, AIR extracts what type of permissions Alice has.

• SPIM creates a profile for Alice.
  ◦ Gives her an ɛ value (suppose it is 2.0).
  ◦ Stores it in triple store.

• SPIM extracts which variables will yield statistical results and will have differential privacy applied.

• Differential Privacy module ensures that the query will not exceed the given epsilon value.

• This is Alice’s first time, her epsilon value is 2.0, and the epsilon for this query is 1.0. Everything looks good.

• Query is sent to the endpoint.
• Results are received.

• Differential privacy module adds noise to appropriate fields, and updates epsilon values.

• SPIM is ready to return the results.

• Alice receives results.
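The order of operations in this scenario can be summarized in a short sketch. Everything here is a stand-in: `run_query`, `QueryRejected`, and the callables passed in for credential checking, policy checking, and the endpoint are hypothetical placeholders for TAAC, AIR, and a real SPARQL endpoint, and the noise step assumes the Laplace mechanism with a caller-supplied sensitivity.

```python
import random


class QueryRejected(Exception):
    """Raised when a query fails a credential, policy, or budget check."""


def run_query(user, query, query_eps, *, has_access, is_compliant, profiles,
              endpoint, sensitivity, default_eps=2.0, rng=random):
    """Walk one query through the checks in the order of the scenario."""
    # 1. Credential check (TAAC stand-in).
    if not has_access(user):
        raise QueryRejected("user has no access to the triple-store")
    # 2. Rule-based policy check (AIR stand-in).
    if not is_compliant(query):
        raise QueryRejected("query violates a privacy policy")
    # 3. First-time users get a profile with a fresh epsilon allowance.
    profile = profiles.setdefault(user, {"epsilon": default_eps, "spent": 0.0})
    # 4. The query must fit in the remaining epsilon.
    if profile["spent"] + query_eps > profile["epsilon"]:
        raise QueryRejected("epsilon budget exceeded")
    # 5. Forward the query and read the true aggregate.
    true_value = endpoint(query)
    # 6. Add Laplace noise (difference of two exponentials), update epsilon.
    scale = sensitivity / query_eps
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    profile["spent"] += query_eps
    return true_value + noise


if __name__ == "__main__":
    profiles = {}
    answer = run_query(
        "alice", "SELECT (SUM(?s) AS ?people) WHERE { ... }", 1.0,
        has_access=lambda u: u == "alice",
        is_compliant=lambda q: True,
        profiles=profiles,
        endpoint=lambda q: 1000.0,   # made-up value for illustration only
        sensitivity=1.0,             # placeholder; depends on the aggregate
    )
    print(answer, profiles["alice"])
```

With a starting allowance of 2.0 and a per-query epsilon of 1.0, two such queries succeed and a third is rejected by the budget check.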

Summary
• System will combine rule-based privacy with differential privacy.
• Develop differential privacy techniques for semantic web data.
• Make privacy module client and administrator friendly.