De-Identification
Tuesday 10 May, 5 - 6.30 pm
Level 5 Theatrette, 121 Exhibition Street, Melbourne
Privacy and Data Protection Week, 9-13 May 2016

Commissioner for Privacy and Data Protection

Presenters

Siu Ming Tan: Chief Methodologist and General Manager of the Methodology Division at the Australian Bureau of Statistics (ABS)

Dr. Stephen Hardy: Group Leader for Data Platform Engineering at Data61 in CSIRO

Fiona Dowsley: Chief Statistician of the Victorian Crime Statistics Agency, and Acting Director of Strategic Planning at the Department of Justice & Regulation

Greg Gough: Manager of the DataVic Access Policy, Department of Treasury and Finance

Siu Ming Tan

Managing Data Confidentiality and Access at the ABS

10 May 2016

What shall I cover?

•  Some terminology

•  Legislative requirement on maintaining confidentiality

•  Data utility & disclosure risk

•  The Five Safes Framework

Some Terminology

Privacy: Requirement to respect the private information of individuals

Confidentiality: Requirement that information, whether private or not, be stored, kept or released in such a manner that it is not possible to identify who the information refers to

Anonymisation: Process to remove the direct identifiers from information (e.g. name, address, ABN).

Un-identifiable info: Information treated in such a way that re-identification is not possible

The Census and Statistics Act 1905

•  Every ABS officer must sign an undertaking of fidelity and secrecy (section 7)

•  Statistical information is not to be disseminated in a manner likely to enable the identification of a particular person or organisation (subsection 12(2))

•  De-identification is not sufficient to meet legislative requirements

•  Release must not be likely to lead to re-identification

Data Utility versus Disclosure Risk (I)

Data Utility: Ability to use the data to draw valid conclusions

Disclosure Risk

•  Spontaneous recognition

•  Matching risk

•  Higher risk for unit record data than for aggregated data

Protections

•  Perturbation

•  Cell suppression

•  Collapsing of categories

•  Sampling

•  Record masking

•  Substitution of values
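Of the protections listed above, cell suppression is perhaps the simplest to illustrate. The sketch below is a hypothetical example, not the ABS's actual procedure: it withholds any count below an assumed threshold of 5 (the threshold and the `suppress_small_cells` name are ours). It performs primary suppression only; in a real table, secondary suppression is also needed so that a withheld cell cannot be recovered by subtracting the published cells from a published total.

```python
def suppress_small_cells(table, threshold=5):
    """Primary cell suppression: withhold any count below `threshold`.

    table maps a category label to a count; withheld cells become None.
    The default threshold is an assumption for illustration only.
    """
    return {cell: (count if count >= threshold else None) for cell, count in table.items()}


counts = {"Melbourne": 1240, "Geelong": 312, "Small town": 3}  # made-up counts
print(suppress_small_cells(counts))  # {'Melbourne': 1240, 'Geelong': 312, 'Small town': None}
```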

Data Utility versus Disclosure Risk (II)

•  Disclosure risk is reduced by applying more protections, but data utility is also reduced

•  Data utility is maximised if no protection is applied, but disclosure risk is significantly increased

•  Where should the balance be drawn?

•  Need to think beyond just applying data protections.

The Five Safes Framework

•  Safe people: Can the person be trusted to use the data appropriately?

•  Safe project: Is the specific use of the data appropriate?

•  Safe setting: How does the mode of access limit the risk of disclosure?

•  Safe data: How much protection is to be applied to the data?

•  Safe output: What controls are applied to ensure the output is non-disclosive?

A multidimensional approach to disclosure risk assessment
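As a rough illustration of assessing all five dimensions together, rather than relying on data protections alone, an access request could be recorded as below. This is a hypothetical sketch, not an ABS tool: the class and field names are ours, and reducing each dimension to a yes/no judgement is a simplification, since in practice the dimensions trade off against one another (for example, a more controlled setting can justify releasing less-protected data).

```python
from dataclasses import dataclass, fields


@dataclass
class FiveSafesAssessment:
    """Hypothetical record of one access request, judged on each of the Five Safes.

    Each field is True when the assessor judges that dimension adequately controlled.
    """
    safe_people: bool   # Can the person be trusted to use the data appropriately?
    safe_project: bool  # Is the specific use of the data appropriate?
    safe_setting: bool  # Does the mode of access limit the risk of disclosure?
    safe_data: bool     # Are enough protections applied to the data itself?
    safe_output: bool   # Are outputs checked to be non-disclosive?

    def unresolved(self) -> list[str]:
        """Names of the dimensions that still need further controls."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


request = FiveSafesAssessment(
    safe_people=True, safe_project=True, safe_setting=False, safe_data=True, safe_output=True
)
print(request.unresolved())  # ['safe_setting']
```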

Dr. Stephen Hardy

www.csiro.au

Utility vs Privacy: Why de-identification is difficult

Dr Stephen Hardy, August 2015

Netflix re-identification

Utility vs Privacy | Stephen Hardy

100,000,000 movie ratings; 480,000 Netflix subscribers

Anonymised: Id – movie – rating – date

2005, 10% sample

"Robust De-anonymization of Large Sparse Datasets", Narayanan and Shmatikov (2008)

Identified: Name – movie – rating – date

Rating Uniqueness

8 ratings (2 of which may be wrong), with dates accurate to within 2 weeks, uniquely identify 99% of the people in the Netflix database
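The matching risk behind this result can be sketched in a few lines. The function below is a simplified, hypothetical illustration and not Narayanan and Shmatikov's algorithm (which also weights rare movies more heavily and checks how far the best match stands out from the rest): it returns every pseudonymous record that agrees with an attacker's remembered ratings on all but `max_wrong` items, allowing each remembered date to be off by up to 14 days. The data layout and names are assumptions. If exactly one record comes back, that person has been re-identified.

```python
from datetime import date


def candidates(release, aux, max_wrong=2, date_tolerance_days=14):
    """Return pseudonymous ids whose ratings agree with all but `max_wrong` auxiliary items.

    release: {pseudonymous_id: {movie: (rating, date_rated)}}  (hypothetical layout)
    aux:     [(movie, rating, approximate_date), ...] known about one target person
    An item agrees if the record has the same movie and rating, and its date is
    within `date_tolerance_days` of the remembered date.
    """
    needed = len(aux) - max_wrong
    found = []
    for rid, ratings in release.items():
        agree = 0
        for movie, rating, when in aux:
            if movie in ratings:
                r, d = ratings[movie]
                if r == rating and abs((d - when).days) <= date_tolerance_days:
                    agree += 1
        if agree >= needed:
            found.append(rid)
    return found


# Toy data: three pseudonymous subscribers and three remembered ratings.
release = {
    "u1": {"A": (5, date(2005, 1, 3)), "B": (3, date(2005, 1, 10)), "C": (4, date(2005, 2, 1))},
    "u2": {"A": (5, date(2005, 1, 4)), "D": (2, date(2005, 3, 1))},
    "u3": {"B": (3, date(2005, 1, 12)), "C": (1, date(2005, 2, 2))},
}
aux = [("A", 5, date(2005, 1, 1)), ("B", 3, date(2005, 1, 20)), ("C", 4, date(2005, 2, 10))]
print(candidates(release, aux, max_wrong=1))  # ['u1']
```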

Mobility data

"Unique in the Crowd: The privacy bounds of human mobility", de Montjoye, Hidalgo, Verleysen, & Blondel (2013).

Uniqueness of 1.5 million users

Four location-and-time points uniquely characterise 95% of the people in a mobility database of 1.5 million people
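This kind of uniqueness ("unicity") can be estimated directly. A sketch loosely following the idea in de Montjoye et al., rather than reproducing their method: for each user, draw `p` points at random from their own trace and count how many traces contain all of them; the user is unique if only their own trace does. The trace format (a set of `(location, hour)` tuples) and the function name are hypothetical.

```python
import random


def unicity(traces, p, seed=0):
    """Estimate the share of users singled out by p random points drawn from their own trace.

    traces: dict mapping a user id to a set of (location, hour) points (hypothetical format).
    Users with fewer than p points are skipped.
    """
    rng = random.Random(seed)
    unique = tested = 0
    for uid, points in traces.items():
        if len(points) < p:
            continue  # not enough points to draw p of them
        sample = set(rng.sample(sorted(points), p))
        # Count every trace containing all the sampled points (this always includes uid's own).
        matches = sum(1 for pts in traces.values() if sample <= pts)
        tested += 1
        if matches == 1:
            unique += 1
    return unique / tested if tested else 0.0
```

Raising `p` can only shrink the set of matching traces, which is the slide's point in miniature: every extra known point makes a person easier to single out.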

Utility vs Privacy

The more data that is linked together, the more unique it becomes

The more data that is linked together, the more useful it becomes

But… Because…

Current Approaches to Anonymisation

1. Masking

2. Generalisation + grouping

Original record:
First Name: John
Last Name: Smith
Email: [email protected]
Address 1: 1 Smith St
Address 2: Sydney, 2000
Last Travel Destination: Spain
Travel Date: January 2015

Masked record:
First Name: John
Last Name: Smith
Email:
Address 1:
Address 2:
Last Travel Destination: Spain
Travel Date: January 2015

•  Loses valuable information.

•  Can still be re-identified in some cases.
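The two approaches on this slide can be sketched on a dict version of the example record. This is a hypothetical illustration: the field names are ours, and which fields count as direct identifiers is an assumption (here the name, email and street address, following the name/address examples in the anonymisation definition earlier in this deck). Unlike the slide, the sketch keeps the city from Address 2 so that generalisation has something to coarsen.

```python
# Hypothetical field names for the example record on the slide.
DIRECT_IDENTIFIERS = {"first_name", "last_name", "email", "address1"}


def mask(record):
    """Masking: blank out direct identifiers and keep every other field unchanged."""
    return {k: ("" if k in DIRECT_IDENTIFIERS else v) for k, v in record.items()}


def generalise(record):
    """Generalisation: coarsen quasi-identifiers so that more records share the same values."""
    out = dict(record)
    out["address2"] = record["address2"].split(",")[0]      # "Sydney, 2000" -> "Sydney"
    out["travel_date"] = record["travel_date"].split()[-1]  # "January 2015" -> "2015"
    return out


record = {
    "first_name": "John", "last_name": "Smith",
    "email": "john.smith@example.com",  # placeholder address
    "address1": "1 Smith St", "address2": "Sydney, 2000",
    "last_travel_destination": "Spain", "travel_date": "January 2015",
}
print(generalise(mask(record)))
```

The slide's two bullet points still apply: the output has lost detail, and a rare combination of the remaining values could still single someone out, which is why generalised records are usually grouped and each group checked for size.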

Differential Privacy

[Diagram: Original data + tuned random noise → noisy data. Original data with any one person removed + tuned random noise → noisy data. The two noisy outputs are indistinguishable!]
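One standard way to produce the "tuned random noise" in this diagram is the Laplace mechanism. A minimal sketch for releasing a single count, assuming each person contributes at most one record (so adding or removing one person changes the count by at most 1): noise is drawn with scale 1/epsilon, and a smaller `epsilon` means more noise and stronger privacy. The function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    Assumes one record per person, so the count's sensitivity is 1 and the
    noise scale is 1 / epsilon.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)


ages = [23, 67, 71, 45, 80, 34]  # made-up data
print(private_count(ages, lambda a: a >= 65, epsilon=0.5))  # true count 3, plus noise
```

This naive floating-point version is for illustration only; production differential privacy libraries also guard against known floating-point attacks on the Laplace mechanism and track the total privacy budget spent across queries.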

Anonalytix: Privacy-Safe Data Release | Roksana Boreli

Creates synthetic data with privacy level guarantees

High data granularity (unit records) for specific analyses

Anonalytix Privacy Technology

Confidential Computing

[Diagram: Encrypted data, Encrypted data → Encrypted Analysis → Decrypted Answers]
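The slide describes analysis carried out on data that stays encrypted. One established way to get a limited form of this is additively homomorphic encryption; the toy sketch below uses the Paillier cryptosystem, which is our choice of technique and not necessarily what Data61 uses. The primes are tiny and hard-coded, so it is trivially breakable and for illustration only. Multiplying two ciphertexts yields a ciphertext of the sum, so an analyst can total encrypted values without seeing them, and only the holder of the private values `LAM` and `MU` can decrypt the answer.

```python
import math
import secrets

# Toy Paillier keys from tiny fixed primes: illustration only, trivially breakable.
P, Q = 293, 433
N = P * Q
N2 = N * N
LAM = math.lcm(P - 1, Q - 1)  # Carmichael function of N (private)
MU = pow(LAM, -1, N)          # private; this simple form is valid because the generator is N + 1


def encrypt(m):
    """Encrypt an integer 0 <= m < N with fresh randomness r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Recover m from ciphertext c using the private values LAM and MU."""
    x = pow(c, LAM, N2)
    return ((x - 1) // N) * MU % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the hidden plaintexts (mod N)."""
    return (c1 * c2) % N2


total = add_encrypted(encrypt(20), encrypt(22))
print(decrypt(total))  # 42
```

For example, two agencies could each encrypt a count under the same public key; whoever combines the ciphertexts learns neither count, and only the key holder sees the total. Real deployments use much larger keys and a vetted library.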

Tradeoffs

[Chart with axes Utility and Privacy, placing Raw data, Masking, k-Anonymity, Differential Privacy and Encrypted Computation]
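k-Anonymity, one of the approaches on this chart, has a definition that can be checked mechanically: a release is k-anonymous if every combination of quasi-identifier values (attributes such as age band or postcode that are not direct identifiers but could be linked with other data) is shared by at least k records. A minimal check, with hypothetical field names and made-up records:

```python
from collections import Counter


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination is shared by at least k records."""
    return smallest_group(records, quasi_identifiers) >= k


records = [
    {"age_band": "30-39", "postcode": "20**", "destination": "Spain"},
    {"age_band": "30-39", "postcode": "20**", "destination": "Italy"},
    {"age_band": "40-49", "postcode": "30**", "destination": "Spain"},
]
print(is_k_anonymous(records, ["age_band", "postcode"], k=2))  # False: one group has a single record
```

Passing this check does not rule out all disclosure: if every record in a group shares the same sensitive value, that value is revealed for anyone known to be in the group.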

Fiona Dowsley

Greg Gough

Data for Victorians

Greg Gough Manager, DataVic Access Policy

Benefits of open data

•  Increases productivity and improves personal and business decision making.

•  Improves research outcomes.

•  Improves the efficiency and effectiveness of government.

Economic Value

The Australian economy will grow by an extra $16 billion a year if government agencies make most of their data freely available to the public.

•  Stimulates economic activity and drives innovation and new services.

Open data value

Burke Road level crossing

DataVic Access Policy

•  The default obligation under the Policy is for agencies to make de‑identified datasets available.

•  If a dataset contains personally identifiable information, and cannot be de‑identified, it is not suitable for release under the Policy.

Open by design

•  When developing or procuring a database or dataset, consideration should be given in the design phase to enabling public access to data that is suitable for release under the Policy.

Just another way of looking at ‘Privacy by design’

Examples of de-identified data use in Australia

DataVic example of de-identified data use

TripRisk by Geoplex

Queensland Police example

Overseas examples of (de-)identified data use

Expectations?

US Police example

NYC Taxi data

Taxi data

Further information

•  Websites: www.data.vic.gov.au, www.dtf.vic.gov.au

•  Email: [email protected]

•  Twitter: @data_vic

•  Phone: (03) 9651 1880

© State of Victoria 2016 You are free to re-use this work under a Creative Commons Attribution 4.0 licence, provided you credit the State of Victoria (Department of Treasury and Finance) as author, indicate if changes were made and comply with the other licence terms. The licence does not apply to any branding, including Government logos. Copyright queries may be directed to [email protected]

Thank you