Hippocratic Databases Rakesh Agrawal Jerry Kiernan Ramakrishnan Srikant Yirong Xu.


Page 1:

Hippocratic Databases

Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, Yirong Xu

Page 2:

Algorithms

Performance Graphs

Founding Principles

New Challenges

Vision Paper

Page 3:

The Hippocratic Oath

“What I may see or hear in the course of treatment or even outside of the treatment in regard to the life of men, which on no account [ought to be] spread abroad, I will keep to myself, holding such things shameful to be spoken about.”

– Hippocratic Oath, 8 (circa 400 BC)

Page 4:

Privacy Violations

Accidents:
– Kaiser, GlobalHealthrax

Lax security:
– Massachusetts govt.

Ethically questionable behavior:
– Lotus & Equifax, Lexis-Nexis, Medical Marketing Service, Boston University, CVS & Giant Food

Illegal:
– Toysmart

Page 5:

Growing Privacy Concerns

Popular Press:
– Economist: The End of Privacy (May 99)
– Time: The Death of Privacy (Aug 97)

Govt. legislation

S. Garfinkel, "Database Nation: The Death of Privacy in 21st Century", O'Reilly, Jan 2000

Special issue on internet privacy, CACM, Feb 99

Page 6:

Related Work

Statistical Databases
– Provide statistical information (sum, count, etc.) without compromising sensitive information about individuals. [AW89]

Multilevel Secure Databases
– Multilevel relations, e.g., records tagged “secret”, “confidential”, or “unclassified”, e.g. [JS91]

Wish to protect privacy in transactional databases that support daily operations.
– Cannot restrict queries to statistical queries.
– Cannot tag all the records “top secret”.

Page 7:

Current Database Systems

Ullman, “Principles of Database and Knowledgebase Systems”

Fundamental:
– Manage persistent data.
– Access a large amount of data efficiently.

Desirable:
– Support for data model, high-level languages, transaction management, access control, and resiliency.

Similar list in other database textbooks.

Page 8:

The Vision

We propose Hippocratic Databases that include responsibility for the privacy of data they manage as a founding tenet.

Page 9:

Approach

Derive founding principles from current privacy legislation.

Strawman Design

Challenges & Open Problems

Page 10:

Caveats

Technology alone cannot address all concerns about privacy.
– Solution has to be a mix of laws, societal norms, markets and technology.
– But by advancing technology, we can influence the overall quality of the solution.

Not all the world’s data lives in database systems.
– Additional inducement for data to move to its right home.
– Hippocratic databases can serve as guide for other types of data repositories.

Page 11:

Privacy Legislation

Fair Information Practices Act (US, 1974)

OECD Guidelines (Europe, 1980)

Canadian Standards Association’s Model Code for Protection of Personal Information (1995)

Australian Privacy Amendment (2000)

Japan: proposed legislation (2003)

Page 12:

The Ten Principles

Collection Group
– Purpose Specification, Consent, Limited Collection

Use Group
– Limited Use, Limited Disclosure, Limited Retention, Accuracy

Security & Openness Group
– Safety, Openness, Compliance

Page 13:

Collection Group

1. Purpose Specification
– For personal information stored in the database, the purposes for which the information has been collected shall be associated with that information.

2. Consent
– The purposes associated with personal information shall have consent of the donor of the personal information.

3. Limited Collection
– The information collected shall be limited to the minimum necessary for accomplishing the specified purposes.

Page 14:

Use Group

4. Limited Use
– The database shall run only those queries that are consistent with the purposes for which the information has been collected.

5. Limited Disclosure
– Personal information shall not be communicated outside the database for purposes other than those for which there is consent from the donor of the information.

Page 15:

Use Group (2)

6. Limited Retention
– Personal information shall be retained only as long as necessary for the fulfillment of the purposes for which it has been collected.

7. Accuracy
– Personal information stored in the database shall be accurate and up-to-date.

Page 16:

Security & Openness Group

8. Safety
– Personal information shall be protected by security safeguards against theft and other misappropriations.

9. Openness
– A donor shall be able to access all information about the donor stored in the database.

10. Compliance
– A donor shall be able to verify compliance with the above principles. Similarly, the database shall be able to address a challenge concerning compliance.

Page 17:

Talk Outline

Motivation

Founding Principles

Strawman Design

New Challenges

Page 18:

Strawman Architecture

[Diagram: Privacy Policy, Data Collection, Queries, and Other components around the Store]

Page 19:

Architecture: Policy

[Diagram: Privacy Policy, Privacy Metadata Creator, Store with Privacy Metadata]

Privacy Metadata Creator: Converts privacy policy into privacy metadata tables.

For each purpose & piece of information (attribute):
• External recipients
• Retention period
• Authorized users

Different designs possible.

Principles: Limited Disclosure, Limited Retention

Page 20:

Privacy Policies Table

Purpose          Table     Attribute  External-recipients      Authorized-users    Retention
purchase         customer  name       {delivery, credit-card}  {shipping, charge}  1 month
purchase         customer  email      empty                    {shipping}          1 month
register         customer  name       empty                    {registration}      3 years
register         customer  email      empty                    {registration}      3 years
recommendations  order     book       empty                    {mining}            10 years
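The privacy policies table above can be held as plain rows and looked up by purpose, table, and attribute. The following is a minimal sketch, not the authors' implementation: the names `PolicyRow`, `lookup`, and `may_access` are hypothetical, and treating each authorized-users entry as a user group name is an assumption.

```python
from typing import NamedTuple, Optional


class PolicyRow(NamedTuple):
    purpose: str
    table: str
    attribute: str
    external_recipients: frozenset
    authorized_users: frozenset
    retention: str


# Rows copied from the slide; "empty" is modeled as an empty set.
PRIVACY_POLICIES = [
    PolicyRow("purchase", "customer", "name",
              frozenset({"delivery", "credit-card"}),
              frozenset({"shipping", "charge"}), "1 month"),
    PolicyRow("purchase", "customer", "email",
              frozenset(), frozenset({"shipping"}), "1 month"),
    PolicyRow("register", "customer", "name",
              frozenset(), frozenset({"registration"}), "3 years"),
    PolicyRow("register", "customer", "email",
              frozenset(), frozenset({"registration"}), "3 years"),
    PolicyRow("recommendations", "order", "book",
              frozenset(), frozenset({"mining"}), "10 years"),
]


def lookup(purpose: str, table: str, attribute: str) -> Optional[PolicyRow]:
    """Return the policy row for this purpose and attribute, or None."""
    for row in PRIVACY_POLICIES:
        if (row.purpose, row.table, row.attribute) == (purpose, table, attribute):
            return row
    return None


def may_access(user_group: str, purpose: str, table: str, attribute: str) -> bool:
    """A user group may read an attribute only if a policy row authorizes it."""
    row = lookup(purpose, table, attribute)
    return row is not None and user_group in row.authorized_users
```

For example, `may_access("shipping", "purchase", "customer", "email")` is true, while the same call for the `charge` group is false because only `shipping` is authorized for the email attribute.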

Page 21:

Architecture: Data Collection

[Diagram: Data Collection, Privacy Constraint Validator, Audit Info, Store with Privacy Metadata and Audit Trail]

Privacy Constraint Validator: Privacy policy compatible with user’s privacy preference?

Audit Info: Audit trail for compliance.

Principles: Consent, Compliance

Page 22:

Architecture: Data Collection

[Diagram: Data Collection, Privacy Constraint Validator, Data Accuracy Analyzer, Audit Info, Record Access Control, Store with Privacy Metadata and Audit Trail]

Data Accuracy Analyzer: Data cleansing, e.g., catch typos in address.

Record Access Control: Associate set of purposes with each record.

Principles: Purpose Specification, Accuracy
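As an illustration of the data collection path, the sketch below checks a policy against a donor's preferences, writes an audit entry, and tags the accepted record with its purposes. Everything here is an assumption made for illustration: the slides do not define the compatibility test, so this version requires that every purpose be consented to, that external recipients be a subset of the allowed ones, and that retention not exceed the donor's limit. The names `PolicyRule`, `Preference`, `violations`, and `collect` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    purpose: str
    attribute: str
    external_recipients: frozenset
    retention_days: int


@dataclass(frozen=True)
class Preference:
    allowed_recipients: frozenset
    max_retention_days: int


def violations(policy, preferences):
    """List the reasons the policy conflicts with the donor's preferences."""
    problems = []
    for rule in policy:
        label = f"{rule.purpose}/{rule.attribute}"
        pref = preferences.get(rule.purpose)
        if pref is None:
            problems.append(f"{label}: purpose not consented to")
            continue
        extra = rule.external_recipients - pref.allowed_recipients
        if extra:
            problems.append(f"{label}: recipients {sorted(extra)} not allowed")
        if rule.retention_days > pref.max_retention_days:
            problems.append(f"{label}: retention too long")
    return problems


def collect(record, policy, preferences, audit_trail):
    """Store the record only if the policy is compatible; always audit."""
    problems = violations(policy, preferences)
    outcome = "rejected" if problems else "accepted"
    audit_trail.append(("collect", record.get("id"), outcome))
    if problems:
        return None
    # Associate the set of consented purposes with the record.
    purposes = sorted({rule.purpose for rule in policy})
    return {**record, "purposes": purposes}
```

A stricter or looser notion of compatibility would change only `violations`; the audit entry is written in both the accepted and the rejected case so that compliance can be checked later.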

Page 23:

Architecture: Queries

[Diagram: Queries, Attribute Access Control, Record Access Control, Store with Privacy Metadata]

1. Telemarketing cannot issue query tagged “charge”.

2. Query tagged “telemarketing” cannot see credit card info.

3. Telemarketing query only sees records that include “telemarketing” in set of purposes.

Principles: Safety, Limited Use
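The three query-time checks on this slide can be sketched as one function. This is a hedged illustration rather than the design from the talk: the tables `AUTHORIZED_PURPOSES` and `VISIBLE_ATTRIBUTES`, the user names, and `run_query` are all hypothetical stand-ins for the privacy metadata.

```python
# Hypothetical privacy metadata: which purposes a user may tag queries with,
# and which attributes each purpose is allowed to see.
AUTHORIZED_PURPOSES = {
    "telemarketing_app": {"telemarketing"},
    "billing_app": {"charge"},
}
VISIBLE_ATTRIBUTES = {
    "telemarketing": {"name", "phone"},
    "charge": {"name", "credit_card"},
}


def run_query(user, purpose, attributes, records):
    """Apply the three checks from the slide, then project the result."""
    # 1. The user must be authorized to issue queries tagged with this purpose.
    if purpose not in AUTHORIZED_PURPOSES.get(user, set()):
        raise PermissionError(f"{user} may not issue queries tagged {purpose!r}")

    # 2. Attribute access control: the purpose must cover every attribute.
    hidden = set(attributes) - VISIBLE_ATTRIBUTES.get(purpose, set())
    if hidden:
        raise PermissionError(f"{purpose!r} may not read {sorted(hidden)}")

    # 3. Record access control: only records whose purposes include this one.
    return [
        {name: record[name] for name in attributes}
        for record in records
        if purpose in record["purposes"]
    ]
```

The first two checks reject the query outright, while the third silently filters rows. That split is a design choice of this sketch; a system could equally report how many records were withheld.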

Page 24:

Architecture: Queries

[Diagram: Queries, Attribute Access Control, Record Access Control, Query Intrusion Detector, Audit Info, Store with Privacy Metadata and Audit Trail]

Query Intrusion Detector: Telemarketing query that asks for all phone numbers.

Audit Info:
• Compliance
• Training data for query intrusion detector

Principles: Safety, Compliance

Page 25:

Architecture: Other

[Diagram: Other, Data Retention Manager, Encryption Support, Data Collection Analyzer, Store with Privacy Metadata]

Data Retention Manager: Delete items in accordance with privacy policy.

Encryption Support: Additional security for sensitive data.

Data Collection Analyzer: Analyze queries to identify unnecessary collection, retention & authorizations.

Principles: Limited Retention, Limited Collection, Safety
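A data retention manager of the kind described above could be sketched as a periodic purge. This is a minimal sketch under stated assumptions: retention is tracked per purpose, periods are approximated in days, a record survives while at least one of its purposes is still within its period, and the names `RETENTION` and `purge_expired` are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical retention periods per purpose, approximated in days.
RETENTION = {
    "purchase": timedelta(days=30),
    "register": timedelta(days=3 * 365),
}


def purge_expired(records, today):
    """Drop expired purposes; delete records with no live purpose left."""
    kept = []
    for record in records:
        age = today - record["collected_on"]
        # Purposes without a configured period are treated as expired.
        live = {
            purpose
            for purpose in record["purposes"]
            if age <= RETENTION.get(purpose, timedelta(0))
        }
        if live:
            kept.append({**record, "purposes": live})
    return kept
```

Note that this only removes rows from the live table. Erasing the same data from logs and checkpoints is a separate problem that this sketch does not address.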

Page 26:

Strawman Architecture

[Diagram: full architecture with Privacy Policy, Data Collection, Queries, Other, and Store. Components: Privacy Metadata Creator, Privacy Constraint Validator, Data Accuracy Analyzer, Audit Info, Audit Trail, Query Intrusion Detector, Attribute Access Control, Record Access Control, Privacy Metadata, Data Retention Manager, Encryption Support, Data Collection Analyzer]

Page 27:

Talk Outline

Privacy

Founding Principles

Strawman Design

New Challenges

Page 28:

New Challenges

General
– Language
– Efficiency

Use
– Limited Collection
– Limited Disclosure
– Limited Retention

Security and Openness
– Safety
– Openness
– Compliance

Page 29:

Language

Need a language for privacy policies & user preferences.

P3P can be used as starting point.
– Developed primarily for web shopping.
– What about richer domains?

How do we balance expressibility and usability?
– Arrange concepts in hierarchy or subsumption relationship.

Purpose: [Diagram: hierarchy with nodes contact; email, phone; home, work]

P3P recipients: [Diagram: Ours, Same, Delivery, Unrelated, Public]
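One way to read "hierarchy or subsumption relationship" is that consent to a general concept covers the more specific concepts beneath it. The sketch below illustrates that reading only. The parent links are an assumption reconstructed from the flattened diagram (in particular, placing home and work under phone is a guess), and `PARENT`, `subsumes`, and `consent_covers` are hypothetical names.

```python
# Assumed hierarchy: child -> parent. The exact edges are a guess.
PARENT = {
    "email": "contact",
    "phone": "contact",
    "home": "phone",
    "work": "phone",
}


def subsumes(general, specific):
    """True if `general` equals `specific` or is one of its ancestors."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)
    return False


def consent_covers(consented, requested):
    """Consent to a concept is taken to cover everything below it."""
    return any(subsumes(concept, requested) for concept in consented)
```

With such a hierarchy a donor can state one coarse preference (for example `contact`) instead of enumerating every leaf, which is one possible answer to the expressibility versus usability question.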

Page 30:

Language (2)

How do we accommodate user negotiation models?
– User willing to disclose information only if fairly compensated.
– Value of privacy as coalitional game [KPR2001]

Page 31:

Efficiency

How do we minimize the cost of privacy checking?

How do we incorporate purpose into database design and query optimization?

Tradeoffs between space & running time.
– Only tag records in customer table with purpose, not all records. But now need to do a join when scanning records in order table.

How does the work in secure databases on decomposition of multilevel relations into single-level relations [JS91] apply here?
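The space versus running time tradeoff can be made concrete with a small in-memory SQLite example. This is an illustrative sketch only: the schema, the `customer_purpose` table, the sample rows, and `orders_for_purpose` are hypothetical. Purposes are stored once per customer, so a purpose-restricted scan of the orders table has to join back to them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    -- Purposes are stored per customer only, which saves space ...
    CREATE TABLE customer_purpose (customer_id INTEGER, purpose TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, book TEXT);

    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO customer_purpose VALUES
        (1, 'purchase'), (1, 'recommendations'), (2, 'purchase');
    INSERT INTO orders VALUES (10, 1, 'Book A'), (11, 2, 'Book B');
    """
)


def orders_for_purpose(purpose):
    """... but every purpose-restricted scan of orders now needs a join."""
    rows = conn.execute(
        """
        SELECT o.id, o.book
        FROM orders AS o
        JOIN customer_purpose AS cp ON cp.customer_id = o.customer_id
        WHERE cp.purpose = ?
        ORDER BY o.id
        """,
        (purpose,),
    )
    return rows.fetchall()
```

The alternative is to copy the purpose set onto every order row, which removes the join at the cost of extra storage and of keeping the copies consistent when consent changes.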

Page 32:

Limited Collection

How do we identify attributes that are collected but not used?
– Assets are only needed for mortgage when salary is below some threshold.

What’s the needed granularity for numeric attributes?
– Queries only ask “Salary > threshold” for rent application.

How do we generate minimal queries?
– Redundancy may be hidden in application code.
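Two of the questions above have a simple first approximation, sketched here under strong assumptions: that a log of the attributes each query touched is available, and that a single fixed threshold is all the application ever tests. The function names are hypothetical, and the sketch does not handle the conditional case from the slide, where assets matter only for some salaries.

```python
def unused_attributes(collected, query_log):
    """Attributes that are collected but never referenced by any logged query."""
    used = set()
    for attributes in query_log:
        used |= set(attributes)
    return set(collected) - used


def coarsen_salary(salary, threshold):
    """Keep only the one bit the queries need: is salary above the threshold?"""
    return salary > threshold
```

For instance, if the log shows queries touching only `name`, `salary`, and `assets`, then a collected `birth_date` is a candidate for removal, and storing `coarsen_salary(salary, threshold)` instead of the exact salary would suffice for a rent application that only tests the threshold.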

Page 33:

Limited Disclosure

Can the user dynamically determine the set of recipients?

Example: Alice wants to add EasyCredit to set of recipients in EquiRate’s database.

Digital signatures.

Page 34:

Limited Retention

Completely forgetting some information is non-trivial.

How do we delete a record from the logs and checkpoints, without affecting recovery?

How do we continue to support historical analysis and statistical queries without incurring privacy breaches?

Page 35:

Safety

Encryption provides additional layer of security.

How do we index encrypted data?

How do we run queries against encrypted data? [SWP00], [HILM02]

Page 36:

Openness

A donor shall be able to access all information about the donor stored in the database.

How does the database check Alice is really Alice and not somebody else?
– Princeton admissions office broke into Yale’s admissions using applicant’s social security number and birth date.

How does Alice find out what databases have information about her?
– Symmetrically private information retrieval [GIKM98].

Page 37:

Compliance

Universal Logging
– Can we provide each user whose data is accessed with a log of that access, along with the query reading the data?
– Use intermediaries who aggregate and analyze logs for many users.

Tracking Privacy Breaches
– Insert “fingerprint” records with emails, telephone numbers, and credit card numbers.
– Some data may be more valuable for spammers or credit card theft. How do we identify categories to do stratified fingerprinting rather than randomly inserting records?
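The fingerprinting idea can be sketched as planting decoy records and later checking whether any of them turn up in data found outside the database. This is a simplified illustration with email addresses only. It does not attempt the stratified placement the slide asks about, and `make_fingerprints`, `with_fingerprints`, `detect_breach`, and the `example.com` domain are hypothetical.

```python
import secrets


def make_fingerprints(count, domain="example.com"):
    """Generate unique decoy email addresses that no real donor would have."""
    fingerprints = set()
    while len(fingerprints) < count:
        fingerprints.add(f"fp-{secrets.token_hex(8)}@{domain}")
    return fingerprints


def with_fingerprints(records, fingerprints):
    """Return the table with decoy rows added; decoys carry no visible marker."""
    return records + [{"email": address} for address in sorted(fingerprints)]


def detect_breach(leaked_emails, fingerprints):
    """Any decoy address seen outside the database indicates a leak."""
    return set(leaked_emails) & fingerprints
```

The set of fingerprints has to be kept apart from the table itself, since a marker column would let whoever copies the data filter the decoys out.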

Page 38:

Summary

Database systems that take responsibility for the privacy of data they manage.

Key privacy principles

Strawman design

Technical challenges

Page 39:

Closing Thoughts

“Code is law … it is all a matter of code: the software and hardware that rule the internet”

-- L. Lessig

We can architect cyberspace to protect values we believe are fundamental, or we can architect it to allow those values to disappear.

Where does the database community want to go from here?

Page 40:

Strawman Architecture

[Diagram: full architecture with Privacy Policy, Data Collection, Queries, Other, and Store. Components: Privacy Metadata Creator, Privacy Constraint Validator, Data Accuracy Analyzer, Audit Info, Audit Trail, Query Intrusion Detector, Attribute Access Control, Record Access Control, Privacy Metadata, Data Retention Manager, Encryption Support, Data Collection Analyzer]

Page 41:

Privacy

Privacy is the right of individuals to determine for themselves when, how and to what extent information about them is communicated to others.

-- Alan Westin