Hippocratic Databases Rakesh Agrawal Jerry Kiernan Ramakrishnan Srikant Yirong Xu.
Hippocratic Databases
Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, Yirong Xu
Algorithms
Performance Graphs
Founding Principles
New Challenges
Vision Paper
The Hippocratic Oath
“What I may see or hear in the course of treatment or even outside of the treatment in regard to the life of men, which on no account [ought to be] spread abroad, I will keep to myself, holding such things shameful to be spoken about.”
– Hippocratic Oath, 8 (circa 400 BC)
Privacy Violations
Accidents:
– Kaiser, GlobalHealthrax
Lax security:
– Massachusetts govt.
Ethically questionable behavior:
– Lotus & Equifax, Lexis-Nexis, Medical Marketing Service, Boston University, CVS & Giant Food
Illegal:
– Toysmart
Growing Privacy Concerns
Popular Press:
– Economist: The End of Privacy (May 99)
– Time: The Death of Privacy (Aug 97)
Govt. legislation
S. Garfinkel, "Database Nation: The Death of Privacy in 21st Century", O'Reilly, Jan 2000
Special issue on internet privacy, CACM, Feb 99
Related Work
Statistical Databases
– Provide statistical information (sum, count, etc.) without compromising sensitive information about individuals. [AW89]
Multilevel Secure Databases
– Multilevel relations, e.g., records tagged “secret”, “confidential”, or “unclassified”, e.g. [JS91]
Wish to protect privacy in transactional databases that support daily operations.
– Cannot restrict queries to statistical queries.
– Cannot tag all the records “top secret”.
Current Database Systems
Ullman, “Principles of Database and Knowledgebase Systems”
Fundamental:
– Manage persistent data.
– Access a large amount of data efficiently.
Desirable:
– Support for data model, high-level languages, transaction management, access control, and resiliency.
Similar list in other database textbooks.
The Vision
We propose Hippocratic Databases that include responsibility for the privacy of data they manage as a founding tenet.
Approach
Derive founding principles from current privacy legislation.
Strawman Design
Challenges & Open Problems
Caveats
Technology alone cannot address all concerns about privacy.
– Solution has to be a mix of laws, societal norms, markets and technology.
– But by advancing technology, we can influence the overall quality of the solution.
Not all the world’s data lives in database systems.
– Additional inducement for data to move to its right home.
– Hippocratic databases can serve as guide for other types of data repositories.
Privacy Legislation
Fair Information Practices Act (US, 1974)
OECD Guidelines (Europe, 1980)
Canadian Standards Association’s Model Code for Protection of Personal Information (1995)
Australian Privacy Amendment (2000)
Japan: proposed legislation (2003)
The Ten Principles
Collection Group
– Purpose Specification, Consent, Limited Collection
Use Group
– Limited Use, Limited Disclosure, Limited Retention, Accuracy
Security & Openness Group
– Safety, Openness, Compliance
Collection Group
1. Purpose Specification
– For personal information stored in the database, the purposes for which the information has been collected shall be associated with that information.
2. Consent
– The purposes associated with personal information shall have consent of the donor of the personal information.
3. Limited Collection
– The information collected shall be limited to the minimum necessary for accomplishing the specified purposes.
Use Group
4. Limited Use
– The database shall run only those queries that are consistent with the purposes for which the information has been collected.
5. Limited Disclosure
– Personal information shall not be communicated outside the database for purposes other than those for which there is consent from the donor of the information.
Use Group (2)
6. Limited Retention
– Personal information shall be retained only as long as necessary for the fulfillment of the purposes for which it has been collected.
7. Accuracy
– Personal information stored in the database shall be accurate and up-to-date.
Security & Openness Group
8. Safety
– Personal information shall be protected by security safeguards against theft and other misappropriations.
9. Openness
– A donor shall be able to access all information about the donor stored in the database.
10. Compliance
– A donor shall be able to verify compliance with the above principles. Similarly, the database shall be able to address a challenge concerning compliance.
Talk Outline
Motivation
Founding Principles
Strawman Design
New Challenges
Strawman Architecture
[Figure: architecture overview. Privacy Policy, Data Collection, Queries, and Other all interact with the Store.]
Architecture: Policy
[Figure: Privacy Policy, Privacy Metadata Creator, Store (Privacy Metadata).]
Privacy Metadata Creator: converts privacy policy into privacy metadata tables.
For each purpose & piece of information (attribute):
• External recipients
• Retention period
• Authorized users
Different designs possible.
Principles supported: Limited Disclosure, Limited Retention.
Privacy Policies Table
Purpose         | Table    | Attribute | External-recipients     | Authorized-users   | Retention
purchase        | customer | name      | {delivery, credit-card} | {shipping, charge} | 1 month
purchase        | customer | email     | empty                   | {shipping}         | 1 month
register        | customer | name      | empty                   | {registration}     | 3 years
register        | customer | email     | empty                   | {registration}     | 3 years
recommendations | order    | book      | empty                   | {mining}           | 10 years
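The table above can be read as a lookup keyed by (purpose, table, attribute). A minimal Python sketch of that reading follows. The `PolicyRow` type and the `lookup` helper are hypothetical names chosen for illustration, not part of the talk, and retention periods are kept as the slide's strings rather than parsed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PolicyRow:
    """One row of the privacy-policies table (hypothetical representation)."""
    purpose: str
    table: str
    attribute: str
    external_recipients: frozenset
    authorized_users: frozenset
    retention: str


# Rows copied from the example table on the slide.
PRIVACY_POLICIES = [
    PolicyRow("purchase", "customer", "name",
              frozenset({"delivery", "credit-card"}),
              frozenset({"shipping", "charge"}), "1 month"),
    PolicyRow("purchase", "customer", "email",
              frozenset(), frozenset({"shipping"}), "1 month"),
    PolicyRow("register", "customer", "name",
              frozenset(), frozenset({"registration"}), "3 years"),
    PolicyRow("register", "customer", "email",
              frozenset(), frozenset({"registration"}), "3 years"),
    PolicyRow("recommendations", "order", "book",
              frozenset(), frozenset({"mining"}), "10 years"),
]


def lookup(purpose: str, table: str, attribute: str) -> Optional[PolicyRow]:
    """Return the policy row for this purpose and attribute, or None if absent."""
    for row in PRIVACY_POLICIES:
        if (row.purpose, row.table, row.attribute) == (purpose, table, attribute):
            return row
    return None
```

A missing row (for example, purpose "telemarketing" on customer name) yields `None`, which a caller would presumably treat as "no consent recorded for this use".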
Architecture: Data Collection
[Figure: Data Collection, Privacy Constraint Validator, Audit Info, Audit Trail, Store (Privacy Metadata).]
Privacy Constraint Validator: Privacy policy compatible with user’s privacy preference?
Audit Info / Audit Trail: Audit trail for compliance.
Principles supported: Consent, Compliance.
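One way to picture the Privacy Constraint Validator's question is as a subset-and-bound check. The sketch below is only an illustration under stated assumptions: the dictionary layout, the function name `policy_compatible`, and the rule itself (recipients must be a subset of what the user allows, retention must not exceed the user's limit) are not taken from the talk.

```python
def policy_compatible(policy: dict, preference: dict) -> bool:
    """Check a policy against a user's preference (assumed rule, see lead-in).

    Both arguments map (purpose, attribute) to a dict with keys
    "recipients" (a set of names) and "retention_days" (an int).
    """
    for key, rule in policy.items():
        allowed = preference.get(key)
        if allowed is None:
            # The user never opted in to this purpose/attribute pair.
            return False
        if not rule["recipients"] <= allowed["recipients"]:
            # The policy discloses to someone the user did not accept.
            return False
        if rule["retention_days"] > allowed["retention_days"]:
            # The policy keeps the data longer than the user accepts.
            return False
    return True
```

A real validator would work over a policy language such as P3P rather than hand-built dictionaries; this only shows the shape of the comparison.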
Architecture: Data Collection
[Figure: Data Collection, Privacy Constraint Validator, Data Accuracy Analyzer, Record Access Control, Audit Info, Audit Trail, Store (Privacy Metadata).]
Data Accuracy Analyzer: Data cleansing, e.g., catch typos in address.
Record Access Control: Associate set of purposes with each record.
Principles supported: Purpose Specification, Accuracy.
Architecture: Queries
[Figure: Queries, Attribute Access Control, Record Access Control, Store (Privacy Metadata).]
1. Telemarketing cannot issue query tagged “charge”.
2. Query tagged “telemarketing” cannot see credit card info.
3. Telemarketing query only sees records that include “telemarketing” in set of purposes.
Principles supported: Safety, Limited Use.
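The three checks on this slide can be sketched as a small query gate. Everything named below (the users, the attribute lists, the sample records, and `run_query`) is hypothetical and chosen only to mirror the telemarketing example; the talk does not prescribe this layout.

```python
# Hypothetical metadata: which purposes a user may tag queries with,
# and which attributes each purpose may read.
USER_PURPOSES = {
    "telemarketer": {"telemarketing"},
    "billing_clerk": {"charge"},
}
PURPOSE_ATTRIBUTES = {
    "telemarketing": {"name", "phone"},
    "charge": {"name", "credit_card"},
}
# Each record carries the set of purposes its donor consented to.
RECORDS = [
    {"name": "Alice", "phone": "555-0100", "credit_card": "0000-0001",
     "purposes": {"charge", "telemarketing"}},
    {"name": "Bob", "phone": "555-0101", "credit_card": "0000-0002",
     "purposes": {"charge"}},
]


def run_query(user, purpose, attributes):
    """Project `attributes` from the records visible to `purpose`."""
    # 1. A user may only issue queries tagged with a purpose it is authorized for.
    if purpose not in USER_PURPOSES.get(user, set()):
        raise PermissionError(f"{user} may not issue queries tagged {purpose!r}")
    # 2. Attribute access control: the purpose limits the visible columns.
    denied = set(attributes) - PURPOSE_ATTRIBUTES.get(purpose, set())
    if denied:
        raise PermissionError(f"purpose {purpose!r} may not read {sorted(denied)}")
    # 3. Record access control: only records whose purpose set includes the tag.
    return [{a: rec[a] for a in attributes}
            for rec in RECORDS if purpose in rec["purposes"]]
```

With this sample data, a telemarketing query for name and phone sees only Alice's record, because Bob's record does not list "telemarketing" among its purposes.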
Architecture: Queries
[Figure: Queries, Attribute Access Control, Record Access Control, Query Intrusion Detector, Audit Info, Audit Trail, Store (Privacy Metadata).]
Query Intrusion Detector: e.g., telemarketing query that asks for all phone numbers.
Audit Trail:
• Compliance
• Training data for query intrusion detector
Principles supported: Safety, Compliance.
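As a rough illustration of how an audit trail could double as training data for a query intrusion detector, the sketch below flags a query that returns far more rows than earlier queries with the same purpose. The threshold heuristic, the `factor` parameter, and both function names are assumptions made for this example; the talk does not specify a detection method.

```python
from statistics import mean

# In-memory audit trail; a real system would persist this.
audit_trail = []


def record_query(user, purpose, query_text, rows_returned):
    """Append one audit entry describing a completed query."""
    audit_trail.append({
        "user": user,
        "purpose": purpose,
        "query": query_text,
        "rows_returned": rows_returned,
    })


def looks_suspicious(purpose, rows_returned, factor=10.0):
    """Flag a result much larger than the historical mean for this purpose."""
    history = [e["rows_returned"] for e in audit_trail if e["purpose"] == purpose]
    if not history:
        return False  # no profile yet, so nothing to compare against
    return rows_returned > factor * mean(history)
```

Under this heuristic, a telemarketing query that asks for all phone numbers would stand out against a history of queries that each touched a handful of records.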
Architecture: Other
[Figure: Other, Data Retention Manager, Encryption Support, Data Collection Analyzer, Store (Privacy Metadata).]
Data Retention Manager: Delete items in accordance with privacy policy. (Limited Retention)
Encryption Support: Additional security for sensitive data. (Safety)
Data Collection Analyzer: Analyze queries to identify unnecessary collection, retention & authorizations. (Limited Collection)
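The Data Retention Manager's job can be sketched as a periodic purge. In the sketch below, the record layout, the `purge_expired` name, and the conversion of "1 month" to 30 days and "3 years" to 1095 days are assumptions for illustration; a record is kept only while at least one of its purposes is still within its retention period.

```python
from datetime import date, timedelta

# Retention per purpose, approximating the example policy table
# (assumed conversions: 1 month = 30 days, 3 years = 1095 days).
RETENTION = {
    "purchase": timedelta(days=30),
    "register": timedelta(days=3 * 365),
}


def purge_expired(records, today):
    """Drop expired purposes; drop a record once no purpose remains."""
    kept = []
    for rec in records:
        live = {p for p in rec["purposes"]
                if today - rec["collected_on"] <= RETENTION[p]}
        if live:
            kept.append({**rec, "purposes": live})
    return kept
```

This only removes data from the live table. Removing it from logs and checkpoints as well is a separate, harder problem.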
Strawman Architecture
[Figure: complete architecture diagram combining the preceding slides.]
Inputs: Privacy Policy, Data Collection, Queries, Other
Components: Privacy Metadata Creator, Privacy Constraint Validator, Data Accuracy Analyzer, Audit Info, Audit Trail, Query Intrusion Detector, Attribute Access Control, Record Access Control, Data Retention Manager, Encryption Support, Data Collection Analyzer
Store: Privacy Metadata
Talk Outline
Privacy
Founding Principles
Strawman Design
New Challenges
New Challenges
General
– Language
– Efficiency
Use
– Limited Collection
– Limited Disclosure
– Limited Retention
Security and Openness
– Safety
– Openness
– Compliance
Language
Need a language for privacy policies & user preferences.
P3P can be used as starting point.
– Developed primarily for web shopping.
– What about richer domains?
How do we balance expressibility and usability?
– Arrange concepts in hierarchy or subsumption relationship.
Purpose: [Figure: hierarchy with nodes contact; email, phone; home, work.]
P3P recipients: [Figure: Ours, Same, Delivery, Unrelated, Public.]
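A hierarchy or subsumption relationship of this kind can be checked by walking parent links. The sketch below assumes the slide's tree places email and phone under contact, and home and work under phone; the extracted text does not preserve the edges, so that shape is a guess. The `PARENT` map and `subsumes` function are hypothetical names.

```python
# Assumed edges: child -> parent (the slide's figure is not recoverable).
PARENT = {
    "email": "contact",
    "phone": "contact",
    "home": "phone",
    "work": "phone",
}


def subsumes(general, specific):
    """Return True if `general` equals `specific` or is one of its ancestors."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = PARENT.get(node)
    return False
```

Under this assumed tree, consent given for "contact" would cover a use tagged "work", while consent for "email" would not.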
Language (2)
How do we accommodate user negotiation models?
– User willing to disclose information only if fairly compensated.
– Value of privacy as coalitional game [KPR2001]
Efficiency
How do we minimize the cost of privacy checking?
How do we incorporate purpose into database design and query optimization?
Tradeoffs between space & running time.
– Only tag records in customer table with purpose, not all records. But now need to do a join when scanning records in order table.
How does the work in secure databases on decomposition of multilevel relations into single-level relations [JS91] apply here?
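The space versus running-time tradeoff can be made concrete with a small in-memory SQLite example. The schema below is an assumption for illustration: purposes are stored once per customer in a hypothetical `customer_purpose` table, so a purpose-restricted scan of the order table has to join back to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
-- Purposes are attached to customers only, which saves space ...
CREATE TABLE customer_purpose (customer_id INTEGER, purpose TEXT);
-- ... so order rows carry no purpose tag of their own.
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, book TEXT);
INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO customer_purpose VALUES
    (1, 'purchase'), (1, 'recommendations'), (2, 'purchase');
INSERT INTO orders VALUES (10, 1, 'Book A'), (11, 2, 'Book B');
""")


def books_visible_for(purpose):
    """Scan orders, joining to the customer's purposes to filter rows."""
    rows = conn.execute(
        """SELECT o.book
           FROM orders AS o
           JOIN customer_purpose AS cp ON cp.customer_id = o.customer_id
           WHERE cp.purpose = ?
           ORDER BY o.id""",
        (purpose,),
    ).fetchall()
    return [book for (book,) in rows]
```

Tagging every order row with its purposes would avoid the join at the cost of extra storage per row, which is the tradeoff the slide points at.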
Limited Collection
How do we identify attributes that are collected but not used?
– Assets are only needed for mortgage when salary is below some threshold.
What’s the needed granularity for numeric attributes?
– Queries only ask “Salary > threshold” for rent application.
How do we generate minimal queries?
– Redundancy may be hidden in application code.
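The simplest version of the first question, finding attributes that are collected but never read, can be sketched as a set difference over a query log. The log format (one set of referenced attribute names per query) and the function name `unused_attributes` are assumptions; the conditional and granularity cases on the slide need more than this.

```python
def unused_attributes(collected, query_log):
    """Return collected attributes that no logged query ever referenced.

    `collected` is an iterable of attribute names; `query_log` is an
    iterable of sets, each holding the attributes one query touched.
    """
    used = set().union(*query_log)
    return set(collected) - used
```

This does not catch the mortgage example, where assets are needed only for some applicants; that requires looking at which records a query actually reads, not just which columns it names.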
Limited Disclosure
Can the user dynamically determine the set of recipients?
Example: Alice wants to add EasyCredit to set of recipients in EquiRate’s database.
Digital signatures.
Limited Retention
Completely forgetting some information is non-trivial.
How do we delete a record from the logs and checkpoints, without affecting recovery?
How do we continue to support historical analysis and statistical queries without incurring privacy breaches?
Safety
Encryption provides additional layer of security.
How do we index encrypted data?
How do we run queries against encrypted data? [SWP00], [HILM02]
Openness
A donor shall be able to access all information about the donor stored in the database.
How does the database check Alice is really Alice and not somebody else?
– Princeton admissions office broke into Yale’s admissions using applicant’s social security number and birth date.
How does Alice find out what databases have information about her?
– Symmetrically private information retrieval [GIKM98].
Compliance
Universal Logging
– Can we provide each user whose data is accessed with a log of that access, along with the query reading the data?
– Use intermediaries who aggregate and analyze logs for many users.
Tracking Privacy Breaches
– Insert “fingerprint” records with emails, telephone numbers, and credit card numbers.
– Some data may be more valuable for spammers or credit card theft. How do we identify categories to do stratified fingerprinting rather than randomly inserting records?
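The fingerprinting idea can be sketched as planting synthetic records whose contact details exist nowhere else, then watching for them outside the database. In the sketch below, the record shape, the `example.invalid` addresses, the seeded generator, and both function names are assumptions for illustration; it covers random insertion only, not the stratified variant the slide asks about.

```python
import random


def make_fingerprints(n, seed):
    """Create n synthetic records with unlikely-to-collide email addresses."""
    rng = random.Random(seed)  # seeded so the planted set can be regenerated
    return [
        {"name": f"fingerprint-{i}",
         "email": f"fp{rng.getrandbits(48):012x}@example.invalid"}
        for i in range(n)
    ]


def leaked_fingerprints(observed_emails, fingerprints):
    """Return planted emails that appear in an externally observed list."""
    planted = {fp["email"] for fp in fingerprints}
    return planted & set(observed_emails)
```

Any match suggests that data left the database through some channel, since no legitimate correspondent would know a planted address.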
Summary
Database systems that take responsibility for the privacy of data they manage.
Key privacy principles
Strawman design
Technical challenges
Closing Thoughts
“Code is law … it is all a matter of code: the software and hardware that rule the internet”
-- L. Lessig
We can architect cyberspace to protect values we believe are fundamental, or we can architect it to allow those values to disappear.
Where does the database community want to go from here?
Privacy
Privacy is the right of individuals to determine for themselves when, how and to what extent information about them is communicated to others.
-- Alan Westin