Distributed Database Architecture for GDPR · 10/15/2018
© 2018 All rights reserved.
Distributed Database Architecture for GDPR
Karthik Ranganathan
PostgresConf Silicon Valley
Oct 15, 2018
About Us
Founders
Kannan Muthukkaruppan, CEO
Nutanix ♦ Facebook ♦ Oracle
IIT-Madras, University of California-Berkeley
Karthik Ranganathan, CTO
Nutanix ♦ Facebook ♦ Microsoft
IIT-Madras, University of Texas-Austin
Mikhail Bautin, Software Architect
ClearStory Data ♦ Facebook ♦ D. E. Shaw
Nizhny Novgorod State University, Stony Brook
✓ Founded Feb 2016
✓ Apache HBase committers and early engineers on Apache Cassandra
✓ Built Facebook’s NoSQL platform powered by Apache HBase
✓ Scaled the platform to serve many mission-critical use cases
  • Facebook Messages (Messenger)
  • Operational Data Store (Time series Data)
✓ Reassembled the same Facebook team at YugaByte along with engineers from Oracle, Google, Nutanix and LinkedIn
WHAT IS YUGABYTE DB?
A transactional, planet-scale database
for building high-performance cloud services.
NoSQL + SQL Cloud Native
Design Principles
TRANSACTIONAL
• Single Shard & Distributed ACID Txns
• Document-Based, Strongly Consistent Storage
HIGH PERFORMANCE
• Low Latency, Tunable Reads
• High Throughput
PLANET-SCALE
• Auto Sharding & Rebalancing
• Global Data Distribution
CLOUD NATIVE
• Built For The Container Era
• Self-Healing, Fault-Tolerant
OPEN SOURCE
• Apache 2.0
• Popular APIs Extended: Apache Cassandra, Redis and PostgreSQL (BETA)
WHAT IS GDPR?
GDPR: General Data Protection Regulation
Citizens of the EU can control how businesses share
and protect their personal data.
Personal Data, also called
PII (Personally Identifiable Information)
• User name
• Email address
• Date of birth
• Bank details
• Location details
• Computer IP address
Control over personal data
Database concerns
• Consent & data location
• Data privacy and safety
• Right to be forgotten
• Data access on demand
Application concerns
• Notify on data breach
• Data portability
• Ability to fix errors in data
• Restrict processing
#1 USER CONSENT AND DATA LOCATION
Data must be stored in the EU by default. Businesses
need explicit user consent to move it outside.
Why is this hard?
• EU user data lives in that region
• Other countries have their own compliance regulations – more geos
• Public clouds may not have coverage – hybrid deployments
• Architecture depends on the data – possibly multiple architectures per service
Think Global Deployments first!
Example – online ecommerce site
• Products table needs global replication – not PII data
Global Replication with YugaByte DB
[Diagram labels: Non-PII Data, Global Replication, Read Replicas]
Example – online ecommerce site
• Users, orders and shipments tables need locality – PII data
• Product locations table needs scale – may be PII
Geo-Partitioning with YugaByte DB
[Diagram labels: PII Data, Primary Data in EU, Non-EU Data, Non-EU Data]
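
The geo-partitioning idea on this slide can be sketched in a few lines of Python: every row that carries PII is pinned to a home region derived from the user's country. The region names, the partial country list, and the helpers `home_region` and `partition_rows` are hypothetical illustrations of a placement policy, not YugaByte DB's API.

```python
# Hypothetical placement policy: region names and country list are illustrative.
EU_COUNTRIES = {"DE", "FR", "IE", "NL", "ES", "IT"}  # deliberately partial


def home_region(country_code: str) -> str:
    """Return the region whose nodes should hold this user's PII rows."""
    if country_code.upper() in EU_COUNTRIES:
        return "eu"
    return "non-eu"


def partition_rows(rows):
    """Group rows by home region, mimicking a geo-partitioned table."""
    partitions = {}
    for row in rows:
        partitions.setdefault(home_region(row["country"]), []).append(row)
    return partitions


users = [
    {"id": 1, "country": "DE"},
    {"id": 2, "country": "US"},
    {"id": 3, "country": "fr"},
]
partitions = partition_rows(users)
for region, rows in partitions.items():
    print(region, [r["id"] for r in rows])
```

In a real deployment the database applies this mapping through its own partitioning and placement configuration; the sketch only shows the routing decision.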
Replicate data on demand to other geos
• User may be OK with replicating data
• Read replicas on demand (for remote, low-latency reads)
• Change data capture (for analytics)
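
One way to picture consent-gated replication is the Python sketch below. It assumes a per-user consent record and only allows a replica outside the home region if the user has agreed to that region. The `UserConsent` class, the region names, and `replica_regions` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UserConsent:
    user_id: str
    home_region: str = "eu"
    # Regions the user has explicitly agreed to; empty means "stay home".
    consented_regions: set = field(default_factory=set)


def replica_regions(consent: UserConsent, requested: set) -> set:
    """Regions where a copy of this user's PII may live."""
    return {consent.home_region} | (requested & consent.consented_regions)


alice = UserConsent("alice", consented_regions={"us-west"})
bob = UserConsent("bob")

print(sorted(replica_regions(alice, {"us-west", "ap-south"})))
print(sorted(replica_regions(bob, {"us-west"})))
```

The home region is always included, so withdrawing consent shrinks the set back to the EU-only default.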
Read Replicas with YugaByte DB
[Diagram labels: PII Data, Primary Data in EU, Read Replicas]
#2 DATA PRIVACY AND SAFETY
Data must be secured using best practices by
default. Users must be notified of a breach.
Implement end-to-end encryption on day #1
Encrypt All Network Communication
• Use TLS Encryption
  • Between client and server for app interaction
  • Between database servers for replication
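
On the client side, the TLS bullet above might look like the following sketch, which uses only Python's standard `ssl` module. The helper name `make_client_tls_context` and the optional `ca_file` parameter are assumptions for illustration; how the resulting context is handed to a particular database driver depends on that driver.

```python
import ssl


def make_client_tls_context(ca_file=None) -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server certificate."""
    # create_default_context turns on certificate and hostname verification.
    ctx = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_client_tls_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Server-to-server encryption for replication is configured on the database nodes themselves and is not shown here.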
TLS Encryption
[Diagram labels: User, Database Cluster, Server to server communication]
Encrypt All Storage
• Encryption at rest
• Integrate with external Key Management Systems
• Ability to rotate keys on demand
Keep a key-value table that maps each id to a cipher key. Encrypt PII data with
that cipher key for fine-grained control. More in the next section.
Encryption at Rest
[Diagram labels: User, Database Cluster, Encryption on disk, Key Management Service]
#3 RIGHT TO BE FORGOTTEN
Data must be erased on explicit request or when
it is no longer relevant to its original purpose.
Use Encryption of Data Attributes
• Have a key-value table with id to cipher key
• Encrypt PII data with the cipher key on write
• Decrypt PII data on access
• Delete cipher key to forget PII data
Example - Storing User Profile Data
SET [email protected] FOR USER ID=XXX
1. Get encryption key for user
2. Encrypt PII data
3. Store encrypted data
SET email=ENCRYPTED FOR USER ID=XXX
• Reads require decryption
• Data not accessible without key
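
The write, read and forget steps can be sketched as below. Two things are assumptions made so the example runs with the standard library alone. First, the in-memory dictionaries `cipher_keys` and `profiles` stand in for the key-value table and the user table. Second, the cipher is a toy HMAC-SHA256 keystream, used only as a placeholder; a real system would use a vetted authenticated cipher such as AES-GCM from a maintained crypto library.

```python
import hashlib
import hmac
import secrets


# Toy stand-in cipher (HMAC-SHA256 keystream XOR). ASSUMPTION: for
# illustration only; use a vetted authenticated cipher in production.
def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)  # fresh nonce per value
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


cipher_keys = {}  # hypothetical key-value table: user id -> cipher key
profiles = {}     # hypothetical user table holding only encrypted PII


def set_email(user_id: str, email: str) -> None:
    """Encrypt the email with the user's cipher key, then store it."""
    key = cipher_keys.setdefault(user_id, secrets.token_bytes(32))
    profiles[user_id] = {"email": encrypt(key, email.encode())}


def get_email(user_id: str):
    """Decrypt on access; returns None once the key has been deleted."""
    key = cipher_keys.get(user_id)
    if key is None:
        return None
    return decrypt(key, profiles[user_id]["email"]).decode()


def forget_user(user_id: str) -> None:
    """Delete the cipher key so the stored ciphertext becomes unreadable."""
    cipher_keys.pop(user_id, None)


set_email("u1", "user@example.com")
print(get_email("u1"))
forget_user("u1")
print(get_email("u1"))
```

Deleting one small key is enough to forget the user everywhere the ciphertext was copied, including backups and replicas, as long as the key itself was never copied with it.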
Use Anonymization of Data Attributes
• In many cases the actual value is not needed
• Anonymize PII data with one-way hash functions
• Use hashed ids in the data warehouse
• There is no PII data if hashed ids are used!
Example – Website Analytics
[email protected] CHECKED OUT PRODUCT=X, CATEGORY=Gadget
One-way hash user id
USER=HASHED_VAL CHECKED OUT PRODUCT=X, CATEGORY=Gadget
Analytics
Example – Website Analytics
• User no longer identifiable
• Hashed data still useful!
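
A minimal Python sketch of the analytics example follows. It swaps in HMAC-SHA256, a keyed one-way hash, rather than a bare hash, on the assumption that a secret kept outside the warehouse makes guessing common email addresses harder. The `PEPPER` value, the event field names, and both helper functions are hypothetical.

```python
import hashlib
import hmac

# ASSUMPTION: a secret "pepper" stored outside the data warehouse.
PEPPER = b"replace-with-a-secret-from-your-kms"


def hashed_id(user_id: str) -> str:
    """One-way, deterministic pseudonym for a user identifier."""
    normalized = user_id.strip().lower().encode()
    return hmac.new(PEPPER, normalized, hashlib.sha256).hexdigest()


def anonymize_event(event: dict) -> dict:
    """Copy an analytics event with the user field replaced by its hash."""
    out = dict(event)
    out["user"] = hashed_id(out["user"])
    return out


event = {
    "user": "User@Example.com",
    "action": "CHECKED_OUT",
    "product": "X",
    "category": "Gadget",
}
print(anonymize_event(event))
```

Because the hash is deterministic, the same user always maps to the same value, so per-user aggregates in the warehouse still work.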
#4 DATA ACCESS ON DEMAND
Ability to inform a user about what data is being
used, for what purpose and where it is stored.
Tag Tables and Columns with PII
• Store the tags in a separate information architecture table
• Make tagging a part of the process
• Easy to find what PII data is stored on demand
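
As a sketch of what such a tagging table could hold, the Python below models it as a dictionary keyed by (table, column). The catalog contents, the `purpose` field, and the two helper functions are hypothetical; a real deployment would keep this in a database table and update it as part of schema changes.

```python
# Hypothetical catalog: (table, column) -> tag metadata. Names are illustrative.
PII_CATALOG = {
    ("users", "email"): {"pii": True, "purpose": "account login"},
    ("users", "date_of_birth"): {"pii": True, "purpose": "age verification"},
    ("orders", "shipping_address"): {"pii": True, "purpose": "delivery"},
    ("products", "name"): {"pii": False, "purpose": "catalog"},
}


def pii_columns(catalog):
    """Return a sorted list of (table, column) pairs tagged as PII."""
    return sorted(key for key, tag in catalog.items() if tag["pii"])


def pii_report(catalog):
    """Summarize, per table, which PII columns exist and why they are stored."""
    report = {}
    for (table, column), tag in sorted(catalog.items()):
        if tag["pii"]:
            report.setdefault(table, {})[column] = tag["purpose"]
    return report


print(pii_columns(PII_CATALOG))
print(pii_report(PII_CATALOG))
```

A report like this is the starting point for answering a user's question about what data is held and for what purpose.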
Run Continuous Compliance Checks
• Ensure PII columns are encrypted
• Ensure non-PII columns do not have sensitive data
• Use Spark/Presto to perform scans periodically
• Run scans on a read replica to avoid impacting production
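
The two checks can be illustrated with a plain Python scan; the slide suggests Spark or Presto for the real job, so this is only the per-row logic. The `PII_COLUMNS` set, the `enc:` prefix used to mark encrypted values, and the email pattern are all assumptions made for the sketch.

```python
import re

# Hypothetical tag set; in practice this would come from the tagging table.
PII_COLUMNS = {"email"}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_violations(rows):
    """Yield (row_index, column, reason) for values that break either rule."""
    for i, row in enumerate(rows):
        for column, value in row.items():
            text = value if isinstance(value, str) else ""
            if column in PII_COLUMNS:
                # ASSUMPTION: encrypted values are stored with an "enc:" prefix.
                if not text.startswith("enc:"):
                    yield (i, column, "PII column not encrypted")
            elif EMAIL_RE.search(text):
                yield (i, column, "possible email in non-PII column")


rows = [
    {"email": "enc:9f2c", "note": "gift wrap"},
    {"email": "bob@example.com", "note": "ok"},
    {"email": "enc:77aa", "note": "contact carol@example.com"},
]
violations = list(find_violations(rows))
for v in violations:
    print(v)
```

Pattern matching catches only obvious leaks such as email addresses; other PII types would need their own detectors.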
Ensure PII columns are encrypted
Ensure no PII data in other columns
Tag PII Columns
PUTTING IT ALL TOGETHER
GDPR Reference Architecture
Primary Cluster (in EU)
• App clients: Reads & Writes, Encrypted
• At-Rest Encryption for All Nodes
Read Replica Clusters (Anywhere in the World)
• Analytics clients: Read only, Encrypted
• At-Rest Encryption for All Nodes
Encrypted Async Replication between the primary cluster and the read replica clusters
• PII Columns Encrypted w/ Cipher Key
• Tag PII Columns
• Ensure PII columns are encrypted
• Ensure no PII data in other columns
Questions?
Try it at docs.yugabyte.com/quick-start