Overview of HDFS Transparent Encryption
© Cloudera, Inc. All rights reserved.
Charles Lamb
HDFS Transparent Encryption
SFHUG
Overview
• Developed as open source (HDFS-6134)
• Data read from and written to certain directories is transparently encrypted
• No changes to user code
• Encryption/decryption always done by client
• HDFS never handles unencrypted data or unencrypted keys
• Helps applications be regulation-compliant (HIPAA, PCI DSS, FISMA, etc.)
Background
• Encryption can happen at any of several levels:
  • Application: most secure and flexible, but hardest to do
    • Adding encryption to legacy applications may be difficult
  • Database: most DBMSs have this, but may incur performance penalties
    • Secondary indices cannot generally be encrypted
  • Filesystem: high performance, transparent, but may not be flexible enough
    • Multi-tenancy vs per-user encryption policies
  • Disk: high performance but only really protects against physical theft
• HDFS encryption is somewhere between Filesystem and Database level
Design Goals
• Performance and scalability
• Transparent to applications, including legacy apps
• End-to-end
• Data should be encrypted on the network and ‘at-rest’
• Compartmentalization
• Key management independent of HDFS management
• Includes preventing HDFS admins and root users from accessing sensitive data
• Compatibility with HDFS access methods: WebHDFS, HttpFS, FUSE, NFS, hftp, har, etc.
Architectural Concepts
• Key Management Server
• Encryption Zones
• Keys
Key Management Server (KMS)
• KMS sits between client and key server
• E.g. Cloudera Navigator Key Trustee
• Provides a unified API and scalability
• REST API
• Does not actually store keys (backend does that), but does cache them
• ACLs on per-key basis
Encryption Zones
• An HDFS directory in which the contents (including subdirs) are encrypted on write and decrypted on read.
• An EZ begins life as an empty directory
• Renaming files or directories into or out of an EZ is prohibited
• Encryption is transparent to application with no code changes
Keys
• Every Encryption Zone has a key (“EZ Key”)
• Every file in an Encryption Zone has a unique key (“Data Encryption Key” or “DEK”)
• The HDFS NameNode stores the name of the EZ Key in an Xattr of the EZ Dir
• The actual EZ Key is stored in the Key Server
• The NameNode stores the DEK in an Xattr of the file, but only in encrypted form: the Encrypted Data Encryption Key, or “EDEK”
• The NameNode never touches decrypted data or decrypted keys
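The key relationships above can be traced in a small toy model. This is only a sketch under stated assumptions: the class and function names (`ToyKMS`, `ToyNameNode`, `client_write`, `client_read`) are hypothetical, a SHA-256-based XOR keystream stands in for AES and is not secure, and one IV is reused for wrapping the DEK and for the file data purely to keep the example short.

```python
import hashlib
import os


def toy_cipher(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher (stand-in for AES; NOT secure). Encrypt == decrypt."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKMS:
    """Holds EZ key material; the NameNode only ever sees key names and EDEKs."""

    def __init__(self):
        self._ez_keys = {}  # EZ key name -> EZ key material

    def create_key(self, name):
        self._ez_keys[name] = os.urandom(16)

    def generate_edek(self, key_name):
        dek = os.urandom(16)  # fresh DEK for one file
        iv = os.urandom(16)
        return iv, toy_cipher(self._ez_keys[key_name], iv, dek)  # only the EDEK leaves

    def decrypt_edek(self, key_name, iv, edek):
        return toy_cipher(self._ez_keys[key_name], iv, edek)


class ToyNameNode:
    """Stores the EZ key *name* per zone and the EDEK per file, never a DEK."""

    def __init__(self, kms):
        self._kms = kms
        self.zone_xattr = {}  # zone directory -> EZ key name
        self.file_xattr = {}  # file path -> (EZ key name, iv, EDEK)

    def create_zone(self, path, key_name):
        self.zone_xattr[path] = key_name

    def create_file(self, zone, path):
        key_name = self.zone_xattr[zone]
        iv, edek = self._kms.generate_edek(key_name)
        self.file_xattr[path] = (key_name, iv, edek)
        return self.file_xattr[path]


def client_write(kms, namenode, zone, path, plaintext):
    key_name, iv, edek = namenode.create_file(zone, path)
    dek = kms.decrypt_edek(key_name, iv, edek)  # client asks the KMS for the DEK
    return toy_cipher(dek, iv, plaintext)       # only ciphertext reaches HDFS


def client_read(kms, namenode, path, ciphertext):
    key_name, iv, edek = namenode.file_xattr[path]
    dek = kms.decrypt_edek(key_name, iv, edek)
    return toy_cipher(dek, iv, ciphertext)


kms = ToyKMS()
kms.create_key("mykey")
namenode = ToyNameNode(kms)
namenode.create_zone("/zone", "mykey")
stored = client_write(kms, namenode, "/zone", "/zone/report.txt", b"patient record")
assert client_read(kms, namenode, "/zone/report.txt", stored) == b"patient record"
```

In this model the NameNode object never holds a DEK or EZ key material, which mirrors the last bullet: it stores a key name and an EDEK and nothing else.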
EZ Keys, Data Encryption Keys, and Encrypted Data Encryption Keys
Key Handling
Design
• End-to-end encryption
• Encryption occurs on the client and decrypted data is never touched by HDFS
• Protects against network sniffing, evil HDFS admins, and hard drive theft
• HDFS never touches key material (DEKs or EZ keys)
• Compromising an HDFS daemon is not a viable attack vector
• HDFS handles encrypted keys (EDEKs), but never in decrypted form (DEKs)
• Key permissions are handled by the KMS ACLs
• Each file is encrypted with a unique DEK
HDFS Encryption Configuration
• hadoop key create <keyname>
• hdfs dfs -mkdir <path>
• hdfs crypto -createZone -keyName <keyname> -path <path>
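The three commands are order-dependent: the key has to exist and the directory has to be created (and empty) before the zone is declared. A minimal sketch of a wrapper that makes the order explicit is shown here; the helper names `zone_setup_commands` and `create_zone` and the sample values `mykey` and `/secure` are hypothetical, and only the three command lines from the slide are used.

```python
import subprocess


def zone_setup_commands(key_name, path):
    """Return the argv lists for creating an encryption zone, in the required order."""
    return [
        ["hadoop", "key", "create", key_name],                  # 1. create the EZ key
        ["hdfs", "dfs", "-mkdir", path],                        # 2. create the empty directory
        ["hdfs", "crypto", "-createZone",
         "-keyName", key_name, "-path", path],                  # 3. turn it into a zone
    ]


def create_zone(key_name, path, dry_run=True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for argv in zone_setup_commands(key_name, path):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


# Dry run only: prints the three commands without contacting a cluster.
create_zone("mykey", "/secure")
```

Calling `create_zone("mykey", "/secure", dry_run=False)` would execute the commands, which assumes the `hadoop` and `hdfs` binaries are on the PATH and that the caller holds the necessary key and HDFS privileges.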
KMS Per-User ACL Configuration
• White lists (check for inclusion) and black lists (check for exclusion)
• etc/hadoop/kms-acls.xml
• hadoop.kms.acl.CREATE
• hadoop.kms.blacklist.CREATE
• … DELETE, ROLLOVER, GET, GET_KEYS, GET_METADATA, GENERATE_EEK, DECRYPT_EEK
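How a white list and a black list combine can be sketched as follows. The function names are hypothetical, and two behaviours are assumptions rather than statements from the slides: ACL values use the Hadoop-style "users groups" format (comma-separated users, a space, comma-separated groups, `*` for everyone), and an operation with no white list entry defaults to `*`.

```python
def parse_acl(value):
    """Parse 'user1,user2 group1,group2'; returns None for the '*' wildcard."""
    if value.strip() == "*":
        return None
    parts = value.split(" ", 1)
    users = {u.strip() for u in parts[0].split(",") if u.strip()}
    groups = set()
    if len(parts) > 1:
        groups = {g.strip() for g in parts[1].split(",") if g.strip()}
    return users, groups


def matches(acl, user, groups):
    """True if the user, or any of the user's groups, appears in the ACL."""
    if acl is None:  # wildcard matches everyone
        return True
    acl_users, acl_groups = acl
    return user in acl_users or bool(acl_groups & set(groups))


def is_allowed(conf, operation, user, groups=()):
    """Allowed only if on the white list AND not on the black list."""
    white = parse_acl(conf.get(f"hadoop.kms.acl.{operation}", "*"))      # assumed default
    black = parse_acl(conf.get(f"hadoop.kms.blacklist.{operation}", ""))
    return matches(white, user, groups) and not matches(black, user, groups)


# Hypothetical settings, as they might appear in kms-acls.xml.
conf = {
    "hadoop.kms.acl.CREATE": "alice,bob keyadmins",
    "hadoop.kms.blacklist.CREATE": "bob",
    "hadoop.kms.blacklist.DECRYPT_EEK": "hdfs",
}
assert is_allowed(conf, "CREATE", "alice")
assert not is_allowed(conf, "CREATE", "bob")        # white-listed but black-listed
assert not is_allowed(conf, "DECRYPT_EEK", "hdfs")  # keeps the HDFS admin away from DEKs
```

Black-listing the HDFS superuser for DECRYPT_EEK, as in the last line, is one way to get the separation between key admin and HDFS admin that the design goals describe.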
KMS Per-Key ACL Configuration
• etc/hadoop/kms-acls.xml
• key.acl.<keyname>.<operation>
• MANAGEMENT – createKey, deleteKey, rolloverNewVersion
• GENERATE_EEK – generateEncryptedKey, warmUpEncryptedKeys
• DECRYPT_EEK – decryptEncryptedKey
• READ – getKeyVersion, getKeyVersions, getMetadata, getKeysMetadata, getCurrentKey
• ALL – all of the above
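The grouping above is effectively a lookup table from KMS calls to per-key ACL types. A minimal sketch, with the hypothetical names `KEY_OP_GROUPS` and `acl_types_for`, and with the call names taken from the list above:

```python
# Per-key ACL type -> KMS calls it covers, as listed on the slide.
KEY_OP_GROUPS = {
    "MANAGEMENT": {"createKey", "deleteKey", "rolloverNewVersion"},
    "GENERATE_EEK": {"generateEncryptedKey", "warmUpEncryptedKeys"},
    "DECRYPT_EEK": {"decryptEncryptedKey"},
    "READ": {"getKeyVersion", "getKeyVersions", "getMetadata",
             "getKeysMetadata", "getCurrentKey"},
}
# ALL is the union of every group above.
KEY_OP_GROUPS["ALL"] = set().union(*KEY_OP_GROUPS.values())


def acl_types_for(call):
    """Return the per-key ACL types that cover a KMS call, specific type first."""
    return [acl_type for acl_type, calls in KEY_OP_GROUPS.items() if call in calls]


# A client reading a file needs DECRYPT_EEK (or ALL) on the zone's key.
assert acl_types_for("decryptEncryptedKey") == ["DECRYPT_EEK", "ALL"]
```

Granting only DECRYPT_EEK on a key lets a user read and write files in the zone without being able to manage or roll the key.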
Performance
• AES-CTR with 128-bit or 256-bit keys (256-bit requires the unlimited strength JCE to be installed)
• AES-NI hardware acceleration is used where available
• Negligible overhead on writes and 7.5% impact on reads for datasets larger than memory
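One property of CTR mode that suits a filesystem is that any byte offset can be decrypted without processing the bytes before it: the counter block for a position is computed directly from the initial IV. The sketch below shows that general CTR arithmetic only. It is not taken from the HDFS source, and the function name `ctr_position` is hypothetical.

```python
AES_BLOCK_SIZE = 16  # bytes; AES always works on 128-bit blocks


def ctr_position(initial_iv: bytes, pos: int):
    """Map a byte offset to (counter block, offset inside that block)."""
    if len(initial_iv) != AES_BLOCK_SIZE:
        raise ValueError("IV must be 16 bytes")
    if pos < 0:
        raise ValueError("position must be non-negative")
    block_index, offset = divmod(pos, AES_BLOCK_SIZE)
    # Treat the IV as a 128-bit big-endian integer and add the block index,
    # wrapping around at 2**128.
    counter = (int.from_bytes(initial_iv, "big") + block_index) % (1 << 128)
    return counter.to_bytes(AES_BLOCK_SIZE, "big"), offset


# Seeking to byte 35 lands 3 bytes into the third block (block index 2).
counter_block, offset = ctr_position(bytes(16), 35)
assert counter_block == bytes(15) + b"\x02"
assert offset == 3
```

A reader that seeks into the middle of a large file therefore only has to generate the keystream from that block onward.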
DistCp
• Encryption Zone to Encryption Zone
• use -update -skipcrccheck
• Admins use the special /.reserved/raw path prefix
• /.reserved/raw is only available to the HDFS superuser and provides the raw encrypted contents
Exceptions
• Hive: may not be able to do a query that combines data from more than one encryption zone
HDFS Encryption - Summary
• Good performance (4-10% hit)
• No mods to existing applications
• Prevents attacks at the filesystem and below
• OS and filesystem only see encrypted bytes
• Data is encrypted all the way to the client
• Secure ‘at rest’ and in transit
• Key management is independent of HDFS
• Key admin != HDFS admin
• Can prevent HDFS admin from accessing secure data
Questions