1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Owen O’Malley – Co-founder & Technical Fellow
Srikanth Venkat – Senior Director, Product Management
Treat Your Enterprise Data Lake Indigestion: Enterprise Ready Security And Governance For Hadoop Ecosystem
Presenters
Owen O’Malley
Co-Founder & Technical Fellow
Hortonworks
Srikanth Venkat
Senior Director of Product Management, Security & Governance
Apache Ranger, Apache Atlas, Apache Knox
Agenda
HDP Security
Authentication (Kerberos, Apache Knox)
Authorization & Audits (Apache Ranger)
Data Protection
HDP Governance: Apache Atlas Overview
HDP Security: Comprehensive, Complete, Extensible
Administration: Central management and consistent security
Single administrative console to set policy across the entire cluster: Apache Ranger
Authentication: Authenticate users and systems
Authentication for perimeter and cluster; integrates with existing Active Directory and LDAP solutions: Kerberos | Apache Knox
Authorization: Provision access to data
Consistent authorization controls across all Apache components within HDP: Apache Ranger
Audit: Maintain a record of data access
Record of data access events across all components that is consistent and accessible: Apache Ranger
Data Protection: Protect data at rest and in motion
Secure data in motion and data at rest: HDFS TDE w/ Ranger KMS + HSM, Ranger Data Masking + Row Filtering, Wire encryption + Partner Solutions
Authentication & API Security: Apache Knox
Apache Knox Community Snapshot
Mar 2013: Entered Incubator
Oct 2013: 0.1.0 - 0.3.0 Incubator Releases
Feb 2014: Graduates to Apache TLP
Apr 2014: 0.4.0 TLP Release
Nov 2014: 0.5.0
May 2015: 0.6.0
Dec 2015: 0.7.0
Feb 2016: 0.8.0
Apr/Aug 2016: 0.9.0/0.9.1
Nov 2016: 0.10.0
Dec 2016: 0.11.0
Mar 2017: 0.12.0
TBD: 1.0.0 (Target Release Date)
• Committers: 17
• Contributors from: Hortonworks, IBM, CGI, Uber, Oracle, Blue Talon
Apache 0.12.0/HDP 2.6
• Client SDK/DSL Improvements
• Apache Zeppelin Proxying
• YARN RM UI HA Support
• Knox Token Service
• Solr API and UI
Apache 0.11.0
• LDAP Improvements
• Hadoop Group Lookup Support
• Phoenix Server Support (Avatica)
• Management UI
• Metrics
@apache_knox
Apache Knox
Proxying Services
★ Provide access to Hadoop via proxying of HTTP resources
★ Ecosystem APIs and UIs + Hadoop-oriented dispatching for Kerberos + doAs (impersonation) etc.
Authentication Services
★ REST API access, WebSSO flow for UIs
★ LDAP/AD, Header based PreAuth
★ Kerberos, SAML, OAuth
Client DSL/SDK Services
★ Scripting through DSL
★ Using Knox Shell classes directly as SDK
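To make the proxying idea concrete, here is a minimal sketch of how a client might address WebHDFS through a Knox gateway rather than talking to the cluster directly. It assumes Knox's usual URL layout of `https://{host}:{port}/gateway/{topology}/{service path}`. The host name, port 8443, topology name `default`, and the helper `knox_webhdfs_url` are illustrative assumptions, not part of any Knox SDK, and nothing is sent over the network.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(host, port, topology, hdfs_path, op):
    """Build a WebHDFS URL that is routed through a Knox gateway.

    Hypothetical helper: every argument is a placeholder value.
    """
    # Knox exposes each proxied service under /gateway/<topology>/<service>.
    base = f"https://{host}:{port}/gateway/{topology}/webhdfs/v1"
    # Percent-encode the HDFS path but keep "/" so it stays readable.
    path = quote(hdfs_path, safe="/")
    return f"{base}{path}?{urlencode({'op': op})}"


url = knox_webhdfs_url("knox.example.com", 8443, "default",
                       "/tmp/sales data", "LISTSTATUS")
print(url)
```

The client only needs to know the gateway address and topology; Kerberos dispatching and impersonation toward the cluster are handled behind the gateway.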
Authentication: Kerberos
Background: Kerberos
⬢ Strongly authenticating and establishing a user’s identity is the basis for secure access in Hadoop
⬢ Users need to be able to reliably “identify” themselves and have identity propagated throughout the Hadoop cluster
⬢ Design & implementation of Kerberos security in native Apache Hadoop was delivered by Hortonworks co-founder Owen O’Malley!
⬢ Why Kerberos?
⬢ Establishes identity for clients, hosts and services
⬢ Prevents impersonation; passwords are never sent over the wire
⬢ Integrates w/ enterprise identity mgmt tools such as LDAP & Active Directory
⬢ More granular auditing of data access/job execution
Automated Kerberos Setup with Ambari
Wizard-driven and automated Kerberos support (Kerberos principal creation for service accounts, keytab generation and distribution for appropriate hosts, permissions, etc.)
Removes cumbersome, time-consuming and error-prone administration of Kerberos
Works with existing Kerberos infrastructure, including Active Directory, to automate common tasks, removing the burden from the operator:
• Add/Delete Host
• Add Service
• Add/Delete Component
• Regenerate Keytabs
• Disable Kerberos
Kerberos + Active Directory
Cross Realm Trust
Client
Hadoop Cluster
AD / LDAP KDC
Users: [email protected]
Hosts: [email protected]
Services: hdfs/[email protected]
User Store
Use existing directory tools to manage users
Use Kerberos tools to manage host + service principals
Authentication
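The user, host and service entries in the diagram follow the standard Kerberos principal shape `primary[/instance]@REALM`. The sketch below splits such a string into its parts. The principals used (`joe@EXAMPLE.COM`, `hdfs/nn01.example.com@EXAMPLE.COM`) and the helper `parse_principal` are hypothetical illustrations, not values from the slide.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name
    instance: Optional[str]  # usually the host name for service principals
    realm: str


def parse_principal(text):
    """Split 'primary[/instance]@REALM' into its components."""
    name, sep, realm = text.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError(f"not a fully qualified principal: {text!r}")
    primary, slash, instance = name.partition("/")
    return Principal(primary, instance if slash else None, realm)


# Hypothetical examples: a user principal and a service principal.
print(parse_principal("joe@EXAMPLE.COM"))
print(parse_principal("hdfs/nn01.example.com@EXAMPLE.COM"))
```

User principals normally carry no instance and can live in the existing directory, while host and service principals carry the host name as the instance.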
Authorization & Audits: Apache Ranger
Apache Ranger Community Snapshot
May 2014: XASecure Acquisition
July 2014: Enters Apache Incubation
Nov 2014: Ranger 0.4.0 Release
July 2015: Ranger 0.5 / HDP 2.3
Aug 2016: Ranger 0.6 / HDP 2.5
Nov 2016: Ranger 0.6.2 / HDP 2.5.3
Jan 2017: Ranger TLP graduation!
Apr 2017: Ranger 0.7 / HDP 2.6
Jun 2017: Ranger 0.7.1 / HDP 2.6.1
TBD: 1.0.0 (Target Release Date)
• Committers: 22
• Contributors from: eBay, MSFT, Huawei, Pandora, Accenture, ING, Talend
Ranger 0.7 / HDP 2.6
• Export/import of Policies
• $User and macros
• Plugin status tab
• “Show columns” and “describe extended” support
• Incremental LDAP Sync
• SmartSense Metrics
Ranger 0.6 / HDP 2.5
• Classification (tag) based security (ABAC)
• Dynamic Column Masking & Row Filtering
• KMS HSM Integration (Safenet)
• Dynamic Policies & Deny Conditions
• LDAP Improvements & Audit Scalability
Apache Ranger
Authorization
• Centralized platform to define, administer and manage security policies consistently across Hadoop components
• HDFS, Hive, HBase, YARN, Kafka, Solr, Storm, Knox, NiFi, Atlas
• Extensible Architecture
• Custom policy conditions, user context enrichers
• Easy to add new component types for authorization
Auditing
• Central audit location for all access requests
• Support multiple destination sources (HDFS, Solr, etc.)
• Real-time visual query interface
Ranger KMS
• Store and manage encryption keys
• Support HDFS Transparent Data Encryption
• Integration with HSM
• Safenet LUNA
Ranger – Attribute Based Access Control (ABAC) Model
⬢ ABAC Model
⬢ Combination of the subject, action, resource, and environment
⬢ Uses descriptive attributes: AD group, Apache Atlas-based tags or classifications, geo-location, etc.
⬢ Ranger approach is consistent with NIST 800-162
⬢ Avoid role proliferation and manageability issues
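As a rough illustration of the ABAC idea, the sketch below evaluates a request by combining subject (groups), action, resource (tags) and environment (location) attributes. The classes `Request` and `Policy` and the default-deny rule are assumptions made for this example; this is not Ranger's policy engine or its policy format.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    groups: set         # subject attribute, e.g. AD groups
    action: str         # e.g. "select"
    resource_tags: set  # classifications attached to the resource
    location: str       # environment attribute, e.g. geo-location


@dataclass
class Policy:
    tag: str
    actions: set
    allowed_groups: set
    allowed_locations: set = field(default_factory=set)  # empty means anywhere

    def permits(self, req):
        # Every attribute dimension must match for the policy to apply.
        return (
            self.tag in req.resource_tags
            and req.action in self.actions
            and bool(self.allowed_groups & req.groups)
            and (not self.allowed_locations
                 or req.location in self.allowed_locations)
        )


def is_allowed(policies, req):
    # Default deny: access needs at least one matching policy.
    return any(p.permits(req) for p in policies)


policies = [Policy("PII", {"select"}, {"hr"}, {"EU"})]
print(is_allowed(policies, Request({"hr"}, "select", {"PII"}, "EU")))
print(is_allowed(policies, Request({"hr"}, "select", {"PII"}, "US")))
```

One policy keyed on a tag covers every resource carrying that tag, which is how attribute-based rules avoid a separate role per resource.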
Ranger Architecture
[Diagram labels: Enterprise Users; Legacy Tools and Data Governance; Integration API; Ranger Administration Portal; Ranger Policy Server; Ranger Audit Server; Hadoop Components, each with a Ranger Plugin: HDFS, HBase, Hive Server2, Knox, NiFi, Solr, Kafka, YARN, Storm, Atlas]
HDP – Security & Governance
[Diagram: Atlas tracks metadata and lineage for entities in the data lake (Hive tables, HDFS files, HBase tables) and holds tags in its metastore for assets, entities, streams, pipelines and feeds. Ranger, as an Atlas client, subscribes to a topic and gets metadata updates; it manages access policies and audit logs. Policy types shown: Classification, Prohibition, Time, Location. Other labels: PDP, Resource Cache.]
Industry First: Dynamic Tag-based Security Policies
Dynamic Row Filtering & Column Masking: Apache Ranger with Apache Hive
Source table ww_customers:
Country  National ID  CC No             DOB        MRN         Name           Policy ID
US       232323233    4539067047629850  9/12/1969  8233054331  John Doe       nj23j424
US       333287465    5391304868205600  8/13/1979  3736885376  Jane Doe       cadsd984
Germany  T22000129    4532786256545550  3/5/1963   876452830A  Ernie Schwarz  KK-2345909

Ranger Policy Enforcement
Query rewritten based on dynamic Ranger policies: filter rows by region & apply relevant column masking

User 1: Joe
Location: US
Group: Analyst
Original Query:
SELECT country, nationalid, ccnumber, mrn, name FROM ww_customers
Result:
Country  National ID  CC No                MRN   Name
US       xxxxx3233    4539 xxxx xxxx xxxx  null  John Doe
US       xxxxx7465    5391 xxxx xxxx xxxx  null  Jane Doe
Users from US Analyst group see data for US persons with CC and National ID (SSN) as masked values and MRN is nullified

User 2: Ivanna
Location: EU
Group: HR
Original Query:
SELECT country, nationalid, name, mrn FROM ww_customers
Result:
Country  National ID  Name           MRN
Germany  T22000129    Ernie Schwarz  876452830A
EU HR Policy Admins can see unmasked but are restricted by row filtering policies to see data for EU persons only

Groups shown: Analysts, HR, Marketing
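The behaviour on this slide can be approximated with a small sketch: a per-group row filter plus per-column masks applied to the sample rows. This is only an in-memory imitation for illustration. Ranger itself rewrites the Hive query, and the names `POLICIES`, `run_query`, `show_last_4`, `show_first_4` and `nullify`, as well as the single-country EU filter, are assumptions made for this example.

```python
ROWS = [
    {"country": "US", "nationalid": "232323233", "ccnumber": "4539067047629850",
     "mrn": "8233054331", "name": "John Doe"},
    {"country": "US", "nationalid": "333287465", "ccnumber": "5391304868205600",
     "mrn": "3736885376", "name": "Jane Doe"},
    {"country": "Germany", "nationalid": "T22000129", "ccnumber": "4532786256545550",
     "mrn": "876452830A", "name": "Ernie Schwarz"},
]


def show_last_4(value):
    # Replace everything except the last four characters with "x".
    return "x" * (len(value) - 4) + value[-4:]


def show_first_4(value):
    # Keep the first four characters, mask the rest, group in blocks of four.
    masked = value[:4] + "x" * (len(value) - 4)
    return " ".join(masked[i:i + 4] for i in range(0, len(masked), 4))


def nullify(_value):
    return None


# Hypothetical policies keyed by group: one row filter and optional column masks.
POLICIES = {
    "Analyst": {
        "row_filter": lambda row: row["country"] == "US",
        "masks": {"nationalid": show_last_4, "ccnumber": show_first_4,
                  "mrn": nullify},
    },
    "HR": {
        # Assumption: Germany is the only EU country in the sample data.
        "row_filter": lambda row: row["country"] == "Germany",
        "masks": {},
    },
}


def run_query(group, columns):
    policy = POLICIES[group]
    result = []
    for row in ROWS:
        if not policy["row_filter"](row):
            continue  # row filtering happens before masking
        out = {}
        for col in columns:
            mask = policy["masks"].get(col)
            out[col] = mask(row[col]) if mask else row[col]
        result.append(out)
    return result


for row in run_query("Analyst", ["country", "nationalid", "ccnumber", "mrn", "name"]):
    print(row)
for row in run_query("HR", ["country", "nationalid", "name", "mrn"]):
    print(row)
```

Both users issue ordinary queries against the same table; only the policy attached to their group decides which rows come back and how each column is presented.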
Agenda
Data Protection
Data Protection in Hadoop
Data protection must be applied at three different layers in Apache Hadoop:
Storage: encrypt data while it is at rest
Transparent Data Encryption in HDFS, Ranger KMS + HSM, Partner Products (HPE Voltage, Protegrity, Dataguise)
Transmission: encrypt data as it is in motion
Native Apache Hadoop 2.0 provides wire encryption.
Upon Access: apply restrictions when accessed
Ranger (Dynamic Column Masking + Row Filtering), Partner Masking + Encryption
Ranger KMS
Transparent Data Encryption in HDFS
[Diagram labels: HDFS Client; NN; DN (x3); blocks A, B, C, D; SafeNet Luna HSM]
Benefits
• Selective encryption of relevant files/folders
• Prevent rogue admin access to sensitive data
• Fine-grained access controls
• Transparent to end application w/o changes
• Ranger KMS integrated to external HSM (SafeNet Luna), adding to reliability/security of KMS
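HDFS transparent encryption is built on envelope encryption: each file gets its own data encryption key (DEK), which is stored only in wrapped form (EDEK) and can be unwrapped only by the key management server. The sketch below imitates that flow. `ToyKMS` and `toy_cipher` are hypothetical stand-ins invented for this example, and the XOR keystream is NOT real encryption; it only keeps the example self-contained.

```python
import hashlib
import secrets


def toy_cipher(key, data):
    """XOR data with a SHA-256 derived keystream.

    Toy stand-in for a real cipher: NOT secure, used only to show the key flow.
    Applying it twice with the same key returns the original bytes.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKMS:
    """Holds the zone key; callers only see wrapped or unwrapped data keys."""

    def __init__(self):
        self._zone_key = secrets.token_bytes(32)

    def generate_encrypted_key(self):
        dek = secrets.token_bytes(32)
        return toy_cipher(self._zone_key, dek)  # wrapped key (EDEK)

    def decrypt_encrypted_key(self, edek):
        return toy_cipher(self._zone_key, edek)  # unwrapped key (DEK)


kms = ToyKMS()
edek = kms.generate_encrypted_key()    # kept alongside the file metadata
dek = kms.decrypt_encrypted_key(edek)  # an authorized client asks the KMS
ciphertext = toy_cipher(dek, b"sensitive record")  # what storage nodes hold
plaintext = toy_cipher(kms.decrypt_encrypted_key(edek), ciphertext)
print(plaintext)
```

Because storage only ever holds ciphertext and the wrapped key, someone with access to the disks but not to the KMS cannot read the data, which is the point behind the "rogue admin" benefit above.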
Agenda
Apache Atlas: Vision & Features Overview
Background: DGI Community becomes Apache Atlas
Dec 2014: DGI group Kickoff
May 2015: Apache Atlas Incubation
Aug 2016: Apache 0.7 Foundation Release
Apr 2017: Apache 0.8 Release
Jun 2017: Atlas Becomes TLP!
* DGI: Data Governance Initiative
Global Financial Company
• Committers: 35
• Code contributors from: Hortonworks, IBM, Aetna, Merck, Target
Apache Atlas 0.8 / HDP 2.6
• Simplified Search UI
• Simplified APIs
• Classification-based security for HDFS, Kafka, HBase
• Knox SSO
• Performance/scalability improvements
Apache Atlas 0.7.1 / HDP 2.5
• High availability support
• LDAP Authentication/Authorization
• Classification-based security for Hive
• UI Redesign
Apache Atlas Vision: Open Metadata & Governance Services
[Diagram labels: Structured; Traditional RDBMS; Metadata; MPP Appliances; Kafka; Storm; Sqoop; Hive; Falcon; Atlas Metadata; Ranger; Custom; Partners]
Comprehensive Enterprise Data Catalog
• Lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality
• Integrate both on-premise and cloud platforms to provide enterprise wide view
Open Enterprise Data Connectors
• Interoperable connector framework to connect to your data catalog out of the box with many vendor technologies
• No expensive population of proprietary siloed metadata repositories
Dynamic Metadata Discovery
• Metadata is added automatically to the catalog as new data is created or data is updated
• Extensible discovery processes that characterize and classify the data
Enabling Collaboration & Workflows
• Subject matter experts locate the data they need quickly and efficiently, share their knowledge about the data and its usage to help others
• Interested parties and processes are notified automatically
Automated Governance Processes
• Metadata-driven access control
• Auditing, metering, and monitoring
• Quality control and exception management
• Rights (entitlement) management
Predefined standards for glossaries, data schemas, rules and regulations
Vision:
Metadata-driven foundational governance services for enterprise data ecosystem
• Open frameworks and APIs
• Agile and secure collaboration around data and advanced analytics
• Reduce operational costs while extracting economic value of data
High Level Architecture: 4 Key points
[Diagram labels: Type System; Repository; Search DSL; Bridge; Hive; Storm; Falcon; Custom; REST API; Graph DB; Search; Kafka; Sqoop; Connectors; Messaging Framework]
1 Data Lineage
Only product that captures lineage across Hadoop components at platform level.
2 Agile Data Modeling
Type system allows custom metadata structures in a hierarchical taxonomy
3 REST API
Modern, flexible access to Atlas services, HDP components, UI & external tools
4 Exchange
Leverage existing metadata / models by importing them from current tools. Export metadata to downstream systems
Apache Atlas: Lineage
Lineage
• Where does this data originate from (source/provenance)?
• Upstream path: Path through all data assets and processes leading up to current data asset
Impact
• How is this data being used?
• What other data assets (derivative/dependent) does this impact?
• Downstream path: Path through all data assets and processes leading out of current data asset
Used for
• Forensics
• Impact analysis
• Auditing and Compliance
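Upstream and downstream paths are both graph traversals over the same set of edges, just in opposite directions. The sketch below shows this with a breadth-first search. The asset names in `EDGES` and the helpers `lineage` and `impact` are hypothetical, and the code does not use the Atlas API.

```python
from collections import deque

# Hypothetical lineage edges: source asset -> assets derived from it.
EDGES = {
    "raw.impressions": ["staging.impressions_clean"],
    "staging.impressions_clean": ["mart.daily_summary", "mart.provider_summary"],
    "raw.providers": ["mart.provider_summary"],
}


def reachable(start, graph):
    """Breadth-first search returning every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(graph):
    """Flip every edge so that traversal walks toward the sources."""
    inverted = {}
    for src, targets in graph.items():
        for tgt in targets:
            inverted.setdefault(tgt, []).append(src)
    return inverted


def impact(asset):
    # Downstream path: everything derived from this asset.
    return reachable(asset, EDGES)


def lineage(asset):
    # Upstream path: everything this asset was derived from.
    return reachable(asset, invert(EDGES))


print(sorted(lineage("mart.provider_summary")))
print(sorted(impact("raw.impressions")))
```

Impact analysis before changing or dropping `raw.impressions` is then a single downstream lookup, and a forensic question about `mart.provider_summary` is a single upstream lookup.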
Apache Atlas: Lineage and Impact
Apache Atlas: Classification
• Categorize and curate data assets for easier discovery
• Associate context with data assets – Governance, Security, Business, …
GOVERNANCE
SECURITY
BUSINESS
Apache Atlas Classification: use case – cross component
Classification based security on cross-component data assets
Metadata Catalog Search : Basic Search
Search for a hive_table classified as ‘PII’ and name starting with ‘prov’
Filter by Data Asset type
Filter by Classification
Search text
Wildcards: prov*, *sum*
Logical expressions: prov* AND *sum*
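The three basic-search inputs (asset type, classification, wildcard text) can be pictured as successive filters over the catalog. The sketch below mimics that with Python's `fnmatch` wildcards over a made-up in-memory catalog. `CATALOG` and `basic_search` are hypothetical, and treating multiple patterns as an AND is an assumption that mirrors the `prov* AND *sum*` example.

```python
from fnmatch import fnmatchcase

# Hypothetical catalog entries; real Atlas entities carry far more metadata.
CATALOG = [
    {"type": "hive_table", "name": "provider_summary", "classifications": {"PII"}},
    {"type": "hive_table", "name": "provider_costs", "classifications": set()},
    {"type": "hive_table", "name": "claims_summary", "classifications": {"PII"}},
    {"type": "hdfs_path", "name": "provider_dump", "classifications": {"PII"}},
]


def basic_search(type_name, classification, patterns):
    """Return names matching the type, the classification and ALL patterns."""
    return [
        asset["name"]
        for asset in CATALOG
        if asset["type"] == type_name
        and classification in asset["classifications"]
        and all(fnmatchcase(asset["name"], p) for p in patterns)
    ]


# hive_table classified as 'PII' with a name starting with 'prov'
print(basic_search("hive_table", "PII", ["prov*"]))
# logical expression: prov* AND *sum*
print(basic_search("hive_table", "PII", ["prov*", "*sum*"]))
```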
Metadata Catalog Search : Advanced
Filter by Data asset type
Search for a hive_table named ‘employees’ and owner ‘hive’
DSL search with SQL-like syntax
Select columns from impressions table in raw database
DSL query string:
hive_column where table.name='impressions' and table.db.name = 'raw'
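A DSL query string like the one on this slide can also be submitted over REST. The sketch below only builds the request URL and sends nothing. It assumes the Atlas v2 search endpoint path `/api/atlas/v2/search/dsl` with `query` and `limit` parameters; the base URL `http://atlas.example.com:21000` is a placeholder, `dsl_search_url` is a hypothetical helper, and the endpoint details should be checked against the Atlas version in use.

```python
from urllib.parse import urlencode


def dsl_search_url(base_url, query, limit=25):
    """Build a DSL search URL (assumed endpoint path, hypothetical helper)."""
    params = urlencode({"query": query, "limit": limit})
    return f"{base_url}/api/atlas/v2/search/dsl?{params}"


# DSL query from the slide, written with plain ASCII quotes.
query = "hive_column where table.name='impressions' and table.db.name = 'raw'"
url = dsl_search_url("http://atlas.example.com:21000", query)
print(url)
```

The query text has to be URL-encoded because it contains spaces, quotes and `=` characters.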
Key Takeaways
Secure APIs and UIs in Hadoop ecosystem using Apache Knox gateway
Enforce appropriate security controls to monitor data access across your businesses with Apache Ranger
– Implement fine-grained policy based controls to grant and monitor data access
– Track user activity on data using user access audit logging features to help with forensic auditing for breach notification purposes
– Protect sensitive data through anonymization and pseudonymization using dynamic masking and row filtering
Establish an Enterprise Data Catalog with Apache Atlas
– Identify and classify data
– Harvest and maintain metadata
Track and map the movement of data through your enterprise with Apache Atlas
– Maintain a “Near Real Time” view to track data movement
– Understand data proliferation (especially sensitive data) with data lineage and impact analysis
More Information…
Coming up Next..
BoF session – Security, Governance & Cybersecurity
When: 6:00pm, Thursday September 21st 2017
Where: C4.7
Also check out other sessions on Apache Atlas & Apache Ranger from recent DataWorks Summits
https://dataworkssummit.com/san-jose-2017/
https://dataworkssummit.com/munich-2017/
Hortonworks Product Pages
https://hortonworks.com/apache/ranger/
https://hortonworks.com/apache/atlas
Hortonworks Community Connection:
https://community.hortonworks.com/spaces/64/governance-lifecycle-track.html
https://community.hortonworks.com/spaces/62/security-track_2.html
Apache Software Foundation
http://ranger.apache.org/
http://atlas.apache.org/