Is Your Enterprise Data Lake Metadata Driven AND Secure?

Page 1: Is your Enterprise Data lake Metadata Driven AND Secure?

Apache Atlas + Ranger

Page 2: Is your Enterprise Data lake Metadata Driven AND Secure?

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Disclaimer

This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed.

Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.

This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.

Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.

Page 3: Is your Enterprise Data lake Metadata Driven AND Secure?

Agenda

• Introduction

• Overview: Apache Atlas & Ranger

• Technical Preview: Dynamic, Tag-based Policies

• Q & A

Page 4: Is your Enterprise Data lake Metadata Driven AND Secure?

Speakers

Andrew Ahn, Director, Governance Product Management

Madhan Neethiraj, Director, Enterprise Security Engineering

Page 5: Is your Enterprise Data lake Metadata Driven AND Secure?

Apache Atlas + Ranger Overview

Page 6: Is your Enterprise Data lake Metadata Driven AND Secure?

Apache Atlas is Metadata Services

Metadata Services Foundation (HDP 2.3)

• Business Catalog: Taxonomy-based classification

• Technical Data: e.g. model for Hive: DB, Tables, Views and Columns

• Centralized location for all metadata inside HDP, and a single interface point for metadata exchange with platforms outside of HDP

Metadata that enriches every component

Available now with HDP 2.3

• Hive – Complete lineage, every SQL statement tracked

• Ambari – Setup & monitoring

[Diagram: Apache Atlas with Hive, Ranger, Falcon, Sqoop, Storm, Kafka, Spark, NiFi]

1Q2016 – Technical Preview

• Sqoop – Supplement Hive lineage based on Sqoop import/export

• Storm & Kafka – Lineage for topologies and participating queues/topics

• Ranger – Dynamic security policies leveraging metadata tags

• Falcon – Process entities lineage

Roadmap

• HDFS – Correlated with other components

• Spark – Support for SparkSQL

• NiFi – Integrate fine-grained data provenance with Atlas

Page 7: Is your Enterprise Data lake Metadata Driven AND Secure?

Big Data Management Through Metadata

Management Scalability

Many traditional tools and patterns do not scale when applied to multi-tenant data lakes. Many enterprises have siloed data and metadata stores that collide in the data lake. This is compounded by the ability to retain data over very large time windows (years). Can traditional EDW tools manage 100 million entities effectively, with room to grow?

Metadata Tools

Scalable, decoupled, decentralized management driven through metadata is the only viable solution. This allows quick integration with automation and other metamodels.

Tags for Management, Discovery and Security

Proper metadata is the foundation for business taxonomy, stewardship, attribute-based security and self-service.

Page 8: Is your Enterprise Data lake Metadata Driven AND Secure?

Dynamic Access Policy Requirements

• Basic tag policy – PII example. Access and entitlements must be tag-based (ABAC) and scalable in implementation.

• Geo-based policy – Policy based on IP address; proxy IP substitution may be required. Rule enforcement must be geo-aware.

• Time-based tag policy – Timer for data access, decoupled from deletion of data.

• Prohibitions – Prevent combinations of Hive tables/columns that may pose a risk together.
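To make the geo-based and prohibition requirements above more concrete, here is a minimal Python sketch. It is not Ranger's implementation: the network-to-country table, the column names, and every function name are hypothetical and chosen only for illustration.

```python
import ipaddress

# Hypothetical mapping of client networks to countries; a real deployment
# would use a geo-IP database instead of this hand-written table.
NETWORK_COUNTRY = {
    ipaddress.ip_network("10.1.0.0/16"): "US",
    ipaddress.ip_network("10.2.0.0/16"): "DE",
}

# Hypothetical prohibition: these columns may not appear together in one request.
PROHIBITED_COMBINATIONS = [
    frozenset({"hr.employee.name", "hr.employee.ssn"}),
]


def country_of(ip, forwarded_for=None):
    """Resolve the country, preferring the original client IP behind a proxy."""
    address = ipaddress.ip_address(forwarded_for or ip)
    for network, country in NETWORK_COUNTRY.items():
        if address in network:
            return country
    return None  # unknown location


def geo_allowed(ip, allowed_countries, forwarded_for=None):
    """Geo-based rule: allow only requests that originate in allowed countries."""
    return country_of(ip, forwarded_for) in allowed_countries


def violates_prohibition(requested_columns):
    """Prohibition rule: True if the request contains a full risky combination."""
    requested = set(requested_columns)
    return any(combo <= requested for combo in PROHIBITED_COMBINATIONS)
```

The `forwarded_for` parameter stands in for the proxy IP substitution mentioned above: when a request arrives through a proxy, the rule is evaluated against the original client address rather than the proxy's.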

Page 9: Is your Enterprise Data lake Metadata Driven AND Secure?

How does Atlas work with Ranger at scale?

Atlas provides: Metadata

• Business classification (taxonomy): Company > HR > Driver

• Hierarchy with inheritance of attributes to child objects: the sensitive “PII” tag of department HR will be inherited by group HR > Driver

• Atlas will notify Ranger of changes via a Kafka topic
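The inheritance rule can be illustrated with a short Python sketch. The dictionaries and the `effective_tags` function are hypothetical and do not reflect the Atlas data model; they only show how a tag placed on HR would reach HR > Driver.

```python
# Taxonomy from the slide: Company > HR > Driver (child -> parent).
PARENT = {"Company": None, "HR": "Company", "Driver": "HR"}

# Tags attached directly to a taxonomy node.
DIRECT_TAGS = {"HR": {"PII"}}


def effective_tags(node):
    """Collect the node's own tags plus every tag inherited from its ancestors."""
    tags = set()
    while node is not None:
        tags |= DIRECT_TAGS.get(node, set())
        node = PARENT[node]
    return tags
```

Here `effective_tags("Driver")` includes "PII" even though the tag was only attached to HR, while `effective_tags("Company")` stays empty because inheritance flows downward only.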

[Diagram: Apache Atlas with Hive, Ranger, Falcon, Kafka, Storm]

Atlas provides the metadata tag to create policies

Ranger provides: Access & Entitlements

• Ranger will cache tags and asset mappings for performance.

• Ranger will have policies based on tags instead of roles.

• Example: PII = <group>. This can work for many assets.
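As a rough illustration of "PII = <group>", the sketch below keeps a local tag cache and a single tag policy that covers several assets at once. The resource names, the `hr_admins` group, and the function name are assumptions made for this example, not Ranger identifiers, and only tag policies are modeled.

```python
# Hypothetical local cache of resource -> tags, as a plugin might hold it.
TAG_CACHE = {
    "hive:hr.employee.ssn": {"PII"},
    "hdfs:/data/hr/employee": {"PII"},
    "hive:finance.tax_2010": {"EXPIRES_ON"},
}

# One tag-based policy: tag -> groups that may access anything carrying the tag.
TAG_POLICIES = {"PII": {"hr_admins"}}


def tag_policy_allows(resource, user_groups):
    """True if any tag on the resource has a policy granting one of the groups."""
    groups = set(user_groups)
    for tag in TAG_CACHE.get(resource, set()):
        if TAG_POLICIES.get(tag, set()) & groups:
            return True
    return False
```

The single "PII" entry grants `hr_admins` access to both the Hive column and the HDFS path, which is the sense in which one tag policy can work for many assets.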

Page 10: Is your Enterprise Data lake Metadata Driven AND Secure?

Apache Ranger: Dynamic Classification-based Security

Page 11: Is your Enterprise Data lake Metadata Driven AND Secure?

Apache Ranger: Introduction

Centralized authorization and auditing across Hadoop components

• HDFS, Hive, HBase, Knox, Storm, YARN, Kafka, Solr, ..

• Audit logs to: Solr, HDFS, RDBMS, Log4j, ..

Resource based security

• Policies for specific sets of resources

• Requires revision of policies as resources get added/moved

Classification based security

• Policies for classifications and not for specific resources

• A single policy protects resources in multiple components

• As classifications for resources change, appropriate policies are automatically applied

• Enables separation of duties: resource classification and security policies
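The separation of duties described above can be sketched in Python as two independent objects: one owned by data stewards, one by security administrators. The class names, the resource name, and the deny-by-user model are assumptions for illustration only.

```python
class Classifications:
    """Owned by data stewards: which classification each resource carries."""

    def __init__(self):
        self._tags = {}

    def classify(self, resource, tag):
        self._tags.setdefault(resource, set()).add(tag)

    def tags_of(self, resource):
        return self._tags.get(resource, set())


class SecurityPolicies:
    """Owned by security admins: which users are denied for each classification."""

    def __init__(self):
        self._denied_users = {}

    def deny(self, tag, users):
        self._denied_users.setdefault(tag, set()).update(users)

    def is_denied(self, user, tags):
        return any(user in self._denied_users.get(tag, set()) for tag in tags)


classifications = Classifications()
policies = SecurityPolicies()
policies.deny("PII", {"analyst"})

resource = "kafka:customer_events"  # hypothetical resource name
before = policies.is_denied("analyst", classifications.tags_of(resource))
classifications.classify(resource, "PII")  # steward reclassifies; no policy edit
after = policies.is_denied("analyst", classifications.tags_of(resource))
```

The policy object is never touched after the resource is classified, yet the decision for `analyst` changes from allowed (`before`) to denied (`after`).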

Page 12: Is your Enterprise Data lake Metadata Driven AND Secure?

Apache Ranger: Authorization and Auditing

[Diagram: Enterprise Users; Ranger Administration Portal; Ranger Policy Store; Ranger Audit Store (HDFS, Solr, RDBMS, Log4j); Hadoop Components, each with a Ranger Plugin: HDFS, Hive Server2, HBase, Knox, Storm, YARN, Kafka, Solr]

Page 13: Is your Enterprise Data lake Metadata Driven AND Secure?

Apache Atlas + Ranger integration

[Diagram: Atlas (Metastore: tags, assets, entities) sends metadata update notifications through the Notification Framework (Kafka topics, message durability). In Ranger, the Atlas Client subscribes to the topic and gets metadata updates for the PDP Resource Cache (optimized for speed, event-driven updates).]
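The event-driven flow in this diagram can be approximated with a small Python sketch. An in-memory `queue.Queue` stands in for the Kafka topic, and the event layout and function names are hypothetical; the real integration uses Kafka and the Atlas notification format.

```python
import queue

# Stand-in for the Kafka topic; a real deployment would use a Kafka consumer.
notifications = queue.Queue()

# Ranger-side cache of resource -> tags, consulted at authorization time.
resource_cache = {}


def publish_tag_change(resource, tags):
    """Atlas side: emit a notification whenever a resource's tags change."""
    notifications.put({"resource": resource, "tags": sorted(tags)})


def drain_notifications():
    """Ranger side: apply every pending notification to the local cache."""
    applied = 0
    while True:
        try:
            event = notifications.get_nowait()
        except queue.Empty:
            return applied
        resource_cache[event["resource"]] = set(event["tags"])
        applied += 1
```

Because authorization checks read only from `resource_cache`, they never wait on Atlas; the cache simply catches up as notifications are consumed.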

Page 14: Is your Enterprise Data lake Metadata Driven AND Secure?

DEMO

Page 15: Is your Enterprise Data lake Metadata Driven AND Secure?

Setup for the demo

Database   Table      Columns
finance    tax_2010   (table access expires on 12/31/2015)
hr         employee   SSN tagged as PII

Users:

• analyst: No access to PII, no access to expired data

• admin: Access to PII, access to expired data

Page 16: Is your Enterprise Data lake Metadata Driven AND Secure?

Apache Atlas: tag a column as PII

1. Search for the column

2. Select the column

3. Select the ‘Tags’ tab

4. Click on ‘Add Tag’

5. Select the PII tag & click ‘Save’

Page 17: Is your Enterprise Data lake Metadata Driven AND Secure?

Apache Atlas: tag a table for expiry_date

Select the EXPIRES_ON tag and enter a value for expiry_date

Page 18: Is your Enterprise Data lake Metadata Driven AND Secure?

Apache Ranger: authorization policy for PII

Pick the tag

Deny access to PII data for all users, with the exception of the ‘admin’ user

Page 19: Is your Enterprise Data lake Metadata Driven AND Secure?

Apache Ranger: authorization policy for expiry_date

Pick the tag

Deny access to data after the expiry date, with the exception of the ‘admin’ user
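The two demo policies (PII and expiry_date) follow a "deny with exception" pattern, which the Python sketch below tries to capture using the demo's tables and users. The data layout and function names are assumptions, and the sketch models only the deny rules; it presumes a separate policy already grants baseline access.

```python
from datetime import date

# Tags from the demo setup; the attribute layout is an assumption.
RESOURCE_TAGS = {
    "hr.employee.ssn": {"PII": {}},
    "finance.tax_2010": {"EXPIRES_ON": {"expiry_date": date(2015, 12, 31)}},
}

# Users exempt from the deny rule of each tag policy.
DENY_EXCEPTIONS = {"PII": {"admin"}, "EXPIRES_ON": {"admin"}}


def tag_denies(tag, attributes, today):
    """Return True when the tag's deny condition holds for this request."""
    if tag == "PII":
        return True  # PII is denied to everyone who is not exempt
    if tag == "EXPIRES_ON":
        return today > attributes["expiry_date"]  # denied only after expiry
    return False


def is_access_allowed(user, resource, today):
    """Deny if any tag on the resource denies and the user is not exempt."""
    for tag, attributes in RESOURCE_TAGS.get(resource, {}).items():
        exempt = user in DENY_EXCEPTIONS.get(tag, set())
        if tag_denies(tag, attributes, today) and not exempt:
            return False
    return True
```

With this setup, `analyst` can read finance.tax_2010 up to and including 12/31/2015 but not afterwards, while `admin` keeps access in both cases.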

Page 20: Is your Enterprise Data lake Metadata Driven AND Secure?

Apache Ranger: access audit logs

• Tags associated with resources

• Resources accessed

• Policy that allowed/denied access
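A record carrying the three pieces of information listed above could look like the following Python sketch. The field names and the JSON serialization are illustrative assumptions and are not Ranger's actual audit schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AccessAudit:
    user: str
    resource: str     # resource accessed
    tags: list        # tags associated with the resource
    policy_id: int    # policy that allowed or denied the access
    allowed: bool


def to_log_line(record):
    """Serialize one audit record as a JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

Keeping the tags and the deciding policy in the same record lets an auditor see why a request was denied without having to reconstruct the tag state at the time of access.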

Page 21: Is your Enterprise Data lake Metadata Driven AND Secure?

Questions

Page 23: Is your Enterprise Data lake Metadata Driven AND Secure?

References

• Apache Atlas
  • http://atlas.apache.org
  • http://hortonworks.com/apache/atlas

• Apache Ranger
  • http://ranger.apache.org
  • http://hortonworks.com/apache/ranger

• Apache Ranger wiki
  • https://cwiki.apache.org/confluence/display/RANGER

• Tag based policies
  • https://cwiki.apache.org/confluence/display/RANGER/Tag+Based+Policies

• Geo-location based policies
  • https://cwiki.apache.org/confluence/display/RANGER/Geo-location+based+policies