Fine-Grained Security for Spark and Hive

Post on 16-Apr-2017

1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Fine-Grained Security for Spark and Hive
Carter Shanklin - Director PM
Don Bosco Durai - Security Architect
June 29, 2016

Agenda
● Current security options and challenges
● Apache Ranger Overview
● LLAP Overview
● Use Cases and Demo
● Apache Atlas Integration

Current Options and Challenges

⬢ Limited to storage-level access control for Spark, Pig and MR

⬢ Column-level access via HiveServer2

⬢ Row-level filtering needs Hive views
– Multiple Hive views need to be created and managed
– Explicit permissions need to be given for each view/user
– Users need to know which view to use

⬢ Masking needs a custom UDF
– The UDF needs to be wrapped using views
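
To make the view-management burden concrete, here is a minimal Python sketch that generates the statements a view-based setup might need. The helper name `view_statements`, the view naming scheme, and the users `mary` and `mike` are hypothetical and not taken from the slides; only `mark`, the `users` table and `customer_state` appear in the demo.

```python
def view_statements(table, filter_column, assignments):
    """Build the CREATE VIEW / GRANT statements a view-based setup would need.

    assignments maps a user name to the filter value that user may see.
    The view naming scheme (<table>_<value>) is an assumption for illustration.
    """
    statements = []
    # One view per distinct filter value.
    for value in sorted(set(assignments.values())):
        statements.append(
            f"CREATE VIEW {table}_{value.lower()} AS "
            f"SELECT * FROM {table} WHERE {filter_column} = '{value}'"
        )
    # One explicit grant per user, pointing at that user's view.
    for user, value in sorted(assignments.items()):
        statements.append(
            f"GRANT SELECT ON TABLE {table}_{value.lower()} TO USER {user}"
        )
    return statements


# Hypothetical assignment of users to regions.
stmts = view_statements(
    "users", "customer_state", {"mark": "CA", "mary": "NY", "mike": "CA"}
)
for stmt in stmts:
    print(stmt)
```

Three users in two regions already need two views and three grants, and each user still has to know which view name to query.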

Apache Ranger Overview

Apache Ranger

Authorization
• Centralized platform to define, administer and manage security policies consistently
• Enforce policies within each component

Auditing
• Central audit location for all access requests
• Support multiple destination sources (HDFS, Solr, etc.)
• Real-time visual query interface

Ranger KMS
• Store and manage encryption keys
• Support HDFS TDE
• Integration with HSM

Ranger Architecture

[Diagram. Labels: Enterprise Users; Legacy Tools and Data Governance; Ranger Administration Portal; Ranger Audit Server; Ranger Policy Server Integration API; RDBMS; HDFS; Hadoop Components, each with a Ranger Plugin: HDFS, HBase, Hive Server2, Knox, NiFi, Solr, Kafka, YARN, Storm, Atlas]

Ranger Audits - Data Access

Ranger Audits - Admin Actions

LLAP Overview

Hive 2.0 and LLAP

⬢ At a High Level:
– 2000+ features, improvements and bug fixes in Hive since HDP 2.4.
– 600+ of these from outside of Hortonworks.

⬢ Major Improvements:
– Preview: Hive LLAP: Persistent query servers with intelligent in-memory caching.
– ACID GA: Hardened and proven at scale.
– Expanded SQL Compliance: More capable integration with BI tools.
– Performance: Interactive query, 2x faster ETL.
– Security: Row / Column security extending to views, Column level security for Spark.

Hive 2 with LLAP: Architecture Overview

Hive 2 with LLAP: Open Interfaces

Ranger Integration with Hive and LLAP

Hive / LLAP Security Capabilities with Ranger

⬢ Ranger Hive plugin provides authorization / access controls.

⬢ Column Masking:
– Inject Hive UDFs that mask characters or hash values.
– Dynamic, per-user.

⬢ Dynamic Row Filtering:
– Query is analyzed and policies applied.
– Dynamic, per-user.

⬢ All operations run as ordinary SQL queries:
– Masking policies convert to expressions in the SQL SELECT clause.
– Filters convert to predicates in the SQL WHERE clause.
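
The last bullet can be illustrated with a small Python sketch that builds the rewritten query text: masked columns are wrapped in a UDF call in the SELECT list, and the row filter is ANDed into the WHERE clause. The function `rewrite_select` and its parameters are hypothetical; the slides do not show how Hive performs the rewrite internally, so treat this as an illustration of the idea only.

```python
def rewrite_select(table, columns, masks=None, row_filter=None, user_where=None):
    """Return SELECT text with masks in the select list and filters in WHERE.

    masks:      column name -> masking UDF name (for example "mask")
    row_filter: policy predicate as SQL text, or None
    user_where: predicate from the user's own query, or None
    """
    masks = masks or {}
    select_items = []
    for col in columns:
        if col in masks:
            # Masking becomes an expression in the SELECT clause.
            select_items.append(f"{masks[col]}({col}) AS {col}")
        else:
            select_items.append(col)
    sql = f"SELECT {', '.join(select_items)} FROM {table}"
    # Row filtering becomes a predicate in the WHERE clause.
    conditions = [c for c in (user_where, row_filter) if c]
    if conditions:
        sql += " WHERE " + " AND ".join(f"({c})" for c in conditions)
    return sql


rewritten = rewrite_select(
    "users",
    ["customer_name", "customer_ccn"],
    masks={"customer_ccn": "mask"},
    row_filter="customer_state = 'CA'",
)
print(rewritten)
```

Because the result is ordinary SQL, the same execution path serves both filtered and unfiltered users.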

Native Hive Masking Capabilities

UDF | Purpose | Example Start | Example Result
mask | Convert letters to X/x and numbers to n. | 123 Fake St. | nnn Xxxx Xx.
mask_first_n | Mask only the first n characters. | 433-54-3937 | nnn-54-3937
mask_last_n | Mask only the last n characters. | 433-54-3937 | 433-54-nnnn
mask_show_first_n | Mask, showing only the first n characters. | 555-233-1234 | 555-nnn-nnnn
mask_show_last_n | Mask, showing only the last n characters. | 433-54-3937 | nnn-nn-3937
mask_hash | Produce a consistent hash of the field. | CA | 21f241cccaa5cfa33190f56ff1510e37
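
The behaviour in the table can be approximated in plain Python. This is an emulation written for illustration, not the Hive implementation: the Python function signatures (explicit `n` argument) are assumptions, and the use of MD5 in `mask_hash` is a guess based only on the 32-hex-character example shown on the slide.

```python
import hashlib


def _mask_char(ch):
    """Upper-case letters become X, lower-case become x, digits become n."""
    if ch.isupper():
        return "X"
    if ch.islower():
        return "x"
    if ch.isdigit():
        return "n"
    return ch  # punctuation and spaces pass through unchanged


def mask(value):
    return "".join(_mask_char(ch) for ch in value)


def mask_first_n(value, n):
    return mask(value[:n]) + value[n:]


def mask_last_n(value, n):
    cut = max(len(value) - n, 0)
    return value[:cut] + mask(value[cut:])


def mask_show_first_n(value, n):
    return value[:n] + mask(value[n:])


def mask_show_last_n(value, n):
    cut = max(len(value) - n, 0)
    return mask(value[:cut]) + value[cut:]


def mask_hash(value):
    # Assumption: MD5 is used here only because it yields 32 hex characters.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


print(mask("123 Fake St."))                    # nnn Xxxx Xx.
print(mask_first_n("433-54-3937", 3))          # nnn-54-3937
print(mask_last_n("433-54-3937", 4))           # 433-54-nnnn
print(mask_show_first_n("555-233-1234", 3))    # 555-nnn-nnnn
print(mask_show_last_n("433-54-3937", 4))      # nnn-nn-3937
```

The values of `n` in the usage lines are chosen to reproduce the example results in the table; the slide does not state which `n` was used.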

Delivering Spark Security

Key Features: Spark Column Security with LLAP

⬢ Fine-Grained Column Level Access Control for SparkSQL.

⬢ Fully dynamic policies per user. Doesn’t require views.

⬢ Use Standard Ranger policies and tools to control access and masking policies.

Flow:
1. SparkSQL gets data locations, known as “splits”, from HiveServer2 and plans the query.
2. HiveServer2 authorizes access using Ranger. Per-user policies like row filtering are applied.
3. Spark gets a modified query plan based on dynamic security policy.
4. Spark reads data from LLAP. Filtering / masking is guaranteed by the LLAP server.
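
The four steps above can be modelled with a toy in-memory sketch. Everything here is hypothetical: the function names, the plan dictionary, the sample rows, and the lambda standing in for a Ranger policy are invented for illustration and do not correspond to real Spark, HiveServer2 or LLAP APIs. The point is only that the filter travels with the plan and is applied by the reader, so the caller never sees unfiltered rows.

```python
# Fabricated sample rows standing in for the Users table.
USERS_ROWS = [
    {"customer_name": "Ann", "customer_state": "CA"},
    {"customer_name": "Bob", "customer_state": "NY"},
]

# Hypothetical per-user row filters standing in for Ranger policies.
ROW_FILTERS = {"mark": lambda row: row["customer_state"] == "CA"}


def get_splits(table):
    """Step 1: the planner asks for data locations ("splits")."""
    return [f"{table}/split-0"]


def authorize(user, table, columns):
    """Steps 2 and 3: attach the user's policy and return the modified plan."""
    return {
        "splits": get_splits(table),
        "columns": list(columns),
        "row_filter": ROW_FILTERS.get(user),  # None means no filter applies
    }


def llap_read(plan, rows):
    """Step 4: the reader applies the filter before rows reach the caller."""
    keep = plan["row_filter"] or (lambda row: True)
    return [{c: row[c] for c in plan["columns"]} for row in rows if keep(row)]


def run_query(user, table, columns):
    plan = authorize(user, table, columns)
    return llap_read(plan, USERS_ROWS)


print(run_query("mark", "users", ["customer_name"]))
print(run_query("bill", "users", ["customer_name"]))
```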

Example: Per-User Row Filtering by Region in SparkSQL

Use Cases

Demo Setup

⬢ Customer Users and Sales data in ORC (metadata in MetaStore)

⬢ Data can be accessed via SparkSQL or HiveServer2

⬢ Marketing needs access to Sales and Users data for analytics

⬢ Fraud Investigation team needs access to data for fraud detection

⬢ Billing team needs access to Sales and Users data for billing

Tables

Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip

Sales: customer_id, product_id, promotion_id, cookie_id, tracking_id

Group | Users
Fraud | frank
Marketing | mark
Billing | bill

Use Case 1: Restricting Column Access

This is a simple use case where certain groups or users don’t have permission to view certain columns.

⬢ Billing group has access to all columns in table Users

⬢ Marketing group can’t access the credit card column (customer_ccn) in table Users

Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip

User/Column | customer_phone | customer_ccn
bill (Billing) | 😀 | 😀
mark (Marketing) | 😀 | 😡
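
The access matrix above can be expressed as a small allow-list check. This Python sketch is a simplified model, not Ranger's policy format or evaluation engine: the `ALLOWED_COLUMNS` dictionary and the `check_access` function are hypothetical names introduced for illustration.

```python
USERS_COLUMNS = {
    "customer_id", "customer_name", "customer_email", "customer_phone",
    "customer_ccn", "customer_state", "customer_zip",
}

# Hypothetical allow-lists per group for table Users.
ALLOWED_COLUMNS = {
    "Billing": set(USERS_COLUMNS),                      # all columns
    "Marketing": USERS_COLUMNS - {"customer_ccn"},      # no credit card column
}


def check_access(group, requested_columns):
    """Return (allowed, denied_columns) for a query on table Users."""
    allowed = ALLOWED_COLUMNS.get(group, set())  # unknown groups get nothing
    denied = sorted(set(requested_columns) - allowed)
    return (not denied, denied)


print(check_access("Billing", ["customer_phone", "customer_ccn"]))
print(check_access("Marketing", ["customer_phone", "customer_ccn"]))
```

A query from `mark` that touches `customer_ccn` is rejected as a whole, while the same query restricted to `customer_phone` passes.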

Ranger Policy - Restrict Columns

Ranger Policy - Restrict Columns - Results

bill from Billing

mark from Marketing

Ranger Policy - Restrict Columns - Audit Screen

Use Case 2: Column Masking

In this use case, certain groups or users cannot see the real values of certain columns.

⬢ Billing group can see the real/raw values for all columns in table Users

⬢ Fraud group can only see masked values of PII and PCI fields from table Users

Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip

User/Column | customer_email, customer_phone, customer_ccn
bill (Billing) | 😀
frank (Fraud) | 😎

Ranger Policies - Mask Fields

Ranger Policy - Column Masking - Results

bill from Billing

frank from Fraud

Ranger Policy - Column Masking - Audit Screen

Use Case 3: Row Filtering

In this use case, certain groups or users cannot see all the rows of certain tables.

⬢ Billing group can see all the rows in the table Users

⬢ Marketing can only see rows/data from their region in the table Users

Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip

User | Rows in Users table
bill (Billing) | 😀
mark (Marketing-CA) | Only CA Users

Ranger Policies - Row Filtering

Ranger Policy - Row Filtering - Results

bill from Billing

mark from Marketing

Use Case 4: Row Filtering - Cross Table

This is an extension of the previous use case, where the context information for filtering the rows is in another table.

⬢ Billing group can see all the rows in the table Sales

⬢ Marketing can only see rows/data from their region in the table Sales. However, the Sales table doesn’t have the customer geographic information, so it needs to be derived from the Users table

Users: customer_id, customer_name, customer_email, customer_phone, customer_ccn, customer_state, customer_zip

Sales: customer_id, product_id, promotion_id, cookie_id, tracking_id

User | Rows in Sales table
bill (Billing) | 😀
mark (Marketing-CA) | Only CA Users
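
One way to derive the region from another table is a subquery in the row filter. The Python sketch below demonstrates the idea with an in-memory SQLite database standing in for Hive; the sample rows, the `ROW_FILTERS` dictionary and the `select_sales` helper are fabricated for illustration, and the exact filter expression used in the demo's Ranger policy is not shown in the transcript.

```python
import sqlite3

# SQLite is used here only as a stand-in SQL engine for the demo tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (customer_id INTEGER, customer_state TEXT);
CREATE TABLE sales (customer_id INTEGER, product_id INTEGER);
INSERT INTO users VALUES (1, 'CA'), (2, 'NY'), (3, 'CA');
INSERT INTO sales VALUES (1, 100), (2, 200), (3, 300), (2, 400);
""")

# Hypothetical per-user filter: Sales rows are limited by a lookup in Users.
ROW_FILTERS = {
    "mark": "customer_id IN "
            "(SELECT customer_id FROM users WHERE customer_state = 'CA')",
}


def select_sales(user):
    """Run the Sales query with the user's row filter, if any, in WHERE."""
    sql = "SELECT customer_id, product_id FROM sales"
    row_filter = ROW_FILTERS.get(user)
    if row_filter:
        sql += f" WHERE {row_filter}"
    sql += " ORDER BY customer_id, product_id"
    return conn.execute(sql).fetchall()


print(select_sales("bill"))  # all four rows
print(select_sales("mark"))  # only rows for customers located in CA
```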

Ranger Policies - Row Filtering - Cross Table

Apache Atlas Integration

Cross Product Symbiosis

[Diagram. Apache Atlas: Classification/Tagging, Governance, Lineage. Apache Ranger: Tag Based Policies, Dynamic Custom Policies. LLAP: Enforcement hooks. Data sources: HDFS, S3, MetaStore]

* Column Masking and Row Filtering not yet supported by tag based policy

Ranger - Tag Based Policies

Q & A