TriHUG 2/14: Apache Sentry
Deploying enterprise-grade security for Hadoop
Brock Noland | Software Engineer, Cloudera
February 27, 2014
Outline
• Introduction
• Hadoop security primer
  • Authentication
  • Authorization
• Security options
  • Default
  • Kerberos with impersonation
  • Kerberos with Sentry
• Demo
Introduction
Tonight's focus is SQL-on-Hadoop
• The vast majority of Hadoop users use Hive or Cloudera Impala
• Data warehouse offload is the most common use case
• Data warehouse offload is a two-step process:
  1. Automatic transformations are moved to Hadoop
  2. Data analysts are given query access
Data warehouse use case
[Diagram: Online Database, Data Warehouse, Hadoop]
Authentication
• Authentication is who you are
• Hadoop models
  • Default: “trusted network”
  • Strong: Kerberos
Default Authentication – trusted network
• Default security mechanism
• Hadoop client uses local username
• Used in
  • POCs
  • Startups
  • Demos
  • Pre-prod environments
Client host:
$ whoami
brock
$ cat a.txt
some data
$ hadoop fs -put a.txt .
Hadoop:
User: brock
File: a.txt
Contents: some data
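Under the default model the cluster records whatever name the client reports. The minimal Python sketch below models that trust. It assumes the client reports `HADOOP_USER_NAME` when set and otherwise the local login (Hadoop's simple mode behaves this way, but these functions are illustrative, not Hadoop code); `put_file` is a hypothetical stand-in for the server side.

```python
import getpass
from typing import Mapping, Optional


def client_identity(env: Mapping[str, str], local_login: Optional[str] = None) -> str:
    """Name a client reports under the default ("simple") model.

    Assumption: HADOOP_USER_NAME wins, otherwise the local OS login is used.
    """
    override = env.get("HADOOP_USER_NAME")
    if override:
        return override
    return local_login if local_login is not None else getpass.getuser()


def put_file(path: str, contents: str, claimed_user: str) -> dict:
    """Hypothetical server-side 'put': the claimed name is recorded without any check."""
    return {"user": claimed_user, "file": path, "contents": contents}


if __name__ == "__main__":
    # The normal case from the slide: local user "brock" owns the new file.
    print(put_file("a.txt", "some data", client_identity({}, "brock")))
    # Nothing stops the same person from claiming to be someone else.
    print(put_file("a.txt", "some data", client_identity({"HADOOP_USER_NAME": "hdfs"}, "brock")))
```

Because nothing in `put_file` verifies the claim, this model is only safe on a network where every client is trusted, which is why it appears mainly in POCs, demos, and pre-prod environments.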
Strong Authentication – Kerberos
• Hadoop is secured with Kerberos
  • Provides mutual authentication
  • Protects against eavesdropping and replay attacks
• Every user and service has a Kerberos “principal”
  • Service: impala/[email protected]
  • User: [email protected]
• Credentials
  • Service: keytabs
  • User: password
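Kerberos principals follow the `primary[/instance]@REALM` pattern: service principals such as Impala's carry a host instance, while user principals usually do not. The Python sketch below splits a principal into those parts. The host `node1.example.com` and realm `EXAMPLE.COM` are hypothetical, and the parser ignores Kerberos escaping rules, so treat it as an illustration rather than a complete implementation.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(name: str) -> Principal:
    """Split a principal of the form primary[/instance]@REALM (no escape handling)."""
    if name.count("@") != 1:
        raise ValueError(f"expected exactly one '@' in {name!r}")
    local, realm = name.split("@")
    primary, sep, instance = local.partition("/")
    if not primary or not realm or (sep and not instance):
        raise ValueError(f"malformed principal {name!r}")
    return Principal(primary, instance or None, realm)


def is_service_principal(p: Principal) -> bool:
    # Convention only: service principals carry a host instance, users usually do not.
    return p.instance is not None


if __name__ == "__main__":
    print(parse_principal("impala/node1.example.com@EXAMPLE.COM"))
    print(parse_principal("brock@EXAMPLE.COM"))
```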
Client host:
$ whoami
brock
$ kinit
Password: *******
$ cat a.txt
some data
$ hadoop fs -put a.txt .
Sent to Hadoop: <Kerberos ticket> <encrypted data>*
* RPC encryption must be enabled
• Keytab
  • Encrypted key for servers (similar to a “password”)
  • Generated by a Kerberos server such as MIT Kerberos or Active Directory
• Impersonation
  • Services such as HiveServer2 impersonate users
  • Data loaded by “joe” via HS2 is owned by “joe”
  • Oozie jobs submitted by “brock” are run as “brock”
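Services can impersonate users only because the cluster lets their principals act as proxies. Hadoop controls this with `hadoop.proxyuser.<superuser>.hosts` and `hadoop.proxyuser.<superuser>.groups` settings in `core-site.xml`. The Python sketch below is a simplified model of that check (it ignores the other proxy-user options and IP ranges); the host `hs2.example.com` and the `analysts` group are hypothetical.

```python
from typing import Mapping, Set


def _csv(value: str) -> Set[str]:
    """Split a comma-separated config value into a set of trimmed entries."""
    return {item.strip() for item in value.split(",") if item.strip()}


def may_impersonate(conf: Mapping[str, str], superuser: str,
                    user_groups: Set[str], proxy_host: str) -> bool:
    """Simplified proxy-user check.

    `proxy_host` is the host the service (e.g. HiveServer2) connects from; the
    impersonated user must belong to an allowed group. "*" allows anything.
    """
    hosts = conf.get(f"hadoop.proxyuser.{superuser}.hosts", "")
    groups = conf.get(f"hadoop.proxyuser.{superuser}.groups", "")
    host_ok = hosts.strip() == "*" or proxy_host in _csv(hosts)
    group_ok = groups.strip() == "*" or bool(user_groups & _csv(groups))
    return host_ok and group_ok


def run_as(conf: Mapping[str, str], superuser: str, end_user: str,
           end_user_groups: Set[str], proxy_host: str) -> str:
    """Return the identity HDFS sees for work submitted through the service."""
    if not may_impersonate(conf, superuser, end_user_groups, proxy_host):
        raise PermissionError(f"{superuser} may not impersonate {end_user}")
    # With impersonation, files are owned by the end user, not by the service.
    return end_user
```

For example, with `hadoop.proxyuser.hive.hosts = hs2.example.com` and `hadoop.proxyuser.hive.groups = analysts`, data that “joe” (a member of `analysts`) loads via HS2 is attributed to `joe`.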
HiveServer2 and Oozie
[Diagram: HiveServer2 (HS2) and Oozie in front of Hadoop; clients include Beeline (Hive CLI), Tableau, JDBC, the Oozie CLI, and Control-M]
Authorization
• HDFS permissions
  • Unix style
  • Read/Write/Execute for Owner/Group/Other
  • Coarse grained
• Other Hadoop components have authorization
  • MapReduce: who can use which job queues
  • HBase: table ACLs
HDFS Permissions
$ hadoop fs -ls file
-rw-r----- 1 analyst1 analysts 2244 2014-01-19 12:15 file
• Permissions
  • Unix style permissions
  • Read/Write/Execute
  • Owner/Group/Other
• Owner
  • One and only one owner
• Group
  • One and only one group
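Because each file has exactly one owner and one group, a permission check picks a single class (owner, else group, else other) and tests only that class's bits. The Python sketch below models that rule on listings like `-rw-r-----`; it leaves out the HDFS superuser, and `check_access` is a hypothetical helper, not an HDFS API.

```python
from typing import Set


def check_access(mode: str, owner: str, group: str,
                 user: str, user_groups: Set[str], want: str) -> bool:
    """Return True if `user` holds every permission in `want` (letters from "rwx").

    `mode` is a 10-character listing such as "-rw-r-----" or "drwxr-x--T".
    """
    if len(mode) != 10:
        raise ValueError(f"unexpected mode string {mode!r}")
    if user == owner:
        triple = mode[1:4]          # owner bits
    elif group in user_groups:
        triple = mode[4:7]          # group bits
    else:
        triple = mode[7:10]         # other bits
    granted = set()
    if triple[0] == "r":
        granted.add("r")
    if triple[1] == "w":
        granted.add("w")
    # Lower-case "t"/"s" mean execute plus a special bit; "T"/"S" mean no execute.
    if triple[2] in ("x", "t", "s"):
        granted.add("x")
    return set(want) <= granted
```

With `drwxr-x--T`, owner `manager1`, and group `senioranalysts`, the same rule denies `rx` to any user outside that group, which matches the `READ_EXECUTE` denial shown later for `jranalyst1`.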
Back to our use case
• Scenario facts
  • ETL offload is a success
  • The data warehouse is expensive and at capacity
  • The same data is in Hadoop
• Next step
  • End users start using Hadoop to augment the DW
  • Security becomes the primary concern
End users need to share data
• Unlike automated ETL jobs, end users want to share data with peers
• They must manage HDFS permissions manually
  • Each file has a single group
  • The end result is that users set permissions to world readable/writable
Hive: Security holes
CREATE TEMPORARY FUNCTION custom_udf AS 'com.mycompany.MaliciousClass';
SELECT TRANSFORM(stuff) USING 'malicious-script.pl' AS thing1, thing;
CREATE EXTERNAL TABLE external_table(column1 string) LOCATION '/path/to/any/table';
CREATE TABLE test (c1 string) ROW FORMAT SERDE 'com.mycompany.MaliciousClass';
FROM (
  FROM t1
  MAP t1.c1 USING 'malicious-script1.pl'
  CLUSTER BY key) map_output
INSERT OVERWRITE TABLE t2
  REDUCE t2.c1 USING 'malicious-script2.pl' AS c2;
Default: Authorization
• Hive ships with an “advisory” authorization system
  • All users see all databases/tables/columns
  • Does not fix any security holes
  • Users grant themselves permissions
Kerberos with impersonation: Sharing data
The user “manager1” wants to share the table “manager1_table” with senior analysts but not junior analysts.
# hadoop fs -ls -R /user/hive/warehouse
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxr-x--T - manager1 manager1 0 manager1_table
IT must create a group:
# groupadd senioranalysts
Then add the appropriate members to the group:
# usermod -G analyst,senioranalysts analyst1
# usermod -G management,analyst,senioranalysts manager1
Then “manager1” can manually change the file permissions:
$ hadoop fs -chgrp -R senioranalysts …/warehouse/manager1_table
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxr-x--T - manager1 senioranalysts 0 manager1_table
Now any senior-level analyst can query the data:
$ whoami
analyst1
$ beeline
...
Connected to: Hive (version 0.10.0)
0: jdbc:hive2://localhost:10000/default> select count(*) from manager1_table;
+------------+
| count(*)   |
+------------+
| 47         |
+------------+
Junior analysts cannot query the data:
$ whoami
jranalyst1
$ beeline
...
Connected to: Hive (version 0.10.0)
0: jdbc:hive2://localhost:10000/default> select * from manager1_table;
Error: java.io.IOException: org.apache.hadoop.security.AccessControlException: Permission denied: user=jranalyst1, access=READ_EXECUTE, inode="/user/hive/warehouse/manager1_table":manager1:senioranalysts:drwxr-x--T
What happens in the real world?
Table “manager1_table” is owned by user/group “manager1”:
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxr-x--T - manager1 manager1 0 manager1_table
User “manager1” makes “manager1_table” world readable/writable:
$ hadoop fs -chmod -R 777 /user/hive/warehouse/manager1_table
$ hadoop fs -ls /user/hive/warehouse/
Found 3 items
drwxr-x--T - analyst1 analyst1 0 analyst1_table
drwxr-x--T - jranalyst1 jranalyst1 0 jranalyst1_table
drwxrwxrwt - manager1 manager1 0 manager1_table
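Octal `777` grants read, write, and execute to owner, group, and other alike, compared with `rwxr-x---` (octal `750`, plus the sticky bit shown as `T`) on the other tables. The small Python sketch below converts a three-digit octal mode to its symbolic form, ignoring the sticky bit; the function names are hypothetical.

```python
def octal_to_symbolic(octal: str) -> str:
    """Convert a three-digit octal mode such as "750" into "rwxr-x---"."""
    if len(octal) != 3 or any(c not in "01234567" for c in octal):
        raise ValueError(f"expected three octal digits, got {octal!r}")
    out = []
    for digit in octal:  # owner, group, other
        value = int(digit)
        out.append("r" if value & 4 else "-")
        out.append("w" if value & 2 else "-")
        out.append("x" if value & 1 else "-")
    return "".join(out)


def world_writable(octal: str) -> bool:
    """True when the 'other' class (every remaining user) may write."""
    return octal_to_symbolic(octal)[7] == "w"


if __name__ == "__main__":
    for mode in ("750", "777"):
        print(mode, octal_to_symbolic(mode), "world-writable:", world_writable(mode))
```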
Kerberos with impersonation: Summary
• Securing Hive with Kerberos and impersonation makes Hive unusable for DW offload
  • Manual file permission management
  • End state is world writable/readable
  • No ability to restrict access to columns or rows
  • All users see all databases/tables/columns
Fine-Grained Security: Apache Sentry
• Unlocks key RBAC requirements
  • Secure, fine-grained, role-based authorization
  • Multi-tenant administration
• Open source
  • Apache Incubator project
• Ecosystem support
  • Apache SOLR, HiveServer2, & Impala 1.1+
• Authorization module for Hive, Search, & Impala
Key Benefits of Sentry
• Store sensitive data in Hadoop
• Extend Hadoop to more users
• Comply with regulations
Key Capabilities of Sentry
• Fine-grained authorization
  • Specify security for SERVERS, DATABASES, TABLES & VIEWS
• Role-based authorization
  • SELECT privilege on views & tables
  • INSERT privilege on tables
  • ALL privilege on the server, databases, tables & views
  • ALL privilege is needed to create/modify schema
• Multi-tenant administration
  • Separate policies for each database/schema
  • Can be maintained by separate admins
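The role-based model can be sketched in a few lines: groups map to roles, roles hold privileges on a server, database, table path, a privilege on a parent object covers its children, and ALL implies SELECT and INSERT. The Python sketch below uses an INI layout modeled on Sentry's file-based policy provider (`[groups]` and `[roles]` sections with `server=...->db=...->table=...->action=...` privileges), but it is a simplified illustration, not Sentry's engine. The server name `server1` and the role names are hypothetical, and a privilege without an `action` is assumed to mean ALL.

```python
import configparser
from typing import Dict, List, Set, Tuple

Path = Tuple[Tuple[str, str], ...]

# Hypothetical policy in the style of a Sentry policy file.
POLICY = """
[groups]
senioranalysts = senior_analyst_role
analyst = analyst_role
dbadmins = default_admin_role

[roles]
senior_analyst_role = server=server1->db=default->table=manager1_table->action=select
analyst_role = server=server1->db=default->table=analyst1_table->action=select
default_admin_role = server=server1->db=default->action=all
"""


def parse_privilege(text: str) -> Tuple[Path, str]:
    """Split 'server=s->db=d->table=t->action=a' into an object path and an action."""
    path = []
    action = "all"  # assumption: no explicit action means ALL
    for part in text.strip().split("->"):
        key, _, value = part.partition("=")
        key, value = key.strip().lower(), value.strip().lower()
        if key == "action":
            action = value
        else:
            path.append((key, value))
    return tuple(path), action


def load_policy(text: str) -> Tuple[Dict[str, List[str]], Dict[str, List[Tuple[Path, str]]]]:
    """Read group-to-role and role-to-privilege mappings from INI text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    group_roles = {g: [r.strip() for r in v.split(",")] for g, v in parser["groups"].items()}
    role_privs = {r: [parse_privilege(p) for p in v.split(",")] for r, v in parser["roles"].items()}
    return group_roles, role_privs


def is_allowed(policy, user_groups: Set[str], obj: Path, action: str) -> bool:
    """True if any role of any of the user's groups covers `obj` with `action`."""
    group_roles, role_privs = policy
    for group in user_groups:
        for role in group_roles.get(group, []):
            for path, granted in role_privs.get(role, []):
                covers = obj[: len(path)] == path  # parent privileges cover children
                if covers and granted in ("all", action.lower()):
                    return True
    return False
```

In this model, sharing `manager1_table` with senior analysts is a single policy line rather than a `chgrp` or `chmod` on HDFS files.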
Sentry Architecture
[Diagram: layered architecture]
• Binding layer: Impala, Hive (HiveServer2), Search (SOLR), Pig, …
• Authorization provider: policy engine and policy provider
• Policy provider backends: file (local FS/HDFS) or database
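The layering separates what is being asked (a binding per component) from how the answer is computed (the policy engine) and where policies live (the provider). A minimal Python sketch of that separation follows. The class names are hypothetical, the provider is an in-memory stand-in for a policy file on local FS or HDFS, and privileges are matched as exact strings, without the parent/child implication a real engine would apply.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Set


class PolicyProvider(ABC):
    """Supplies privilege strings for a set of groups (file- or database-backed)."""

    @abstractmethod
    def privileges_for(self, groups: Iterable[str]) -> Set[str]:
        ...


class InMemoryFileProvider(PolicyProvider):
    """Hypothetical stand-in for a provider that loaded a policy file."""

    def __init__(self, group_privileges: Dict[str, Set[str]]):
        self._group_privileges = group_privileges

    def privileges_for(self, groups: Iterable[str]) -> Set[str]:
        result: Set[str] = set()
        for group in groups:
            result |= self._group_privileges.get(group, set())
        return result


class PolicyEngine:
    """Component-agnostic: answers 'does this group set hold this privilege?'."""

    def __init__(self, provider: PolicyProvider):
        self._provider = provider

    def allows(self, groups: Iterable[str], required: str) -> bool:
        return required in self._provider.privileges_for(groups)


class HiveBinding:
    """Hypothetical binding: turns a Hive-level request into a privilege string."""

    def __init__(self, engine: PolicyEngine, server: str):
        self._engine = engine
        self._server = server

    def can_select(self, groups: Iterable[str], db: str, table: str) -> bool:
        required = f"server={self._server}->db={db}->table={table}->action=select"
        return self._engine.allows(groups, required)
```

Replacing `InMemoryFileProvider` with a database-backed class that implements the same `privileges_for` method would leave the engine and bindings untouched, which is the point of the layering.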
Query Execution Flow
SQL → Parse → Build → Check (Sentry) → Plan → Query (MR)
• Parse: validate SQL grammar
• Build: construct statement tree
• Check: validate statement objects; the first check is authorization
• Plan: forward to execution planner
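Authorization runs after the statement tree is built (so the objects it touches are known) but before any planning or MapReduce work starts. The toy Python pipeline below mirrors those four stages for a single `SELECT ... FROM <table>` form; its grammar, function names, and the `authorize` callback are hypothetical simplifications, not Hive or Impala internals.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set

# authorize(user_groups, table, action) -> bool
Authorizer = Callable[[Set[str], str, str], bool]


@dataclass
class Statement:
    action: str
    tables: List[str]


def parse(sql: str) -> "re.Match":
    """Parse: validate grammar (toy grammar: SELECT <columns> FROM <table>)."""
    match = re.fullmatch(r"\s*select\s+(.+?)\s+from\s+(\w+)\s*;?\s*", sql, re.IGNORECASE)
    if match is None:
        raise SyntaxError(f"unsupported statement: {sql!r}")
    return match


def build(match: "re.Match") -> Statement:
    """Build: construct a (one-node) statement tree."""
    return Statement(action="select", tables=[match.group(2).lower()])


def check(stmt: Statement, user_groups: Set[str], authorize: Authorizer) -> None:
    """Check: validate statement objects; authorization is the first check."""
    for table in stmt.tables:
        if not authorize(user_groups, table, stmt.action):
            raise PermissionError(f"no {stmt.action.upper()} privilege on {table}")


def plan(stmt: Statement) -> str:
    """Plan: hand the validated statement to the execution planner (stubbed)."""
    return f"plan: scan {', '.join(stmt.tables)}"


def run(sql: str, user_groups: Set[str], authorize: Authorizer) -> str:
    stmt = build(parse(sql))
    check(stmt, user_groups, authorize)  # fails here, before any planning work
    return plan(stmt)
```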