© Hortonworks Inc. 2014
August 2014
Page 1
Use cases and security solutions
Apache Hive
Thejas Nair @thejasn
• Introduce key security concepts
• Use cases
• Authorization solutions
• Followed by specific use cases and experience at Yahoo!
What are we talking about?
Authentication vs Authorization
• Authentication
– Verifying your identity
– Enabled in Hadoop using Kerberos
– More options with HiveServer2
• Authorization
– Verifying if you have permissions to perform this action
Pic1 - https://flic.kr/p/5qQiJR
Pic2 - https://flic.kr/p/3i4SW
What is Apache Hive?
https://flic.kr/p/nff9gY
It depends on who you ask!
What is Apache Hive?
It's a table-oriented storage layer!
It is a SQL database!
Components - The table storage layer
[Diagram: Pig/MR and HCatalog on top of HDFS (data) and the Metastore (metadata)]
• Data
– FileSystem
– /hive/warehouse/…/table1/
– Traditional POSIX permissions
– rwxr-x--- owner: thejas, group: dev
– More flexibility with Access Control Lists
– More flexibility with Apache Argus (incubating)
Authorization – The table storage layer
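As a rough illustration of how the POSIX permission string on the table directory is read, the sketch below checks a nine-character mode such as rwxr-x--- against a user and that user's groups. The function name `can_access` and the users `alice` and `bob` are hypothetical, and the sketch ignores the superuser and Access Control Lists.

```python
def can_access(mode, owner, group, user, user_groups, want):
    """Return True if `user` may perform `want` ('r', 'w' or 'x').

    `mode` is a nine-character string such as 'rwxr-x---'.
    Simplified model: no superuser, no ACL entries.
    """
    if len(mode) != 9:
        raise ValueError("expected nine permission characters")
    if user == owner:
        triplet = mode[0:3]   # owner bits
    elif group in user_groups:
        triplet = mode[3:6]   # group bits
    else:
        triplet = mode[6:9]   # bits for everyone else
    return want in triplet


# Directory from the slide: rwxr-x--- owner: thejas, group: dev
# (alice and bob are made-up users for illustration)
print(can_access("rwxr-x---", "thejas", "dev", "thejas", {"dev"}, "w"))
print(can_access("rwxr-x---", "thejas", "dev", "alice", {"dev"}, "r"))
print(can_access("rwxr-x---", "thejas", "dev", "alice", {"dev"}, "w"))
print(can_access("rwxr-x---", "thejas", "dev", "bob", {"qa"}, "r"))
```

In this model the owner can write, members of dev can only read and list, and everyone else is denied.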
• Metadata
– {name : table1, storage_info : {dir : /hive/…/table1}, columns: {..}, .. }
– Authorization?
Authorization – The table storage layer
• Don’t add another source of truth for authorization!
• Metadata access based on corresponding data access.
Storage Based Authorization
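The idea of deriving metadata access from data access can be sketched as a lookup: each metadata operation is mapped to the filesystem access it needs on the table's directory, and the filesystem answer is the only answer. The operation names, the operation-to-access mapping, and the in-memory `TABLE_DIRS` table are assumptions made for illustration, not Hive's exact rules.

```python
# Hypothetical mapping: metadata operation -> access needed on the table dir.
REQUIRED_ACCESS = {
    "DESCRIBE_TABLE": "r",
    "ALTER_TABLE": "w",
    "DROP_TABLE": "w",
}

# Hypothetical stand-in for the filesystem: table -> (mode, owner, group).
TABLE_DIRS = {
    "table1": ("rwxr-x---", "thejas", "dev"),
}


def fs_allows(mode, owner, group, user, user_groups, want):
    """Simplified POSIX check on a nine-character mode string."""
    if user == owner:
        triplet = mode[0:3]
    elif group in user_groups:
        triplet = mode[3:6]
    else:
        triplet = mode[6:9]
    return want in triplet


def authorize_metadata_op(op, table, user, user_groups):
    """Allow the metadata operation only if the data directory allows it."""
    mode, owner, group = TABLE_DIRS[table]
    return fs_allows(mode, owner, group, user, user_groups, REQUIRED_ACCESS[op])


print(authorize_metadata_op("DROP_TABLE", "table1", "thejas", {"dev"}))
print(authorize_metadata_op("DESCRIBE_TABLE", "table1", "alice", {"dev"}))
print(authorize_metadata_op("DROP_TABLE", "table1", "alice", {"dev"}))
```

No separate grant table appears anywhere in the sketch, which is the point: the directory permissions remain the single source of truth.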
• Update configuration in metastore
– http://s.apache.org/SBA
– Ensure that only metastore server has access to its RDBMS
Enabling Storage Based Authorization
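For orientation, the sketch below renders the metastore-side properties usually associated with storage based authorization as hive-site.xml entries. The property names and class names are given as they appear in the Hive documentation to the best of my knowledge; treat them as assumptions and verify them against the page linked above for your Hive version. The helper `render_hive_site` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed property set for the metastore server; verify against the Hive docs.
SBA_PROPERTIES = {
    "hive.metastore.pre.event.listeners":
        "org.apache.hadoop.hive.ql.security.authorization."
        "AuthorizationPreEventListener",
    "hive.security.metastore.authorization.manager":
        "org.apache.hadoop.hive.ql.security.authorization."
        "StorageBasedAuthorizationProvider",
    "hive.security.metastore.authenticator.manager":
        "org.apache.hadoop.hive.ql.security."
        "HadoopDefaultMetastoreAuthenticator",
}


def render_hive_site(properties):
    """Build a <configuration> document with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


print(render_hive_site(SBA_PROPERTIES))
```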
• Hive command line
– bin/hive -e 'select * from ..'
– Same use case as Pig, MapReduce
– Storage Based Authorization applicable here
Hive as a SQL query engine
• ODBC/JDBC application/tools
– Adds HiveServer2 at the front
– Query processing – same way as command line
– Storage Based Authorization applicable here
– Have query run as end user
– Default configuration hive.server2.enable.doAs=true
Hive as a SQL query engine
• Simple.
– One source of truth. Just manage the FileSystem permissions.
• Flexible HDFS ACL support
– Requires upcoming Hive 0.14 release.
SBA: What is great about it?
• Access control at row and column level
– FileSystem permissions are at the level of directories and files
SBA: What is missing?
• Data access API should be fine grained
– API needs support for row/column concept
• HiveServer2?
– Data server for ODBC/JDBC
– SQL as an API supports selecting rows and columns
Fine grained control: prerequisites
• Fine grained authorization with HiveServer2
• Grant/Revoke statements
• Based on SQL standard
SQL standards based authorization
• Compile Query
• -> Query Plan
• -> Actions required on objects
– (e.g. READ : table1, DROP : table2)
• -> Privileges on objects
– (e.g. SELECT : table1, OWNER : table2)
• Check if user has required privileges
SQL std based auth: How it works
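The pipeline above can be sketched as two small steps: translate the actions found in the query plan into required privileges, then subtract what the user already holds. The action-to-privilege mapping covers only the two examples from the slide, and the function names are hypothetical rather than Hive's internal API.

```python
# Action -> privilege mapping, limited to the slide's two examples.
PRIVILEGE_FOR_ACTION = {"READ": "SELECT", "DROP": "OWNER"}


def required_privileges(actions):
    """Turn (action, object) pairs from the plan into (privilege, object)."""
    return {(PRIVILEGE_FOR_ACTION[action], obj) for action, obj in actions}


def missing_privileges(actions, granted):
    """Privileges the query needs that the user does not hold."""
    return required_privileges(actions) - granted


# Actions from the slide: READ on table1, DROP on table2.
plan_actions = [("READ", "table1"), ("DROP", "table2")]
user_grants = {("SELECT", "table1")}

missing = missing_privileges(plan_actions, user_grants)
print("authorized" if not missing else f"denied, missing: {sorted(missing)}")
```

The query is authorized only when nothing is missing; here the user can read table1 but lacks ownership of table2, so the query as a whole is rejected.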
• GRANT/REVOKE <PRIVILEGE> ON <OBJECT> TO <USERS>
• <USERS> can be a user or a role
• Delegate management of privileges/roles
• Hive ‘DBA’ can be added to ‘ADMIN’ role
Authorization Policy
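A toy model of such a policy, assuming privileges can be granted either directly to a user or to a role that the user belongs to. The class `Policy`, its method names, and the sample users and roles are hypothetical; the model also skips details such as grant options and the currently active role.

```python
from collections import defaultdict


class Policy:
    """Minimal grant/revoke store keyed by principal (user or role name)."""

    def __init__(self):
        self.grants = defaultdict(set)  # principal -> {(privilege, object)}
        self.roles = defaultdict(set)   # user -> {role names}

    def grant(self, privilege, obj, principal):
        self.grants[principal].add((privilege, obj))

    def revoke(self, privilege, obj, principal):
        self.grants[principal].discard((privilege, obj))

    def grant_role(self, role, user):
        self.roles[user].add(role)

    def has_privilege(self, user, privilege, obj):
        # A user holds a privilege directly or through any of their roles.
        principals = {user} | self.roles[user]
        return any((privilege, obj) in self.grants[p] for p in principals)


policy = Policy()
policy.grant("SELECT", "table1", "analysts")   # grant to a role
policy.grant_role("analysts", "alice")         # alice joins the role
print(policy.has_privilege("alice", "SELECT", "table1"))
policy.revoke("SELECT", "table1", "analysts")
print(policy.has_privilege("alice", "SELECT", "table1"))
```

Granting to a role rather than to individual users is what makes delegation manageable: membership changes do not require touching the grants themselves.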
• Supported using views
– Grant access to view, not base table
– Select clause – select columns
– Where clause – select rows
Fine Grained Authorization
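The view mechanism itself can be demonstrated with any SQL engine. The sketch below uses Python's built-in sqlite3 module instead of Hive, purely to show how a view's select clause narrows the columns and its where clause narrows the rows. SQLite has no GRANT statement, so the grant on the view is not shown. The table contents and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (name TEXT, region TEXT, salary INTEGER);
    INSERT INTO table1 VALUES
        ('a', 'us', 100), ('b', 'eu', 200), ('c', 'us', 300);

    CREATE VIEW table1_us AS
        SELECT name, region      -- select clause: only these columns
        FROM table1
        WHERE region = 'us';     -- where clause: only these rows
""")

cursor = conn.execute("SELECT * FROM table1_us ORDER BY name")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```

A user who can query only the view never sees the salary column or the rows for other regions, even though both exist in the base table.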
• Disallows features that bypass the fine grained authorization checks.
• dfs commands, transform clause, creating UDFs
• Admin can add permanent UDFs
Restrictions
• Grant access on files for HiveServer2 process user
• Run queries as this user
– Configure hive.server2.enable.doAs=false
SQL std based auth: Query processing
• Authorization plugin API
• Apache Argus is the first user
Extending Hive Authorization
• Grant/revoke based access control
• Insecure/incomplete model
• Insecure model for Hive command line
Hive default authorization
• Playing well with each other
1. Metadata authorization using Storage Based Authorization
2. Fine grained authorization options in HiveServer2
3. Both 1 & 2
Conclusion
Use Cases at Yahoo!
PRESENTED BY Chris Drome | August 20, 2014
Overview of Use Cases
26 Yahoo Confidential & Proprietary
§ Column and row level access controls
› Hive 0.13 SQL Standards Based Hive Authorization
• Authorization model managed by metastore
› HiveServer2
• Serving engine with authorization plugin
› Views
• Fine grain authorization on a table
§ (Limited) Authorization for Hive CLI
› HCatalog server-side security
› HDFS file permission based authorization (StorageBasedAuthorizationProvider)
› HiveMetastoreAuthorizationProvider plugin
The Players
§ Producers
› ETL jobs load data to grid
› Primarily Pig jobs
› Some MR jobs
› Owners of the data (read/write file permissions)
• Owner of directories and files
§ Consumers
› Consume some subset of the data
› Readers of the data (read-only file permissions)
• Member of group with read-only permissions
The Challenges
§ Producers
› Latency SLAs on a large volume of data
› Responsible for managing data
• Reloading data, rolling up data, archiving data
› Responsible for managing access to data (groups)
§ Consumers
› Access controlled by membership in consumer group
› Access controls at column or row level not possible
› Limited to one group per table
› Access may be through Pig, Hive, MR, BI tools, etc.
Fine Grain Access Control with HiveServer2
§ HiveServer2 as query execution engine
§ HiveServer2 responsible for verifying authorization
§ HiveServer2 runs as "super-user" with read privileges
› Connecting user doesn't have access permissions on underlying files
› Executes query on behalf of connecting user
§ Define arbitrary access controls with views on tables
› Able to restrict by columns and/or rows
› Grant access to individual users
§ Prototype with Sentry as proof-of-concept
(Limited) Authorization for Hive CLI
§ Not practical to prevent use of Hive CLI
§ Hive CLI could be used to circumvent HS2-based authorization
§ HCatalog server-side security uses StorageBasedAuthorizationProvider to check HDFS access permissions
› Chain with an authorization plugin (HiveMetastoreAuthorizationProvider)
§ Perform HCatalog-based authorization of DDL tasks
› Prevent users from creating/dropping objects in databases without authorization
§ Perform HCatalog-based authorization for data access
§ Simple prototype as proof-of-concept
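The chaining idea can be sketched as a list of independent checks that must all pass. This is a conceptual model only: the two check functions are hypothetical stand-ins for a storage based check and a custom DDL plugin, and the users, objects, and in-memory permission sets are invented for illustration. It does not reproduce the Hive provider interfaces.

```python
# Hypothetical data: who can reach which table directory, and who may run
# DDL in which database.
STORAGE_ACCESS = {("alice", "db1.table1"), ("carol", "db1.table1")}
DDL_ALLOWED = {("alice", "db1")}


def storage_check(user, operation, obj):
    """Stand-in for the HDFS permission check on the object's directory."""
    return (user, obj) in STORAGE_ACCESS


def ddl_check(user, operation, obj):
    """Stand-in for a plugin that guards create/drop inside a database."""
    if operation not in {"CREATE", "DROP"}:
        return True  # this plugin only has an opinion on DDL
    database = obj.split(".")[0]
    return (user, database) in DDL_ALLOWED


def authorize(providers, user, operation, obj):
    """A request passes only if every provider in the chain allows it."""
    return all(check(user, operation, obj) for check in providers)


chain = [storage_check, ddl_check]
print(authorize(chain, "alice", "DROP", "db1.table1"))
print(authorize(chain, "carol", "READ", "db1.table1"))
print(authorize(chain, "carol", "DROP", "db1.table1"))
```

In this model carol can read the table because the storage check passes, but cannot drop it because the second provider in the chain rejects the DDL.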