© Hortonworks Inc. 2014
Securing Hadoop’s REST APIs Apache Knox Gateway
Hadoop Summit 2014
Kevin Minder
Larry McCay
http://knox.apache.org/
user (at) knox.apache.org
dev (at) knox.apache.org
What is Apache Knox?
• The Apache Knox Gateway is…
• an extensible reverse proxy framework
• for securely exposing REST APIs and HTTP based services at a perimeter
• out of the box it provides:
• support for several of the most common Hadoop services
• integration with enterprise authentication systems
• several other useful features
What the Apache Knox Gateway isn’t
• Not an alternative to Kerberos for strong Hadoop core authentication
• Not a channel for high volume data ingest or export
History and Status of the Apache Knox Gateway
• 2013-02: Accepted into Apache Incubator
• 2013-04: Released 0.2.0
• 2013-10: Released 0.3.0
• 2014-02: Graduated to Apache TLP
• 2014-04: Released 0.4.0, Included in HDP 2.1
Why Knox?
Simplified Access
• Kerberos encapsulation
• Extends API reach
• Single access point
• Multi-cluster support
• Single SSL certificate
Centralized Control
• Central REST API auditing
• Service-level authorization
• Alternative to SSH “edge node”
Enterprise Integration
• LDAP integration
• Active Directory integration
• SSO integration
• Apache Shiro extensibility
• Custom extensibility
Enhanced Security
• Protect network details
• Partial SSL for non-SSL services
• WebApp vulnerability filter
Layers Of Hadoop Security
Perimeter Level Security
• Network Security (i.e. Firewalls)
• Apache Knox (i.e. Gateways)
Authentication
• Kerberos
• Delegation Tokens
OS Security
• File Permissions
• Process Isolation
Authorization
• MR ACLs
• HDFS Permissions
• HDFS ACLs
• HiveATZ-NG
• HBase ACLs
• Accumulo Label Security
• XA Security Policies
Data Protection
• Transport
• Storage
What does Perimeter Security really mean?
[Diagram: User, Firewall, Gateway (REST API), Hadoop Services (REST API)]
• Firewall required at perimeter (today)
• Knox Gateway controls all Hadoop REST API access through firewall
• Hadoop cluster mostly unaffected
• Firewall only allows connections through specific ports from Knox host
What REST APIs does Hadoop support?
Service                    URL Example
WebHDFS                    http://localhost:50070/webhdfs
WebHCat (aka Templeton)    http://localhost:50111/templeton
Oozie                      http://localhost:11000/oozie
HBase (via Stargate)       http://localhost:60080
Hive (HiveServer2)         http://localhost:10001/cliservice
                           jdbc:hive2://localhost:10001/?hive.server2.transport.mode=http;hive.server2.thrift.http.path=cliservice
Basic Knox Operation & Extensibility
Authentication and Identity Propagation
0. Configure knox user to be known as trusted proxy
1. REST API Request
2. HTTP Basic Auth Challenge (kminder:secret)
3. Authenticate kminder:secret (LDAP)
4. Authenticates as knox via SPNego (i.e. Kerberos), using the knox keytab
5. REST API Request, doAs kminder
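The two HTTP-visible ends of this flow can be sketched in a few lines of Python: the Basic credentials the client presents in step 2, and the impersonation parameter added to the forwarded request in step 5. This is only an illustration, not Knox's implementation. The helper names are hypothetical, the `doAs` parameter name is taken from the diagram label, and the WebHDFS path is an example.

```python
import base64
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def basic_auth_header(user: str, password: str) -> str:
    """Value of the Authorization header a client sends in step 2."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def with_do_as(url: str, end_user: str, param: str = "doAs") -> str:
    """Step 5: forward the request on behalf of the authenticated end user.

    Any impersonation parameter supplied by the client is dropped first, so
    only the user established in step 3 is propagated. The parameter name is
    an assumption taken from the diagram label "doAs kminder".
    """
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, end_user))
    return urlunsplit(parts._replace(query=urlencode(query)))


header = basic_auth_header("kminder", "secret")
forwarded = with_do_as("http://localhost:50070/webhdfs/v1/tmp?op=LISTSTATUS", "kminder")
```

The Kerberos leg (step 4) is deliberately left out: it happens between the gateway and the cluster and is invisible to the REST client.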
Scalability and Fault Tolerance
[Diagram: load balancer, Knox Cluster (no shared state), Hadoop]
• Load balancer options: Apache HTTPD + mod_proxy_balancer, F5 BIG-IP, HAProxy
• Really any traditional web tier load balancer
Extensibility: Providers and Services
• Both are dynamically discovered on the class path via Java’s ServiceLoader
• Providers
  • Add new features to the gateway that can be used by Services
  • Typically result in one or more filters being added to one or more chains
• Services
  • Add new endpoints to the gateway to expose a specific service
  • Assemble filter chains to enable specific features via providers
    • Includes providing configuration to providers
    • For example URL rewrite rules
  • Associates endpoints with filter chains
Topology Files
• Describe the services that should be exposed for a specific cluster
• Found in <GATEWAY_HOME>/conf/topologies
• Name of topology file dictates URL component
  • sandbox.xml -> http://localhost:8443/gateway/sandbox/webhdfs/…

<topology>
  <gateway>
    <provider>
      <role>authentication</role>
      <name>custom</name>
    </provider>
  </gateway>
  <service>
    <role>WEBHDFS</role>
    <url>http://localhost:50070</url>
  </service>
</topology>

• <provider>: selects an authentication provider implementation
• <service> <url>: location of WebHDFS in target cluster
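The relationship between a topology file name and the gateway URLs it exposes can be illustrated with a short Python sketch. It is not Knox code: `gateway_urls` is a hypothetical helper, and deriving the path segment by lower-casing the service role is an assumption that happens to match the WEBHDFS example on this slide.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# The topology from the slide, as it might appear in conf/topologies/sandbox.xml.
TOPOLOGY_XML = """
<topology>
  <gateway>
    <provider>
      <role>authentication</role>
      <name>custom</name>
    </provider>
  </gateway>
  <service>
    <role>WEBHDFS</role>
    <url>http://localhost:50070</url>
  </service>
</topology>
"""


def gateway_urls(topology_file: str, xml_text: str,
                 base: str = "http://localhost:8443/gateway") -> dict:
    """Map each service role to the URL the gateway would expose for it."""
    cluster = PurePosixPath(topology_file).stem  # "sandbox.xml" -> "sandbox"
    root = ET.fromstring(xml_text)
    urls = {}
    for service in root.findall("service"):
        role = service.findtext("role")
        # Assumption: the URL path segment is the lower-cased role name.
        urls[role] = f"{base}/{cluster}/{role.lower()}"
    return urls


exposed = gateway_urls("conf/topologies/sandbox.xml", TOPOLOGY_XML)
```

Renaming the file to another name changes only the cluster segment of every exposed URL; the backend `<url>` values never appear in what clients see.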
Enhanced Security
Topology Leakage: WebHDFS Example
• WebHDFS direct
curl -i -X PUT 'http://localhost:50070/webhdfs/v1/user/guest/file1?op=CREATE&user.name=guest'
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://sandbox.hortonworks.com:50075/webhdfs/v1/user/guest/file1?op=CREATE&user.name=guest&namenoderpcaddress=sandbox.hortonworks.com:8020&overwrite=false
  • NameNode RPC address and DataNode HTTP address are leaked unnecessarily to the REST client
• WebHDFS via Knox
curl -u guest:guest-password -i -k -X PUT 'https://localhost:8443/webhdfs/v1/user/guest/file2?op=CREATE'
HTTP/1.1 307 Temporary Redirect
Location: https://localhost:8443/gateway/sandbox/webhdfs/data/v1/webhdfs/v1/user/guest/file2?_=AAAACAAAABAAAACAgUDT7-QQZlpkcm09lxrxI0Bgo9d-Egghp_qxmd4pQsmm3zvYc3M_LrDBQpMBNA48DnMS9QOhyzywCMl1WAShyX4RUETPjEcZa6x9Jwz7TMANjSRKMR6F3rKf93ME-VsI2Phe8CX72L6oiI778--8F9DQCO8LHFHzLL70iB13Hm2BLyj-x9p3tn7FOHxkbPl5d-eHxVop7Dk
  • Encrypted query param contains dispatch information used by the gateway when the redirect is followed
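One way to see the difference between the two responses is to check which host:port pairs a redirect reveals beyond the host the client originally called. The Python sketch below does that for the two `Location` headers shown here. `leaked_hosts` is a hypothetical helper written for this illustration, and the `_=ENCRYPTED` value is a placeholder standing in for the long encrypted parameter.

```python
import re
from urllib.parse import parse_qsl, urlsplit

# Matches query values that look like "host:port".
HOST_PORT = re.compile(r"^[A-Za-z0-9.-]+:\d+$")


def leaked_hosts(request_url: str, location: str) -> list:
    """Return host:port pairs in a redirect that differ from the requested host."""
    requested = urlsplit(request_url).netloc
    target = urlsplit(location)
    found = set()
    if target.netloc != requested:
        found.add(target.netloc)
    for _name, value in parse_qsl(target.query):
        if HOST_PORT.match(value) and value != requested:
            found.add(value)
    return sorted(found)


direct = leaked_hosts(
    "http://localhost:50070/webhdfs/v1/user/guest/file1?op=CREATE&user.name=guest",
    "http://sandbox.hortonworks.com:50075/webhdfs/v1/user/guest/file1"
    "?op=CREATE&user.name=guest"
    "&namenoderpcaddress=sandbox.hortonworks.com:8020&overwrite=false",
)
via_knox = leaked_hosts(
    "https://localhost:8443/webhdfs/v1/user/guest/file2?op=CREATE",
    # Placeholder for the encrypted dispatch parameter shown on the slide.
    "https://localhost:8443/gateway/sandbox/webhdfs/data/v1/webhdfs/v1/user/guest/file2"
    "?_=ENCRYPTED",
)
```

With the direct response the client learns two internal endpoints; with the gateway response it learns none, because the redirect points back at the gateway itself.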
Topology Leakage: Oozie Example
• Example of submitting an Oozie job from Apache docs
  • https://oozie.apache.org/docs/4.0.1/WebServicesAPI.html
  • HTTP POST XML below to /oozie/v1/jobs
• Oozie direct (REST client must know RPC address of NameNode)
<configuration>
  <property>
    <name>oozie.wf.application.path</name>
    <value>hdfs://foo:9000/user/bansalm/myapp/</value>
  </property>
  ...
</configuration>
• Oozie via Knox
<configuration>
  <property>
    <name>oozie.wf.application.path</name>
    <value>/user/bansalm/myapp/</value>
  </property>
  ...
</configuration>
Partial SSL for non-SSL enabled services
[Diagram: Desktop, HTTPS REST API, Gateway (DMZ), HTTP REST API, WebHCat]
• First “hop” through public/corp networks protected with SSL
• Last “hop” within secure network non-SSL
WebApp Vulnerability Filter
• The Knox WebAppSec provider allows vulnerability prevention filters to be plugged in
• Cross Site Request Forgery (CSRF) protection is currently provided
  • Uses common required header technique
• Later releases will include more filters based on standard techniques

<provider>
  <role>webappsec</role>
  <name>WebAppSec</name>
  <enabled>true</enabled>
  <param><name>csrf.enabled</name><value>true</value></param>
  <param><name>csrf.customHeader</name><value>X-XSRF-Header</value></param>
  <param><name>csrf.methodsToIgnore</name><value>GET,OPTIONS,HEAD</value></param>
</provider>
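The "required header" technique can be sketched in Python as a single decision: a request passes if its method is on the ignore list, or if it carries the configured custom header. This is an illustration of the general technique using the parameter values from the slide, not the code of the Knox filter. The function name is hypothetical.

```python
def csrf_allows(method: str, headers: dict,
                custom_header: str = "X-XSRF-Header",
                methods_to_ignore: tuple = ("GET", "OPTIONS", "HEAD")) -> bool:
    """Required-header CSRF check.

    Browsers do not add custom headers to cross-site form posts, so the mere
    presence of the header signals that a script from a trusted origin (or a
    non-browser client) made the request.
    """
    if method.upper() in methods_to_ignore:
        return True
    # HTTP header names are case-insensitive.
    present = {name.lower() for name in headers}
    return custom_header.lower() in present


read_ok = csrf_allows("GET", {})
write_blocked = csrf_allows("PUT", {})
write_ok = csrf_allows("PUT", {"x-xsrf-header": "valid"})
```

A curl client would satisfy the check by adding something like `-H 'X-XSRF-Header: valid'` to state-changing requests; the header value itself carries no meaning in this sketch.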
Simplified Access
Knox Service URLs vs. direct URLs
Service    Direct URL                             Knox URL
WebHDFS    http://namenode-host:50070/webhdfs     https://knox-host:8443/webhdfs
WebHCat    http://webhcat-host:50111/templeton    https://knox-host:8443/templeton
Oozie      http://ooziehost:11000/oozie           https://knox-host:8443/oozie
HBase      http://hbasehost:60080                 https://knox-host:8443/hbase
Hive       http://hivehost:10001/cliservice       https://knox-host:8443/hive

• Direct URLs: masters could be on many different hosts
• Knox URLs: one host, one port
• Knox URLs: consistent paths
Hadoop CLIs require full server configs
/etc/hive/conf/hive-site.xml
<property>
  <name>hive.server2.thrift.http.port</name>
  <value>10001</value>
</property>
<property>
  <name>hive.server2.thrift.http.path</name>
  <value>cliservice</value>
</property>

/etc/hadoop/conf/core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://sandbox.hortonworks.com:8020</value>
</property>

/etc/hadoop/conf/hdfs-site.xml
<property>
  <name>dfs.namenode.http-address</name>
  <value>sandbox.hortonworks.com:50070</value>
</property>

/etc/hadoop/conf/yarn-site.xml
<property>
  <name>yarn.resourcemanager.address</name>
  <value>sandbox.hortonworks.com:8050</value>
</property>

/etc/hive-webhcat/conf/webhcat-site.xml
<property>
  <name>templeton.port</name>
  <value>50111</value>
</property>

/etc/oozie/conf/oozie-site.xml
<property>
  <name>oozie.base.url</name>
  <value>http://sandbox.hortonworks.com:11000/oozie</value>
</property>

HBase – Command line

These files may all be on different nodes on the cluster too!
Kerberos Encapsulation
0. Configure knox as trusted proxy
1. REST API Request
2. HTTP Basic Auth Challenge (kminder:secret)
3. Authenticate kminder:secret
4. Authenticates as knox via SPNego (i.e. Kerberos), using the knox keytab
5. REST API Request, doAs kminder
• The client isn’t even aware the cluster is secured with Kerberos
REST API Reach: Intranet Access Model
[Diagram: Desktop, REST API, Gateway (DMZ), REST API, Hadoop]
• Users will discover novel ways to use easily accessible REST APIs
REST API Reach: Middleware Access Model
“Give the APIs to the Apps”
[Diagram: Browser, HTML/JS, App Server (Web Tier / DMZ), REST, Gateway, REST, Hadoop]
• Most enterprises cannot deal with Kerberos in the web tier and don’t have CLI access
REST API Reach: Internet Access Model
“Give the APIs to Everyone”
[Diagram: Internet, REST API, Gateway (DMZ), REST API, Hadoop]
• HaaS vendors are exposing Hadoop REST APIs to the internet. What does the API tell these clients about your cluster?
Multi-Cluster Support
[Diagram: one Gateway in front of two clusters]
• green (Production Cluster): http://knox:8443/gateway/green/webhdfs/v1
• blue (Research Cluster): http://knox:8443/gateway/blue/webhdfs/v1
• One host, one port for many clusters
Simplified Client Certificate Management
[Diagram labels: hdfs cert, hive cert, hbase cert, knox cert; knox pub key, hive pub key, hbase pub key, hdfs pub key]
• User only needs to trust Knox’s cert
• Admin only needs to manage multiple keys on Knox hosts
Centralized Control
Client Edge Node CLI Access Model
“Take the Users to the CLI”
[Diagram: Desktop, SCP/SSH Login, Edge Node (DMZ), Hadoop CLIs, Hadoop]
• Limited auditing on edge node
• CLI too hard to install on desktops
Improved auditing and access control
[Diagram: Desktop, REST API Login, Gateway (DMZ), REST API, Hadoop]
• All activity audited consistently
• Additional authorization control available
Service Level Authorization
• Control access to services by user, group or IP address
<provider>
  <role>authorization</role>
  <name>AclsAuthz</name>
  <enabled>true</enabled>
  <param>
    <name>WEBHDFS.acl</name>
    <value>*;admin;127.0.0.1</value>
  </param>
</provider>
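A small Python sketch can make the ACL value easier to read. It assumes the three semicolon-separated fields are users, groups and IP addresses (matching the bullet above), that `*` matches anything, and that all three fields must match for access to be granted. Those semantics and both helper names are assumptions made for illustration, not a statement of how the AclsAuthz provider is implemented.

```python
def parse_acl(value: str) -> tuple:
    """Split "users;groups;ips" into three sets; missing or empty fields mean "*"."""
    parts = (value.split(";") + ["*", "*", "*"])[:3]
    return tuple(
        {item.strip() for item in part.split(",") if item.strip()} or {"*"}
        for part in parts
    )


def acl_allows(value: str, user: str, groups: list, remote_ip: str) -> bool:
    """Assumed semantics: every field must match, and "*" matches anything."""
    users, acl_groups, ips = parse_acl(value)
    user_ok = "*" in users or user in users
    group_ok = "*" in acl_groups or bool(acl_groups & set(groups))
    ip_ok = "*" in ips or remote_ip in ips
    return user_ok and group_ok and ip_ok


ACL = "*;admin;127.0.0.1"
admin_local = acl_allows(ACL, "kminder", ["admin"], "127.0.0.1")
admin_remote = acl_allows(ACL, "kminder", ["admin"], "10.0.0.5")
non_admin = acl_allows(ACL, "guest", ["users"], "127.0.0.1")
```

Read this way, the slide's value grants WebHDFS access to any user in the admin group, but only when the request originates from 127.0.0.1.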
XA Secure Integration
0. Distribute policy (Policy Server to Agent)
1. REST API Request
2. Service level authorization decision (Agent)
3. REST API Request
• Agent integrated as authorization provider
• Policies authored in the portal and distributed by the policy server
KNOX-250: SSH Bastion Auditing Functionality
• Community is developing an extension
• Based on Apache MINA SSHD
• Provides administrative SSH access via Knox
• Further centralizes auditing of cluster administration
• https://issues.apache.org/jira/browse/KNOX-250
KNOX-250: SSH Bastion Auditing Functionality
[Diagram: Desktop, SSH Login, Gateway (DMZ), Hadoop CLI, Hadoop]
• All activity audited consistently
Enterprise Integration
Apache Shiro Authentication Provider
• Apache Shiro is the primary authentication provider for Knox
• Used for both LDAP and Active Directory
• Apache Shiro is a popular JEE and JSE security framework
• Very modular and flexible architecture
• Many community extensions
• Integrated into Knox as a servlet filter
Apache Shiro Authentication Provider
<provider>
  <role>authentication</role>
  <name>ShiroProvider</name>
  <enabled>true</enabled>
  <param>
    <name>main.ldapRealm</name>
    <value>org.apache.shiro.realm.ldap.JndiLdapRealm</value>
  </param>
  <param>
    <name>main.ldapRealm.userDnTemplate</name>
    <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
  </param>
  <param>
    <name>main.ldapRealm.contextFactory.url</name>
    <value>ldap://localhost:33389</value>
  </param>
  <param>
    <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
    <value>simple</value>
  </param>
  <param>
    <name>urls./**</name>
    <value>authcBasic</value>
  </param>
</provider>
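The `userDnTemplate` parameter is easiest to understand by example: the `{0}` placeholder is replaced with the login name to form the distinguished name used for the LDAP bind. The Python sketch below shows only that substitution. `user_dn` is a hypothetical helper, and it deliberately skips the escaping of special DN characters that real code would need.

```python
def user_dn(template: str, username: str) -> str:
    """Fill the {0} placeholder of a userDnTemplate with the login name.

    Illustration only: no escaping of DN special characters is performed.
    """
    return template.replace("{0}", username)


TEMPLATE = "uid={0},ou=people,dc=hadoop,dc=apache,dc=org"
guest_dn = user_dn(TEMPLATE, "guest")
```

With the configuration above, a client authenticating as guest would therefore be bound to the directory at ldap://localhost:33389 under the DN computed here.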
SSO Integration
• Similar in concept to Hadoop’s trusted proxy model
• Preconfigured for SiteMinder use case
• HTTP Headers used to propagate pre-authenticated user and group info
• Only acceptable for use in a tightly controlled network environment

<provider>
  <role>federation</role>
  <name>HeaderPreAuth</name>
  <enabled>true</enabled>
  <param>
    <name>preauth.validation.method</name>
    <value>preauth.ip.validation</value>
  </param>
  <param>
    <name>preauth.ip.addresses</name>
    <value>127.0.*</value>
  </param>
</provider>
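Because the pre-authenticated headers are trusted as-is, the IP validation is what keeps arbitrary clients from asserting an identity. The Python sketch below illustrates that check under two assumptions that are not confirmed by the slide: that `preauth.ip.addresses` may hold a comma-separated list, and that `*` behaves like a shell-style wildcard. `ip_is_trusted` is a hypothetical helper, not part of Knox.

```python
from fnmatch import fnmatchcase


def ip_is_trusted(remote_ip: str, patterns: str) -> bool:
    """True if the caller's address matches any configured pattern.

    Assumes a comma-separated pattern list with glob-style "*" wildcards.
    """
    return any(
        fnmatchcase(remote_ip, pattern.strip())
        for pattern in patterns.split(",")
        if pattern.strip()
    )


from_proxy = ip_is_trusted("127.0.0.1", "127.0.*")
from_elsewhere = ip_is_trusted("10.1.2.3", "127.0.*")
```

Only requests arriving from a matching address would have their identity headers honored; everything else should be rejected before the headers are read.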
OAuth 2
• OAuth is becoming the de facto standard for communicating a user’s identity to REST APIs
  • It allows for explicit authorization by the user for the application to access resources
  • It has a number of ways to represent the user and authentication information to go over the wire
  • JSON Web Token (JWT) is an emerging standard for representing the various claims, attributes and scopes of an identity
  • Can be used as a bearer token, URL parameter or Header
• OAuth is also gaining popularity as a federation token for SSO integrations
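To make the JWT bullet concrete, the Python sketch below builds an unsigned example token, reads its claims back, and formats it as a bearer header. It uses only the standard library and performs no signature verification, so it shows the wire format rather than a secure validation step. The claim names and values (`sub`, `scope`) are hypothetical examples.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64url without padding, as used for JWT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def unverified_claims(token: str) -> dict:
    """Read the payload of a header.payload.signature token WITHOUT verifying it."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


def bearer_header(token: str) -> str:
    """Authorization header value for presenting the token as a bearer token."""
    return f"Bearer {token}"


# Hypothetical unsigned token, for illustration only.
example_token = ".".join([
    b64url_encode(json.dumps({"alg": "none"}).encode("utf-8")),
    b64url_encode(json.dumps({"sub": "kminder", "scope": "webhdfs"}).encode("utf-8")),
    "",
])
claims = unverified_claims(example_token)
authorization = bearer_header(example_token)
```

A real resource provider would verify the signature and expiry before trusting any claim; that step is omitted here on purpose.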
KNOX-393: OAuth Resource Provider
• Community investigating OAuth Federation Provider extension
  • Considering Apache Oltu
• Warning: Diagram dramatically oversimplified
  • There are a number of other potential flows

0. Configure knox user to be known as trusted proxy
1. requestAccessToken(JWT), return Bearer token (user: kminder)
2. REST API Request, Authorization: Bearer <token>
3. validateAccessToken(<token>)
4. Authenticates as knox via SPNego (i.e. Kerberos)
5. REST API Request, doAs kminder
What is next for Knox?
Jira | Assignee | Description
KNOX-393: OAuth Resource Provider for Middleware and Application Integration | COMMUNITY | OAuth 2 federation provider potentially based on Apache Oltu for external application SSO to Knox and Hadoop
KNOX-355: Support Knox Authentication Provider based on Hadoop Auth Module (SPNEGO) | KNOX Team | SPNEGO authentication support for Knox clients
KNOX-250: SSH Bastion Auditing Functionality | COMMUNITY | SSH tunneling and auditing functionality in addition to REST gateway services.
KNOX-353: Support Hadoop Java Client URLs | KNOX Team | In order to be used by Hadoop CLIs that can use REST, we need to support the expected URLs. This is in addition to the extended URLs for multiple Hadoop cluster support by Knox.
KNOX-242: LDAP Authentication Enhancements | KNOX Team | Search attribute based authentication rather than simple LDAP bind.
KNOX-74: Support YARN REST API | KNOX Team | Add support for the YARN REST API
KNOX-66: Support Ambari REST API access via the Gateway | KNOX Team | Add support for the Ambari REST API
TBD | TBD | What is important to you?
Interested?
• We’re hiring!
  • http://hortonworks.com/careers/open-positions/
• Especially hands on platform level development experience with
  • Kerberos
  • LDAP
  • OAuth
  • SAML
  • JAAS/GSS-API
  • Crypto