Hello and welcome to this online, self-paced course titled Administering and Managing the
Oracle Big Data Appliance (BDA). This course contains several lessons. This lesson is titled
Securing the Oracle BDA. My name is Lauran Serhal. I am a curriculum developer at Oracle
and I have helped educate customers on Oracle products since 1995. I'll be guiding you
through this course, which consists of lectures, demos, and review sessions.
The goal of this lesson is to describe how to secure data on the Oracle Big Data Appliance.
Securing the Oracle BDA - 1
Introduction
Before we begin, take a look at some of the features of this course player. If you’ve viewed a
similar self-paced course in the past, feel free to skip this slide.
Menu
This is the Menu tab. It’s set up to automatically progress through the course in a linear
fashion, but you can also review the material in any order. Just click a slide title in the outline
to display its contents.
Notes
Click the Notes tab to view the audio transcript for each slide.
Search
Use the Search field to find specific information in the course.
Player Controls
Use these controls to pause, play, or move to the previous or next slide. Use the interactive
progress bar to fast forward or rewind the current slide. Some interactive slides in this course
may contain additional navigation and controls. The view for certain slides may change so
that you can see additional details.
Resources (Optional)
Click the Resources button to access any attachments associated with this course.
Glossary (Optional)
Click the Glossary button to view key terms and their definitions.
Securing the Oracle BDA - 2
So, you know the title of the course, but you may be asking yourself, “Is this the right course
for me?” Click the bars to learn about the course objectives, target audience, and
prerequisites.
Securing the Oracle BDA - 3
What can you expect to get out of this course? Here are the core learning objectives.
After completing this course, you should be able to define the Hadoop ecosystem and its
components including Hadoop’s Distributed File System (HDFS), MapReduce, Spark, YARN,
and some other related projects. You will also learn how to complete the BDA Site Checklists,
run the Oracle BDA Configuration Utility, and install the Oracle BDA Mammoth software on
the Oracle BDA. You will also learn how to secure data on the Oracle BDA and how to use
the Oracle Big Data Connectors.
Securing the Oracle BDA - 4
Who is this course for? Here is the intended audience.
• Application Developers
• Database Administrators
• Hadoop/Big Data Cluster Administrators
• Hadoop Programmers
Securing the Oracle BDA - 5
Before taking this course, you should have some exposure to Big Data, and optionally some
basic database knowledge.
Securing the Oracle BDA - 6
In this course, we'll talk about the following lessons:
In the Introduction to the Hadoop Ecosystem lesson, you define the Hadoop ecosystem and describe the Hadoop core components and some of the related projects. You also learn about the components of HDFS and review MapReduce, Spark, and YARN.
In the Introduction to the Oracle BDA lesson, you identify the Oracle Big Data Appliance (BDA) and its hardware and software components.
In the Oracle BDA Pre-Installation Steps lesson, you learn how to download and complete the BDA Site Checklists. You also learn how to download and run the Oracle BDA Configuration Utility and then review the generated configuration files.
In the Working With Mammoth lesson, you learn how to download the Oracle BDA Mammoth Software Deployment Bundle from My Oracle Support. You also learn how to install a CDH or NoSQL cluster based on your specifications. You then learn how to install the Oracle BDA Mammoth Software Deployment Bundle using the Mammoth utility.
In the Securing the Oracle BDA lesson, you learn how to secure data on the Oracle Big Data Appliance.
In the Working With the Oracle Big Data Connectors lessons, you learn how to use Oracle SQL Connector for Hadoop Distributed File System, Oracle Loader for Hadoop, Oracle Data Integrator, Oracle XQuery for Hadoop, and Oracle R Advanced Analytics for Hadoop.
Securing the Oracle BDA - 7
Now that you’ve learned about the other Oracle Big Data Connectors, let’s take a look at how
to secure data on the Oracle Big Data Appliance. Let's get started.
Securing the Oracle BDA - 8
• Security is a calculated decision. Because security does not make systems run faster or
make them easier to use, the cost of implementing any security control must be weighed against the
potential cost of not implementing it.
• How much security is required is often a judgment made on the amount of risk an
organization is willing to assume. That risk often boils down to the value of the data. If
the data being housed in your BDA is a unique combination of sensor data, geological
data, and periodical feeds from gold mining journals, it has very different security
requirements than a cluster housing publicly available home values, traffic accident
data, and citizen demographic data.
Securing the Oracle BDA - 9
You will explore the key capabilities that support this security spectrum, including:
Authentication:
The subject (user, program, process, or service) proves its identity to gain access to the
system.
Authorization:
Authenticated subjects are granted access to authorized resources.
Auditing:
Accesses and manipulations are recorded in logs for auditing/accountability/compliance.
Encryption (at rest and over the network):
Encryption protects data in transit against man-in-the-middle attacks and protects data at rest against data breaches.
Securing the Oracle BDA - 11
• Hadoop client: Local OS user ID is used.
• JDBC: Specify any user as part of the connect string.
• Oracle Database: The user is the owner of the Oracle process.
Securing the Oracle BDA - 13
In this example, when you issue the hadoop fs -ls command, you get an HDFS file listing
for the HDFS hr and marketing directories.
Column 1 shows the file mode: "d" for a directory and a hyphen for a normal file, followed by
the file or directory permissions. HDFS has a permission model similar to that used in Linux. The three permission types are read (r), write (w), and execute (x). The execute permission
for a file is ignored because you cannot execute a file on HDFS. The permissions are grouped
by owner, group, and public (everyone else).
Column 2 shows the replication factor for files. The concept of replication factor does not
apply to directories.
Columns 3 and 4 show the file owner and group.
Column 5 shows the size of the file, in bytes, or 0 if it is a directory.
Columns 6 and 7 show the date and time of the last modification, respectively.
Column 8 is the name of the file or directory.
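To make the eight columns concrete, here is a minimal Python sketch that splits one listing line into named fields. The two sample lines are hypothetical and are not copied from the slide; the sketch assumes that fields are separated by whitespace and that directories show a hyphen in the replication column.

```python
from typing import NamedTuple


class LsEntry(NamedTuple):
    is_directory: bool   # column 1, first character
    permissions: str     # column 1, remaining nine characters
    replication: str     # column 2 (a hyphen for directories in this sample)
    owner: str           # column 3
    group: str           # column 4
    size_bytes: int      # column 5
    date: str            # column 6
    time: str            # column 7
    name: str            # column 8


def parse_ls_line(line: str) -> LsEntry:
    # Split on whitespace into at most eight fields so that a name
    # containing spaces stays intact in the last field.
    mode, replication, owner, group, size, date, time, name = line.split(None, 7)
    return LsEntry(mode.startswith("d"), mode[1:], replication,
                   owner, group, int(size), date, time, name)


# Hypothetical listing lines, used only for illustration.
directory = parse_ls_line(
    "drwxr-xr-x   - oracle hr          0 2015-03-02 10:15 /user/oracle/hr")
regular = parse_ls_line(
    "-rw-r--r--   3 oracle hr       1024 2015-03-02 10:16 /user/oracle/hr/staff.csv")
print(directory.is_directory, directory.permissions, regular.replication)
```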
Securing the Oracle BDA - 15
In the first code example, we use the hadoop fs -ls command to review the permissions
for the sgtpepper.txt file.
In the second code example, we set the sgtpepper.txt file access permissions by using
the hadoop fs -chmod 666 sgtpepper.txt command. This allows all users to read and write that
file. Next, we confirm the new access permission on the file.
In the third code example, we make lennon the owner of the file and thebeatles its group by using
the hadoop fs -chown command. Note that this must be done as a superuser. Next, we
confirm the new ownership changes.
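The octal value passed to chmod maps onto the read, write, and execute flags one digit at a time: the first digit is for the owner, the second for the group, and the third for everyone else. The helper below is an illustrative sketch, not part of Hadoop; mode_to_string is a made-up name.

```python
def mode_to_string(octal_mode: str) -> str:
    """Render an octal mode such as "666" as an rwx string."""
    flags = (("r", 4), ("w", 2), ("x", 1))
    parts = []
    for digit in octal_mode:          # one digit each for owner, group, other
        value = int(digit, 8)
        parts.append("".join(c if value & bit else "-" for c, bit in flags))
    return "".join(parts)


print(mode_to_string("666"))
```

For 666, every digit has the read (4) and write (2) bits set and the execute (1) bit clear, which is why all users can read and write the file.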
Securing the Oracle BDA - 16
• Trust-based model
• Not intended to prevent users from malicious behavior
• HDFS ACLs provide a level of authorization—but can be circumvented.
- Prevents well-intentioned users from potentially incorrect actions
• The lack of strong authentication with the relaxed security model causes many
problems:
- User Impersonation: Masquerade as someone else
- Group Impersonation: Masquerade as a member of a group with greater access
- Service Impersonation: Masquerade as a service such as becoming a fake
NameNode or a fake DataNode
Securing the Oracle BDA - 17
The slide bullets span several of the levels mentioned in the “Security Levels” slide.
Mammoth automates the setup of a secure cluster as follows:
• Installs and configures Kerberos for strong authentication
• Installs and configures Sentry to manage authorization
• Configures auditing with Oracle Audit Vault
• Configures encryption
Securing the Oracle BDA - 18
Kerberos provides a secure way of ensuring the identity of users and services communicating
over the network. It ensures that users are who they claim to be, and that the services are not
imposters. Instead of sending passwords over the network, encrypted, time-stamped tickets
are used to gain access to services. The KDC is responsible for providing these tickets.
The Key Distribution Center (KDC) holds all user and service cryptographic keys. The KDC
provides security services to entities referred to as principals. The KDC and each principal
share a secret key. The KDC uses the secret key to encrypt data, sends it to the principal, and
the principal uses the secret key to decrypt and process the data. There are two categories of
principals:
• User principals: Users that are accessing machines and services
• Service principals: Services that are running on the system, such as HDFS, YARN, and
so on
The KDC has two components:
• The Authentication Server (AS)
• The Ticket Granting Server (TGS)
Securing the Oracle BDA - 19
The diagram on this page explains the steps that are required by a client to access services
such as the NameNode, DataNodes, and HDFS when using Kerberos:
1. The client authenticates itself to the Authentication Server (AS). The AS grants a
timestamped Ticket-Granting Ticket (TGT) to the requesting client.
2. The client uses the TGT to request a service ticket from the Ticket-Granting Server (TGS).
3. The TGS grants the client a service ticket.
4. The client uses the service ticket to authenticate itself to the server providing the service
that the client is requesting. In the case of Hadoop, this might be the HDFS, the Active
NameNode, the ResourceManager, or YARN.
5. The users known to the KDC can be stored in the KDC's own database or in an external store
such as an LDAP directory, which provides centralized management of users and groups. LDAP is
optional but recommended; examples include Microsoft Active Directory and Oracle Unified
Directory. Both the user and the service to which you are connecting are
authenticated.
The KDC on this page represents an existing KDC that you might already have installed and
configured. If you don't have an existing KDC, then Mammoth on the BDA can install and
configure the KDC for you.
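The ticket exchange in steps 1 through 4 can be sketched as a toy model in Python. This is not real Kerberos: tickets are signed with an HMAC instead of being encrypted, the client's proof of identity in step 1 is reduced to a lookup, and every name here (ToyKDC, seal, unseal, the keys) is hypothetical. It only illustrates that the service accepts a ticket because it shares a secret key with the KDC.

```python
import hashlib
import hmac
import json
import time


def seal(key: bytes, payload: dict) -> dict:
    # Sign the payload with a shared key (a stand-in for encryption).
    body = json.dumps(payload, sort_keys=True)
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def unseal(key: bytes, ticket: dict) -> dict:
    # Verify the signature and the expiry time before trusting the ticket.
    expected = hmac.new(key, ticket["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, ticket["tag"]):
        raise PermissionError("ticket was not issued with this key")
    payload = json.loads(ticket["body"])
    if payload["expires"] < time.time():
        raise PermissionError("ticket has expired")
    return payload


class ToyKDC:
    def __init__(self, principal_keys: dict):
        self.keys = principal_keys            # principal name -> shared key
        self.tgs_key = b"toy-tgs-key"         # hypothetical internal key

    def authenticate(self, user: str) -> dict:
        # Step 1: the AS issues a time-stamped TGT to a known principal.
        if user not in self.keys:
            raise PermissionError("unknown principal")
        return seal(self.tgs_key, {"user": user, "expires": time.time() + 3600})

    def grant_service_ticket(self, tgt: dict, service: str) -> dict:
        # Steps 2 and 3: the TGS validates the TGT and issues a service ticket.
        user = unseal(self.tgs_key, tgt)["user"]
        payload = {"user": user, "service": service, "expires": time.time() + 300}
        return seal(self.keys[service], payload)


def service_accepts(service_key: bytes, ticket: dict) -> str:
    # Step 4: the service checks the ticket with the key it shares with the KDC.
    return unseal(service_key, ticket)["user"]


kdc = ToyKDC({"oracle": b"oracle-key", "hdfs": b"hdfs-key"})
tgt = kdc.authenticate("oracle")
service_ticket = kdc.grant_service_ticket(tgt, "hdfs")
print(service_accepts(b"hdfs-key", service_ticket))
```

Note that no password travels between the parties in this sketch; only sealed, time-limited tickets do, which is the property the lesson attributes to Kerberos.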
Securing the Oracle BDA - 20
Setting up all of the service principals required for the Hadoop cluster has been automated by
Mammoth. In the slide, you can see service principals for HDFS, Hue, YARN, and more. Note
that it is not enough to simply specify the service name; the instance of the service (which
includes the host where the service runs) is also part of the principal name. You can view the list of principals by using the Kerberos Admin tool, kadmin.local.
In this example, we logged in to the KDC by using kadmin.local, and then listed the
principals by using the listprincs command.
Let's look at one of the service principals. The components of the hue service principal are the service name, hue; the host where this service runs, scaj51bda12.us.oracle.com; and
the realm, DEV.ORACLE.COM. The realm name is always uppercase.
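As an illustration of that structure, the short Python sketch below splits a principal name into its three parts. It assumes the usual service/host@REALM layout; parse_principal is a hypothetical helper, not a Kerberos API.

```python
def parse_principal(principal: str) -> dict:
    """Split a principal such as service/host@REALM into its components."""
    if "@" not in principal:
        raise ValueError("principal has no realm: " + principal)
    name, _, realm = principal.rpartition("@")
    primary, _, instance = name.partition("/")
    # User principals usually have no instance part, so it may be empty.
    return {"primary": primary, "instance": instance or None, "realm": realm}


parts = parse_principal("hue/scaj51bda12.us.oracle.com@DEV.ORACLE.COM")
print(parts["primary"], parts["instance"], parts["realm"])
```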
Securing the Oracle BDA - 21
As we show here, when using Kerberos for authentication, HiveServer2 is no longer
impersonating.
Securing the Oracle BDA - 22
Let's look at two user authentication examples.
In the first example, the oracle user attempts to access HDFS without authentication by
using the hadoop fs -ls command. This command fails because no valid credentials were
provided in the form of a Kerberos ticket-granting ticket (TGT).
In the second example, the oracle user initiates an authentication request by using the
kinit command, which obtains and caches an initial TGT for the principal user, oracle.
The TGT is required before the oracle user can access the HDFS service. This example
pairs the current Linux username, oracle, with the default realm to come up with the
suggested [email protected] principal. The AS responds by providing a TGT that
is encrypted with the key (password) for the [email protected] principal. Upon
receipt of the encrypted message, oracle is prompted to enter the correct password for the
[email protected] principal to decrypt the message. After successfully decrypting
the message containing the TGT, oracle requests a service ticket from the TGS for the
HDFS service, presenting the TGT along with the request. The TGS validates the TGT and provides oracle with a service ticket encrypted with the HDFS service principal's key.
oracle presents the service ticket to the HDFS service, which can then decrypt it and
validate the ticket. oracle can now use the HDFS service because the user is properly
authenticated. The HDFS hadoop fs -ls command runs successfully.
Securing the Oracle BDA - 23
A keytab is a file containing pairs of Kerberos principals and encrypted copies of those
principals' keys. The keytab files are unique to each host because their keys include the host
name. This file is used to authenticate a principal on a host to Kerberos without human
interaction or without storing a password in a plain text file. Because having access to the
keytab file for a principal allows one to act as that principal, access to the keytab files should
be tightly secured. The files should be readable by a minimal set of users, should be stored
on local disk, and should not be included in machine backups, unless access to those
backups is as secure as access to the local machine.
Keytab files are located in the /var/run/cloudera-scm-agent/process/<process>
directory.
Services do not prompt for passwords. Instead, a service can use a keytab file, which stores keys derived from passwords rather than the passwords themselves. You can create a Kerberos keytab file by using the ktutil command and supply it to the
kinit command with the -k and -t options.
Securing the Oracle BDA - 24
A credential cache (or “ccache”) holds Kerberos credentials while they remain valid and,
generally, while the user’s session lasts, so that authenticating to a service multiple times
(e.g., connecting to a web or mail server more than once) doesn’t require contacting the KDC
every time.
Securing the Oracle BDA - 25
Cloudera Manager enables administrators to specify TGT refresh policies as seen in this
screen. To access the Settings page, start Cloudera Manager, click the Administration
drop-down list, and then select Settings.
Securing the Oracle BDA - 26
To add a new user with group membership:
• Add the user’s principal to the KDC
• Add the user to each critical BDA node
- User does not need login privileges
- Assign user to group(s)
Note: Hue maintains its own users and follows a different process.
Securing the Oracle BDA - 27
The useradd command adds a specific user to the system. There are a number of options
for this command. The -r option creates a system account. The -g option assigns a
group name or number as the user’s initial login group. The group must already exist; you
cannot create the group as you add the user.
In the first code example, we add the user bob to the marketing group and the user lucy to
the hr group.
In the second code example, we add the two new users' principals to the KDC by using the addprinc command followed by the user's principal. The principal for user bob is
[email protected] and the principal for user lucy is [email protected].
Securing the Oracle BDA - 28
The User Admin application enables a superuser to add, delete, and manage Hue users and
groups, and configure group permissions. Superusers can add users and groups individually,
or import them from an LDAP directory. Group permissions define the Hue applications that
are visible to group members when they log in to Hue, as well as the application features that
are available to them.
To create a user in Hue:
1. Access the Hue Web UI, and then select Manage users from the Administration icon on
the toolbar.
2. On the User Admin page, click Add user.
3. The Hue Users – Create user wizard is displayed. Complete the three wizard steps, and
then click Add user to create the user.
Note the following about user and group updates made in Hue:
• They affect only access through Hue.
• Hue impersonates the user when accessing Hadoop services.
• They do not require updates to the KDC.
Securing the Oracle BDA - 29
Apache Sentry (incubating) is a granular, role-based authorization module for Hadoop. Sentry
provides the ability to control and enforce precise levels of privileges on data for authenticated
users and applications on a Hadoop cluster. Sentry currently works out of the box with
Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala and HDFS (limited to Hive table
data). Sentry is designed to be a pluggable authorization engine for Hadoop components. It
allows you to define authorization rules to validate a user or application’s access requests for
Hadoop resources. Sentry is highly modular and can support authorization for a wide variety
of data models in Hadoop.
Without Sentry, any user can query everything in Hive. The same applies to the Hive Metastore.
Securing the Oracle BDA - 30
Sentry provides the following benefits:
• Secure Authorization: The ability to control access to data and/or privileges on data for
authenticated users
• Fine-grained Authorization: The ability to give users access to a subset of data. This
includes access to a database, URI, table, or view
• Role-based Authorization: The ability to create or apply template-based privileges based
on functional roles
Securing the Oracle BDA - 31
As part of the Sentry configuration, HiveServer2 impersonation must be disabled.
• All data access is executed by the Hive user.
• Changes will need to be made to the HDFS privilege model to effectively authorize
access to data.
• Without changes to this model, all users accessing the Hive data would need to be part
of the Hive group—rendering authorization meaningless.
• This will be covered later in this lesson.
Securing the Oracle BDA - 32
There are users, groups, and roles.
A user is the identity that you authenticate as. Each user is a member of a group, and a group is a
collection of users. A role is a collection of privileges. Roles are assigned to groups.
Securing the Oracle BDA - 33
The example on the next few slides illustrates the use of Sentry to authorize access to four
user segments:
• The administrator might have access to all the data.
• The HR team can only access the hr data.
• The Marketing team can only access the marketing data.
• The development team can access all of the data.
Securing the Oracle BDA - 34
The table shows the users, groups, roles, and capabilities.
A user is the identity that you authenticate as. Each user is a member of a group, and a group is a
collection of users. A role is a collection of privileges. Roles are assigned to groups.
Securing the Oracle BDA - 35
The newly created roles have not yet been granted any privileges. In our example, the admin
user creates databases for each group and gives the appropriate roles full access to create
and grant access to objects in the respective databases. Because the development group has
both roles assigned, it has full access to both databases.
The GRANT OPTION privilege enables you to grant to other users, or revoke from other users,
those privileges that you yourself possess. The hr user can grant other users access to
elements or objects in the hr database.
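The relationship between users, groups, roles, and privileges in this example can be modeled with a few dictionaries. The Python sketch below is only a conceptual model, not Sentry's API or storage format; the role names hr_role and marketing_role, the user dave, and the ("database", "ALL") privilege tuples are assumptions made for illustration.

```python
# Users belong to groups (bob and lucy come from the lesson; dave is hypothetical).
GROUP_MEMBERS = {
    "hr": {"lucy"},
    "marketing": {"bob"},
    "development": {"dave"},
}

# Roles are assigned to groups; the development group holds both roles.
GROUP_ROLES = {
    "hr": {"hr_role"},
    "marketing": {"marketing_role"},
    "development": {"hr_role", "marketing_role"},
}

# A role is a collection of privileges, modeled here as (database, action) pairs.
ROLE_PRIVILEGES = {
    "hr_role": {("hr", "ALL")},
    "marketing_role": {("marketing", "ALL")},
}


def privileges_for(user: str) -> set:
    """Collect the privileges a user receives through group membership."""
    granted = set()
    for group, members in GROUP_MEMBERS.items():
        if user in members:
            for role in GROUP_ROLES.get(group, set()):
                granted |= ROLE_PRIVILEGES[role]
    return granted


def can_access(user: str, database: str) -> bool:
    return any(db == database for db, _ in privileges_for(user))


print(can_access("lucy", "hr"), can_access("lucy", "marketing"))
```

Privileges never attach to a user directly in this model; they always flow from role to group to user, which is why adding a user to a group is enough to change what that user can access.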
Securing the Oracle BDA - 39
As part of Sentry’s configuration, Hive impersonation is turned off. Therefore, the Hive
superuser group must be able to access the underlying data.
You also want other users and services to be able to access that data and not just Hive.
In our example, three groups require access to the underlying data files. Simple access
privileges are insufficient.
• You must be able to specify privileges for each group (which may differ).
• You must keep ACLs in sync with Sentry authorization.
Securing the Oracle BDA - 40
Although the marketing group owns the data in HDFS, Sentry must still authorize access to
the URI.
Securing the Oracle BDA - 42
Once the privileges are all set, you can create the table and load the data into that table.
Securing the Oracle BDA - 43
See the Cloudera documentation at the following URL for the complete list of the available
privileges:
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_sg_sentry_service.html#concept_cx4_sw2_q4_unique_1
Securing the Oracle BDA - 45
Hadoop client: The local OS user ID is used.
JDBC: Specify any user as part of the connect string.
Oracle Database: The user is the owner of the Oracle process.
Securing the Oracle BDA - 48
Oracle Virtual Private Database (VPD) enables you to create policies to restrict the data accessible to users. Essentially, Oracle Virtual Private Database adds a dynamic WHERE
clause to a SQL statement that is issued against the table, view, or synonym to which an
Oracle Virtual Private Database security policy was applied.
In this example, a policy was created that automatically adds a filter based on session
information, specifically the ID of the user. The query result is automatically filtered to show rows where the SALES_REP_ID column equals this user ID. Note that it does not matter whether the
data is stored in HDFS or Oracle Database; the same policies are applied.
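The effect of a dynamic WHERE clause can be imitated in a few lines of Python with the standard sqlite3 module. This is only an analogy: it is not how Oracle Virtual Private Database is configured, and the sales table, its rows, and the with_policy helper are hypothetical.

```python
import sqlite3


def with_policy(sql: str, session_user_id: int):
    """Wrap a query so that only the session user's rows are returned."""
    filtered = f"SELECT * FROM ({sql}) WHERE sales_rep_id = ?"
    return filtered, (session_user_id,)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sales_rep_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(7, 100), (8, 250), (7, 75)])

# The user writes an unfiltered query; the policy adds the predicate.
query, params = with_policy("SELECT sales_rep_id, amount FROM sales", 7)
rows = conn.execute(query, params).fetchall()
print(sorted(rows))
```

The user's SQL never mentions the filter. The predicate is attached from session information, so the same query returns different rows for different users.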
Securing the Oracle BDA - 49
Similar to VPD, you can apply Oracle Data Redaction to data stored in both Oracle Database and Big Data sources. In this example, the USERNAME column in the TWEET table is
being redacted; the first seven characters in the name are being replaced by “*”. You can
control when to apply the redaction policy. Here, redaction always takes place because the expression “1=1” is always true.
Note that redaction is applied to data at runtime—when users access the data. This is an
important distinction as it allows table joins across different data stores. For example, suppose you had a CUSTOMER table that also contained the USERNAME field. Just like in the TWEET
table, imagine that the USERNAME is redacted. To successfully join the USERNAME columns of
these two tables, the processing uses the natural values of the columns. The USERNAME will
then be redacted as part of the query result.
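The order of operations described here, join first on natural values and mask only the output, can be sketched in Python. The sample rows and the redact_prefix helper are hypothetical, and the handling of names shorter than seven characters is an assumption rather than documented Oracle behavior.

```python
def redact_prefix(value: str, count: int = 7, mask: str = "*") -> str:
    """Replace the first `count` characters of a value with the mask character."""
    n = min(count, len(value))
    return mask * n + value[n:]


# Hypothetical rows standing in for the TWEET and CUSTOMER tables.
tweets = [{"username": "drevil_2015", "text": "hello"}]
customers = [{"username": "drevil_2015", "city": "Austin"},
             {"username": "austinp2015", "city": "London"}]

# Join on the natural (unredacted) USERNAME values ...
customer_by_name = {c["username"]: c for c in customers}
result = [
    {
        # ... and redact only when the value is returned to the user.
        "username": redact_prefix(t["username"]),
        "text": t["text"],
        "city": customer_by_name[t["username"]]["city"],
    }
    for t in tweets
    if t["username"] in customer_by_name
]
print(result[0]["username"], result[0]["city"])
```

If the values were masked before the join, both customer names in this sample would become the same string, and the join could no longer tell them apart.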
Securing the Oracle BDA - 50
The Unauthorized Data Access of DrEvil triggers an alert. The alert details are highlighted
on the screen capture.
Securing the Oracle BDA - 59
Useful for consequences of purging or modifying a set of data entities.
Securing the Oracle BDA - 63
Oracle BDA supports network encryption for key activities—preventing network sniffing
between computers. Mammoth automatically configures:
• [1] Cloudera Manager Server communicating with Agents
• [2] Hadoop HDFS data transfers
• [3] Hadoop internal RPC communications
• [4] Cloudera Manager Web interface
• [5] Hadoop web UIs and Web services
• [6] Hadoop YARN/MapReduce shuffle transfers
Securing the Oracle BDA - 65
In this lesson, you should have learned about securing data on the Oracle Big Data
Appliance.
Securing the Oracle BDA - 67
In this course, we discussed the following lessons:
• Introduction to the Hadoop Ecosystem
• Introduction to the Oracle BDA
• Oracle BDA Pre-Installation Steps
• Working With Mammoth
• Securing the Oracle BDA
• Working With the Oracle Big Data Connectors:
- Oracle SQL Connector for Hadoop Distributed File System (HDFS)
- Oracle Loader for Hadoop (OLH)
- Oracle Data Integrator (ODI)
- Oracle XQuery for Hadoop (OXH)
- Oracle R Advanced Analytics for Hadoop (ORAAH)
Securing the Oracle BDA - 68
The Oracle Learning Library (OLL) offers other self-paced courses about the Oracle Big Data
Appliance and other related topics. Visit the Oracle Learning Library to learn about the
courses.
Securing the Oracle BDA - 69
Oracle University offers In-Class courses about Oracle Big Data and other related topics.
Visit Oracle University to learn about the following courses:
• Oracle Big Data Fundamentals
• XML Fundamentals
• Oracle Database 12c: Use XML DB
• Oracle NoSQL Database for Developers
• Oracle NoSQL Database for Administrators
• Oracle R Enterprise Essentials
Securing the Oracle BDA - 70
The Oracle Learning Library offers many free demonstrations and tutorials.
And, of course, the Oracle Big Data Appliance documentation and online help embedded
within the product are also valuable resources.
Securing the Oracle BDA - 71