Hello and welcome to this online, self-paced course titled Administering and Managing the Oracle Big Data Appliance (BDA). This course contains several lessons. This lesson is titled Securing the Oracle BDA. My name is Lauran Serhal. I am a curriculum developer at Oracle and I have helped educate customers on Oracle products since 1995. I'll be guiding you through this course, which consists of lectures, demos, and review sessions. The goal of this lesson is to describe how to secure data on the Oracle Big Data Appliance.

Securing the Oracle BDA - 1

Introduction

Before we begin, take a look at some of the features of this course player. If you’ve viewed a

similar self-paced course in the past, feel free to skip this slide.

Menu

This is the Menu tab. It’s set up to automatically progress through the course in a linear

fashion, but you can also review the material in any order. Just click a slide title in the outline

to display its contents.

Notes

Click the Notes tab to view the audio transcript for each slide.

Search

Use the Search field to find specific information in the course.

Player Controls

Use these controls to pause, play, or move to the previous or next slide. Use the interactive

progress bar to fast forward or rewind the current slide. Some interactive slides in this course

may contain additional navigation and controls. The view for certain slides may change so

that you can see additional details.

Resources (Optional)

Click the Resources button to access any attachments associated with this course.

Glossary (Optional)

Click the Glossary button to view key terms and their definitions.

Securing the Oracle BDA - 2

So, you know the title of the course, but you may be asking yourself, “Is this the right course

for me?” Click the bars to learn about the course objectives, target audience, and

prerequisites.

Securing the Oracle BDA - 3

What can you expect to get out of this course? Here are the core learning objectives.

After completing this course, you should be able to define the Hadoop ecosystem and its

components including Hadoop’s Distributed File System (HDFS), MapReduce, Spark, YARN,

and some other related projects. You will also learn how to complete the BDA Site Checklists,

run the Oracle BDA Configuration Utility, and install the Oracle BDA Mammoth software on

the Oracle BDA. You will also learn how to secure data on the Oracle BDA, and how to use

the Oracle Big Data Connectors.

Securing the Oracle BDA - 4

Who is this course for? Here is the intended audience.

• Application Developers

• Database Administrators

• Hadoop/Big Data Cluster Administrators

• Hadoop Programmers

Securing the Oracle BDA - 5

Before taking this course, you should have some exposure to Big Data, and optionally some

basic database knowledge.

Securing the Oracle BDA - 6

In this course, we'll talk about the following lessons:

In the Introduction to the Hadoop Ecosystem, you define the Hadoop ecosystem and describe the Hadoop core components and some of the related projects. You will also learn about the components of HDFS and review MapReduce, Spark, and YARN.

In the Introduction to the Oracle BDA lesson, you identify the Oracle Big Data Appliance (BDA) and its hardware and software components.

In the Oracle BDA Pre-Installation Steps lesson, you learn how to download and complete the BDA Site Checklists. You also learn how to download and run the Oracle BDA Configuration Utility and then review the generated configuration files.

In the Working With Mammoth lesson, you learn how to download the Oracle BDA Mammoth Software Deployment Bundle from My Oracle Support. You also learn how to install a CDH or NoSQL cluster based on your specifications. You then learn how to install the Oracle BDA Mammoth Software Deployment Bundle using the Mammoth utility.

In the Securing the Oracle BDA lesson, you learn how to secure data on the Oracle Big Data Appliance.

In the Working With the Oracle Big Data Connectors lessons, you learn how to use Oracle SQL Connector for Hadoop Distributed File System, Oracle Loader for Hadoop, Oracle Data Integrator, Oracle XQuery for Hadoop, and Oracle R Advanced Analytics for Hadoop.

Securing the Oracle BDA - 7

Now that you’ve learned about the other Oracle Big Data Connectors, let’s take a look at how

to secure data on the Oracle Big Data Appliance. Let's get started.

Securing the Oracle BDA - 8

• Security is a calculated decision. Because security does not make systems run faster or make them easier to use, the cost of implementing any security control must be weighed against the potential cost of not implementing it.

• How much security is required is often a judgment based on the amount of risk an

organization is willing to assume. That risk often boils down to the value of the data. If

the data being housed in your BDA is a unique combination of sensor data, geological

data, and periodical feeds from gold mining journals, it has very different security

requirements than a cluster housing publicly available home values, traffic accident

data, and citizen demographic data.

Securing the Oracle BDA - 9

Securing the Oracle BDA - 10

You will explore the key capabilities that support this security spectrum, including:

Authentication:

The subject (user, program, process, service) proves their identity to gain access to the

system.

Authorization:

Authenticated subjects are granted access to authorized resources.

Auditing:

Accesses and manipulations are recorded in logs for auditing/accountability/compliance.

Encryption (at rest and over the network):

Encryption protects against man-in-the-middle attacks on data in transit and against breaches of data at rest.

Securing the Oracle BDA - 11

Securing the Oracle BDA - 12

• Hadoop client: Local OS user ID is used.

• JDBC: Specify any user as part of the connect string.

• Oracle Database: The user is the owner of the Oracle process.

Securing the Oracle BDA - 13

Securing the Oracle BDA - 14

In this example, when you issue the hadoop fs -ls command, you get an HDFS file listing

for the HDFS hr and marketing directories.

Column 1 shows the file mode: "d" for a directory and a hyphen for a normal file, followed by

the file or directory permissions. HDFS has a permission model similar to that used in Linux. The three permission types are read (r), write (w), and execute (x). The execute permission

for a file is ignored because you cannot execute a file on HDFS. The permissions are grouped

by owner, group, and public (everyone else).

Column 2 shows the replication factor for files. The concept of replication factor does not

apply to directories.

Columns 3 and 4 show the file owner and group.

Column 5 shows the size of the file, in bytes, or 0 if it is a directory.

Columns 6 and 7 show the date and time of the last modification, respectively.

Column 8 is the name of the file or directory.
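
The column layout described above can be sketched in a few lines of Python. This is only an illustration: the sample line is hypothetical and not taken from the slide, and the HdfsEntry and parse_ls_line names are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass
class HdfsEntry:
    is_dir: bool
    permissions: str   # nine characters, for example "rwxr-x---"
    replication: str   # "-" for directories
    owner: str
    group: str
    size: int
    modified: str      # date and time joined by a space
    path: str


def parse_ls_line(line: str) -> HdfsEntry:
    """Split one `hadoop fs -ls` output line into its eight columns."""
    # The path comes last and may contain spaces, so split at most 7 times.
    mode, repl, owner, group, size, date, time_, path = line.split(None, 7)
    return HdfsEntry(
        is_dir=mode.startswith("d"),
        permissions=mode[1:10],
        replication=repl,
        owner=owner,
        group=group,
        size=int(size),
        modified=f"{date} {time_}",
        path=path,
    )


# Hypothetical sample line (assumed values, not the course output).
sample = "drwxr-x---   - hr hr          0 2015-03-10 14:22 /user/hr"
entry = parse_ls_line(sample)
print(entry.is_dir, entry.permissions, entry.owner, entry.group, entry.path)
```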

Securing the Oracle BDA - 15

In the first code example, we use the hadoop fs -ls command to review the permissions

for the sgtpepper.txt file.

In the second code example, we set the sgtpepper.txt file access permissions by using the hadoop fs -chmod 666 sgtpepper.txt command. This allows all users to read and write that file. Next, we confirm the new access permissions on the file.

In the third code example, we make lennon the owner and thebeatles the group of the file by using the hadoop fs -chown command. Note that this must be done as a superuser. Next, we confirm the new ownership.
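
To see why a mode of 666 grants read and write access to everyone, it can help to expand the octal digits into the r, w, and x flags that the listing displays. The following is a minimal sketch; the mode_to_symbolic helper is a name made up for this illustration and is not part of Hadoop.

```python
def mode_to_symbolic(octal_mode: str) -> str:
    """Translate a three-digit octal mode such as "666" into "rw-rw-rw-"."""
    if len(octal_mode) != 3 or any(c not in "01234567" for c in octal_mode):
        raise ValueError(f"expected three octal digits, got {octal_mode!r}")
    symbolic = ""
    for digit in octal_mode:          # owner, then group, then everyone else
        bits = int(digit)
        symbolic += "r" if bits & 4 else "-"
        symbolic += "w" if bits & 2 else "-"
        symbolic += "x" if bits & 1 else "-"
    return symbolic


print(mode_to_symbolic("666"))  # rw-rw-rw-
print(mode_to_symbolic("750"))  # rwxr-x---
```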

Securing the Oracle BDA - 16

• Trust-based model

• Not intended to prevent users from malicious behavior

• HDFS ACLs provide a level of authorization—but can be circumvented.

- Prevents well-intentioned users from potentially incorrect actions

• The lack of strong authentication with the relaxed security model causes many

problems:

- User Impersonation: Masquerade as someone else

- Group Impersonation: Masquerade as a member of a group with greater access

- Service Impersonation: Masquerade as a service such as becoming a fake

NameNode or a fake DataNode

Securing the Oracle BDA - 17

The slide bullets span several of the levels mentioned in the “Security Levels” slide.

Mammoth automates the setup of a secure cluster as follows:

• Installs and configures Kerberos for strong authentication

• Installs and configures Sentry to manage authorization

• Configures auditing with Oracle Audit Vault

• Configures encryption

Securing the Oracle BDA - 18

Kerberos provides a secure way of ensuring the identity of users and services communicating

over the network. It ensures that users are who they claim to be, and that the services are not

imposters. Instead of sending passwords over the network, encrypted, time-stamped tickets

are used to gain access to services. The KDC is responsible for providing these tickets.

The Key Distribution Center (KDC) holds all user and service cryptographic keys. The KDC

provides security services to entities referred to as principals. The KDC and each principal

share a secret key. The KDC uses the secret key to encrypt data, sends it to the principal, and

the principal uses the secret key to decrypt and process the data. There are two categories of

principals:

• User principals: Users that are accessing machines and services

• Service principals: Services that are running on the system, such as HDFS, YARN, and

so on

The KDC has two components:

• The Authentication Server (AS)

• The Ticket Granting Server (TGS)

Securing the Oracle BDA - 19

The diagram on this page explains the steps that are required by a client to access services

such as the NameNode, DataNodes, and HDFS when using Kerberos:

1. The client authenticates itself to the Authentication Server (AS). The AS grants a

timestamped Ticket-Granting Ticket (TGT) to the requesting client.

2. The client uses the TGT to request a service ticket from the Ticket-Granting Server (TGS).

3. The TGS grants the client a service ticket.

4. The client uses the service ticket to authenticate itself to the server providing the service

that the client is requesting. In the case of Hadoop, this might be the HDFS, the Active

NameNode, the ResourceManager, or YARN.

5. The users known to the KDC can be stored in the KDC's own database or in an external store such as LDAP, which provides centralized management of users and groups. LDAP is an optional but recommended component; examples include Microsoft Active Directory and Oracle Unified Directory. Both the user and the service to which you are connecting are authenticated.

The KDC on this page represents an existing KDC that you might already have installed and

configured. If you don't have an existing KDC, then Mammoth on the BDA can install and

configure the KDC for you.
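
The ticket exchange in steps 1 through 4 can be modeled with a toy simulation. This is not real Kerberos and contains no cryptography: the Sealed class merely stands in for encryption, and every name, key, and lifetime below is an assumption made for illustration.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Stand-in for encrypted data: it can be opened only with the same key."""
    key: str
    payload: dict

    def open(self, key: str) -> dict:
        if key != self.key:
            raise PermissionError("wrong key")
        return self.payload


class ToyKDC:
    def __init__(self, principal_keys: dict, lifetime_s: int = 3600):
        self.keys = principal_keys      # secret key shared with each principal
        self.tgs_key = "tgs-secret"     # known only inside the KDC
        self.lifetime_s = lifetime_s

    def authenticate(self, user: str) -> Sealed:
        """Step 1 (AS): issue a time-stamped TGT, sealed with the user's key."""
        expires = time.time() + self.lifetime_s
        tgt = Sealed(self.tgs_key, {"user": user, "expires": expires})
        return Sealed(self.keys[user], {"tgt": tgt})

    def grant_service_ticket(self, tgt: Sealed, service: str) -> Sealed:
        """Steps 2 and 3 (TGS): validate the TGT and issue a service ticket."""
        claims = tgt.open(self.tgs_key)
        if claims["expires"] < time.time():
            raise PermissionError("TGT expired")
        return Sealed(self.keys[service], {"user": claims["user"], "service": service})


def service_accepts(ticket: Sealed, service_key: str) -> str:
    """Step 4: the service opens the ticket with its own key and learns the user."""
    return ticket.open(service_key)["user"]


kdc = ToyKDC({"oracle": "oracle-password", "hdfs": "hdfs-keytab-key"})
tgt = kdc.authenticate("oracle").open("oracle-password")["tgt"]
ticket = kdc.grant_service_ticket(tgt, "hdfs")
print(service_accepts(ticket, "hdfs-keytab-key"))
```

Note that the password never travels to the KDC in this model; the client proves its identity by being able to open what the AS sends back.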

Securing the Oracle BDA - 20

Setting up all of the service principals required for the Hadoop cluster has been automated by

Mammoth. In the slide, you can see service principals for HDFS, Hue, YARN, and more. Note

that it is not enough to simply specify the service name; the instance of the service (which

includes the host where the service runs) is also part of the principal name. You can view the list of principals by using the Kerberos Admin tool, kadmin.local.

In this example, we logged in to the KDC by using kadmin.local, and then listed the

principals by using the listprincs command.

Let's look at one of the service principals. The components of the hue service principal are the service name, hue; the host where this service runs, scaj51bda12.us.oracle.com; and the realm, DEV.ORACLE.COM. The realm name is always uppercase.
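
As a rough illustration of how these components fit together, the following sketch splits a principal name of the conventional form primary/instance@REALM. The full string is assembled here from the components named above, and the Principal and parse_principal names are invented for this example.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str              # service name or user name
    instance: Optional[str]   # host for service principals, None for plain users
    realm: str


def parse_principal(name: str) -> Principal:
    """Split "primary/instance@REALM" (the instance is optional) into its parts."""
    left, sep, realm = name.rpartition("@")
    if not sep or not left or not realm:
        raise ValueError(f"not a full principal name: {name!r}")
    primary, slash, instance = left.partition("/")
    return Principal(primary, instance if slash else None, realm)


p = parse_principal("hue/scaj51bda12.us.oracle.com@DEV.ORACLE.COM")
print(p.primary, p.instance, p.realm)
```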

Securing the Oracle BDA - 21

As we show here, when using Kerberos for authentication, HiveServer2 is no longer

impersonating.

Securing the Oracle BDA - 22

Let's look at two user authentication examples.

In the first example, the oracle user attempts to access HDFS without authentication by

using the hadoop fs -ls command. This command fails because no valid credentials were

provided in the form of a Kerberos ticket-granting ticket (TGT).

In the second example, the oracle user initiates an authentication request by using the

kinit command, which obtains and caches an initial TGT for the principal user, oracle.

The TGT is required before the oracle user can access the HDFS service. This example

pairs the current Linux username, oracle, with the default realm to come up with the

suggested [email protected] principal. The AS responds by providing a TGT that

is encrypted with the key (password) for the [email protected] principal. Upon

receipt of the encrypted message, oracle is prompted to enter the correct password for the

[email protected] principal to decrypt the message. After successfully decrypting

the message containing the TGT, oracle requests a service ticket from the TGS for the

HDFS service presenting the TGT along with the request. The TGS validates the TGT and provides oracle with a service ticket encrypted with the principal’s key.

oracle presents the service ticket to the HDFS service, which can then decrypt it and

validate the ticket. oracle can now use the HDFS service because the user is properly

authenticated. The HDFS hadoop fs -ls command runs successfully.

Securing the Oracle BDA - 23

A keytab is a file containing pairs of Kerberos principals and an encrypted copy of that

principal's key. The keytab files are unique to each host because their keys include the host

name. This file is used to authenticate a principal on a host to Kerberos without human

interaction or without storing a password in a plain text file. Because having access to the

keytab file for a principal allows one to act as that principal, access to the keytab files should

be tightly secured. The files should be readable by a minimal set of users, should be stored

on local disk, and should not be included in machine backups, unless access to those

backups is as secure as access to the local machine.

Keytab files are located in the /var/run/cloudera-scm-agent/process/<process>

directory.

Services do not prompt for passwords. Instead, a service authenticates with a keytab file, which stores the principal's keys rather than a plain-text password. You can create a Kerberos keytab file by using the ktutil command, and you can supply it to the kinit command with the -t option.

Securing the Oracle BDA - 24

A credential cache (or “ccache”) holds Kerberos credentials while they remain valid and,

generally, while the user’s session lasts, so that authenticating to a service multiple times

(e.g., connecting to a web or mail server more than once) doesn’t require contacting the KDC

every time.
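
The idea behind the credential cache can be sketched as a small lookup table that reuses a ticket until it expires. This is a conceptual model only, not the actual ccache format; the CredentialCache class and the fake_kdc function are hypothetical names.

```python
import time


class CredentialCache:
    """Keep one ticket per service and reuse it until it expires."""

    def __init__(self, request_ticket, clock=time.time):
        self._request_ticket = request_ticket  # callable: service -> (ticket, expiry)
        self._clock = clock
        self._tickets = {}

    def get(self, service: str) -> str:
        cached = self._tickets.get(service)
        if cached is not None and cached[1] > self._clock():
            return cached[0]                    # still valid: no KDC round trip
        ticket, expires_at = self._request_ticket(service)
        self._tickets[service] = (ticket, expires_at)
        return ticket


kdc_calls = []


def fake_kdc(service):
    """Hypothetical stand-in for a KDC request; it records each call."""
    kdc_calls.append(service)
    return f"ticket-for-{service}", time.time() + 600


cache = CredentialCache(fake_kdc)
cache.get("mail")
cache.get("mail")
print(len(kdc_calls))  # the second lookup is served from the cache
```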

Securing the Oracle BDA - 25

Cloudera Manager enables administrators to specify TGT refresh policies as seen in this

screen. To access the Settings page, start Cloudera Manager, click the Administration

drop-down list, and then select Settings.

Securing the Oracle BDA - 26

To add a new user with group membership:

• Add the user’s principal to the KDC

• Add the user to each critical BDA node

- User does not need login privileges

- Assign user to group(s)

Note: Hue maintains its own users and follows a different process.

Securing the Oracle BDA - 27

The useradd command adds a specific user to the system. There are a number of options

for this command. The -r option creates a user system account. The -g option assigns a

group name or number as the user’s initial login group. The group must already exist; you

cannot create the group as you add the user.

In the first code example, we add the user bob to the marketing group and the user lucy to

the hr group.

In the second code example, we add the two new users' principals to the KDC by using the addprinc command followed by the user's principal. The principal for user bob is

[email protected] and the principal for user lucy is [email protected].

Securing the Oracle BDA - 28

The User Admin application enables a superuser to add, delete, and manage Hue users and

groups, and configure group permissions. Superusers can add users and groups individually,

or import them from an LDAP directory. Group permissions define the Hue applications that

are visible to group members when they log in to Hue, as well as the application features that

are available to them.

To create a user in Hue:

1. Access the Hue Web UI, and then select Manage users from the Administration icon on

the toolbar.

2. On the User Admin page, click Add user.

3. The Hue Users – Create user wizard is displayed. Complete the three wizard steps, and then click Add user to create the user.

Note the following about user and group updates made in Hue:

• They affect only access through Hue.

• Hue impersonates the user when accessing Hadoop services.

• They do not require updates to the KDC.

Securing the Oracle BDA - 29

Apache Sentry (incubating) is a granular, role-based authorization module for Hadoop. Sentry

provides the ability to control and enforce precise levels of privileges on data for authenticated

users and applications on a Hadoop cluster. Sentry currently works out of the box with

Apache Hive, Hive Metastore/HCatalog, Apache Solr, Impala and HDFS (limited to Hive table

data). Sentry is designed to be a pluggable authorization engine for Hadoop components. It

allows you to define authorization rules to validate a user or application’s access requests for

Hadoop resources. Sentry is highly modular and can support authorization for a wide variety

of data models in Hadoop.

Without Sentry, any user can query everything in Hive. This also applies to the Hive Metastore.

Securing the Oracle BDA - 30

Sentry provides the following benefits:

• Secure Authorization: The ability to control access to data and/or privileges on data for

authenticated users

• Fine-grained Authorization: The ability to give users access to a subset of data. This

includes access to a database, URI, table, or view

• Role-based Authorization: The ability to create or apply template-based privileges based

on functional roles

Securing the Oracle BDA - 31

As part of the Sentry configuration, HiveServer2 impersonation must be disabled.

• All data access is executed by the Hive user.

• Changes will need to be made to the HDFS privilege model to effectively authorize

access to data.

• Without changes to this model, all users accessing the Hive data would need to be part

of the Hive group—rendering authorization meaningless.

• This will be covered later in this lesson.

Securing the Oracle BDA - 32

There are users, groups, and roles.

A user is the identity that you authenticate as. Each user belongs to a group, and a group is a collection of users. A role is a collection of privileges. Roles are assigned to groups.

Securing the Oracle BDA - 33

The example on the next few slides illustrates the use of Sentry to authorize access for four user segments:

• The administrator might have access to all the data.

• The HR team can only access the hr data.

• The Marketing team can only access the marketing data.

• The development team can access all of the data.

Securing the Oracle BDA - 34

The table shows the users, groups, roles, and capabilities.

A user is the identity that you authenticate as. Each user belongs to a group, and a group is a collection of users. A role is a collection of privileges. Roles are assigned to groups.

Securing the Oracle BDA - 35

Securing the Oracle BDA - 36

Securing the Oracle BDA - 37

Securing the Oracle BDA - 38

The newly created roles have not yet been granted any privileges. In our example, the admin

user creates databases for each group and gives the appropriate roles full access to create

and grant access to objects in the respective databases. Because the development group has

both roles assigned, it has full access to both databases.
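
The relationship among users, groups, roles, and privileges can be modeled with a few dictionaries. This is a conceptual sketch, not Sentry's API: the role names hr_role and marketing_role and the user dev1 are assumptions, while bob, lucy, and the three groups follow the lesson's example.

```python
# Users belong to groups, roles are granted to groups, and roles hold privileges.
user_groups = {
    "lucy": {"hr"},
    "bob": {"marketing"},
    "dev1": {"development"},   # hypothetical member of the development group
}
group_roles = {
    "hr": {"hr_role"},
    "marketing": {"marketing_role"},
    "development": {"hr_role", "marketing_role"},
}
role_privileges = {
    "hr_role": {("hr", "ALL")},
    "marketing_role": {("marketing", "ALL")},
}


def privileges_for(user: str) -> set:
    """Collect every (database, privilege) pair the user receives through groups."""
    granted = set()
    for group in user_groups.get(user, set()):
        for role in group_roles.get(group, set()):
            granted |= role_privileges.get(role, set())
    return granted


def can_access(user: str, database: str) -> bool:
    """Return True if any of the user's roles grants a privilege on the database."""
    return any(db == database for db, _ in privileges_for(user))


print(can_access("lucy", "hr"), can_access("lucy", "marketing"))
print(can_access("dev1", "hr"), can_access("dev1", "marketing"))
```

Because privileges attach to roles rather than to individual users, adding a new HR employee in this model only requires placing the user in the hr group.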

The GRANT OPTION privilege enables you to grant to other users, or revoke from them, the privileges that you yourself possess. The hr user can grant other users access to elements or objects in the hr database.

Securing the Oracle BDA - 39

As part of Sentry’s configuration, Hive impersonation is turned off. Therefore, the Hive

superuser group must be able to access the underlying data.

You also want other users and services to be able to access that data and not just Hive.

In our example, three groups require access to the underlying data files. Simple access

privileges are insufficient.

• You must be able to specify privileges for each group (which may differ).

• You must keep ACLs in sync with Sentry authorization.

Securing the Oracle BDA - 40

Securing the Oracle BDA - 41

Although the marketing group owns the data in HDFS, Sentry must still authorize access to

the URI.

Securing the Oracle BDA - 42

Once the privileges are all set, you can create the table and load the data into that table.

Securing the Oracle BDA - 43

Securing the Oracle BDA - 44

See the Cloudera documentation at the following URL for the complete list of the available privileges:

http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_sg_sentry_service.html#concept_cx4_sw2_q4_unique_1

Securing the Oracle BDA - 45

Securing the Oracle BDA - 46

Securing the Oracle BDA - 47

Hadoop client: The local OS user ID is used.

JDBC: Specify any user as part of the connect string.

Oracle Database: The user is the owner of the Oracle process.

Securing the Oracle BDA - 48

Oracle Virtual Private Database (VPD) enables you to create policies to restrict the data accessible to users. Essentially, Oracle Virtual Private Database adds a dynamic WHERE

clause to a SQL statement that is issued against the table, view, or synonym to which an

Oracle Virtual Private Database security policy was applied.

In this example, a policy was created that automatically adds a filter based on session information, specifically the ID of the user. The query result is automatically filtered to show rows where the SALES_REP_ID column equals this user ID. It does not matter whether the data is stored in HDFS or Oracle Database; the same policies are applied.
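
The effect of a dynamic WHERE clause can be imitated with a naive string rewrite. This sketch only conveys the idea and is not how Oracle Database implements VPD; the apply_policy function and the orders table are hypothetical, and the rewrite handles only simple single-table queries.

```python
def apply_policy(sql: str, session_user_id: int) -> str:
    """Append the policy predicate to a simple single-table SELECT statement."""
    predicate = f"SALES_REP_ID = {int(session_user_id)}"
    # Naive rewrite for illustration: it assumes no GROUP BY or ORDER BY
    # clause and no trailing semicolon.
    idx = sql.lower().find(" where ")
    if idx == -1:
        return f"{sql} WHERE {predicate}"
    head, condition = sql[:idx], sql[idx + len(" where "):]
    # Parenthesize the user's own condition so the policy always applies.
    return f"{head} WHERE ({condition}) AND {predicate}"


print(apply_policy("SELECT * FROM orders", 42))
print(apply_policy("SELECT * FROM orders WHERE total > 100 OR rush = 1", 42))
```

The parentheses matter: without them, an OR in the user's own condition could bypass the added filter.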

Securing the Oracle BDA - 49

Similar to VPD, you can apply Oracle Data Redaction to data stored in both Oracle Database and Big Data sources. In this example, the USERNAME column in the TWEET table is

being redacted; the first seven characters in the name are being replaced by “*”. You can

control when to apply the redaction policy. Here, redaction always takes place because the expression “1=1” is always true.

Note that redaction is applied to data at runtime—when users access the data. This is an

important distinction as it allows table joins across different data stores. For example, let us say you had a CUSTOMER table that also contained the USERNAME field. Just like in the TWEET

table, imagine that the USERNAME is redacted. To successfully join the USERNAME columns of

these two tables, the processing uses the natural values of the columns. The USERNAME will

then be redacted as part of the query result.
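
The ordering described here, joining on natural values first and redacting afterward, can be illustrated with in-memory rows. The sample data and the redact_username helper are made up for this sketch; real redaction policies are defined in the database, not in application code.

```python
def redact_username(name: str, masked_chars: int = 7) -> str:
    """Replace the first seven characters with '*' and keep the rest."""
    n = min(len(name), masked_chars)
    return "*" * n + name[n:]


# Hypothetical rows standing in for the TWEET and CUSTOMER tables.
tweets = [{"USERNAME": "johnsmith99", "TEXT": "hello"}]
customers = [{"USERNAME": "johnsmith99", "CITY": "Austin"}]

# Join on the natural (unredacted) values, then redact in the result.
joined = [
    {"USERNAME": redact_username(t["USERNAME"]), "TEXT": t["TEXT"], "CITY": c["CITY"]}
    for t in tweets
    for c in customers
    if t["USERNAME"] == c["USERNAME"]
]
print(joined)
```

If redaction were applied before the join, two different names that share the same trailing characters could be matched incorrectly.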

Securing the Oracle BDA - 50

Securing the Oracle BDA - 51

Securing the Oracle BDA - 52

Securing the Oracle BDA - 53

Securing the Oracle BDA - 54

Securing the Oracle BDA - 55

Securing the Oracle BDA - 56

Securing the Oracle BDA - 57

Securing the Oracle BDA - 58

The Unauthorized Data Access of DrEvil triggers an alert. The alert details are highlighted

on the screen capture.

Securing the Oracle BDA - 59

Securing the Oracle BDA - 60

Securing the Oracle BDA - 61

Securing the Oracle BDA - 62

This is useful for assessing the consequences of purging or modifying a set of data entities.

Securing the Oracle BDA - 63

Securing the Oracle BDA - 64

Oracle BDA supports network encryption for key activities—preventing network sniffing

between computers. Mammoth automatically configures:

• [1] Cloudera Manager Server communicating with Agents

• [2] Hadoop HDFS data transfers

• [3] Hadoop internal RPC communications

• [4] Cloudera Manager Web interface

• [5] Hadoop web UIs and Web services

• [6] Hadoop YARN/MapReduce shuffle transfers

Securing the Oracle BDA - 65

Securing the Oracle BDA - 66

In this lesson, you should have learned about securing data on the Oracle Big Data

Appliance.

Securing the Oracle BDA - 67

In this course, we discussed the following lessons:

• Introduction to the Hadoop Ecosystem

• Introduction to the Oracle BDA

• Oracle BDA Pre-Installation Steps

• Working With Mammoth

• Securing the Oracle BDA

• Working With the Oracle Big Data Connectors:

- Oracle SQL Connector for Hadoop Distributed File System (HDFS)

- Oracle Loader for Hadoop (OLH)

- Oracle Data Integrator (ODI)

- Oracle XQuery for Hadoop (OXH)

- Oracle R Advanced Analytics for Hadoop (ORAAH)

Securing the Oracle BDA - 68

The Oracle Learning Library (OLL) offers other self-paced courses about the Oracle Big Data

Appliance and other related topics. Visit the Oracle Learning Library to learn about the

courses.

Securing the Oracle BDA - 69

Oracle University offers in-class courses about Oracle Big Data and other related topics.

Visit Oracle University to learn about the following courses:

• Oracle Big Data Fundamentals

• XML Fundamentals

• Oracle Database 12c: Use XML DB

• Oracle NoSQL Database for Developers

• Oracle NoSQL Database for Administrators

• Oracle R Enterprise Essentials

Securing the Oracle BDA - 70

The Oracle Learning Library offers many free demonstrations and tutorials.

And, of course, the Oracle Big Data Appliance documentation and online help embedded

within the product are also valuable resources.

Securing the Oracle BDA - 71

Securing the Oracle BDA - 72

Securing the Oracle BDA - 73