
BlueData EPIC

EPIC ENTERPRISE GA 2.1.2071
EPIC LITE GA 2.1.2071

RELEASE NOTES


Notice

BlueData Software, Inc. believes that the information in this publication is accurate as of its publication date. However, the information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” BLUEDATA SOFTWARE, INC. MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Use, copying, or distribution of any BlueData software described in this publication requires an applicable software license.

For the most up-to-date regulatory document for your product line, please refer to your specific agreements or contact BlueData Technical Support at [email protected].

The information in this document is subject to change. This manual is believed to be complete and accurate at the time of publication and no responsibility is assumed for any errors that may appear. In no event shall BlueData Software, Inc. be liable for incidental or consequential damages in connection with or arising from the use of this manual and its accompanying related materials.

Copyrights and Trademarks

Published February, 2016. Printed in the United States of America.

Copyright 2016 by BlueData Software, Inc. All rights reserved. This book or parts thereof may not be reproduced in any form without the written permission of the publishers.

EPIC, EPIC Lite, and BlueData are trademarks of BlueData Software, Inc. All other trademarks are the property of their respective owners.

Contact Information

BlueData Software, Inc.
3979 Freedom Circle, Suite 850
Santa Clara, California 95054
Email: [email protected]
Website: www.bluedata.com


Release Notes

This manual describes the known issues and workarounds in the following builds:

• EPIC 2.1 GA (Build 2071)
• EPIC Lite 2.1 GA (Build 2071)

CAUTION: THESE RELEASE NOTES ARE NOT VALID FOR ANY BUILD OF EPIC OR EPIC LITE OTHER THAN THE BUILD(S) LISTED HERE.


1.1 - New Features

The listed builds of EPIC and EPIC Lite include the new features described in this section.

1.1.1 - Installation

Installation and operation as non-root user

EPIC installation by the root user is still a supported configuration, but EPIC may also now be installed as a non-root user (with passwordless-sudo privileges). EPIC services will run as the user account under which they were installed.

Option to allocate server disks between tenant storage and node storage

Each non-root disk on the Controller and on each added Worker can be marked for one of two uses: either as additional space for a local HDFS service providing per-tenant shared storage, or as a backing store for the filesystems of virtual nodes on that host.

Optional Kerberos protection for tenant storage

If local HDFS is chosen to provide tenant storage, Kerberos protection for that service is now an install-time option.

Tuning of local HDFS used for tenant storage

If local HDFS is chosen to provide tenant storage, its configuration is tuned for better performance under workloads typical for Big Data applications.

Use of Linux cgroups for better service isolation

Local HDFS services (if any) and EPIC services will be placed into separate cgroups at install time, to avoid resource starvation under heavy load.

SSL-enabled web UI

At install time, server credentials may optionally be provided in order to configure the EPIC web UI to be served via https instead of http.

1.1.2 - Site Administration

Improved log management

Most EPIC-related logs are now gathered under the /var/log/bluedata directory. Many of these logs are handled through rsyslog and follow syslog-standard rotation/archiving conventions by default. Administrators may edit the syslog configuration to handle these files as they wish.

Quota enforcement for tenant storage and node storage

A quota may optionally be placed on each tenant's consumption of the “node storage” space for virtual node filesystems. If a tenant is using local HDFS for tenant storage, a quota may also optionally be placed on its tenant storage sandbox (see the Tenant Administration features below).

TLS/encrypted communication with LDAP or AD server

User authentication configuration now supports providing a certificate for configuring secure communication with an LDAP or Active Directory server.

Option to configure CPU over-provisioning

The ratio of virtual CPU resources to physical CPUs can now be set by the Site Administrator.
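As an illustration of the ratio's effect: with an over-provisioning ratio of 2, a host with 16 physical cores can back virtual nodes whose flavors total up to 32 vCPUs.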

Support for Tenant QoS

Each tenant can be configured with a “QoS Multiplier” that increases the CPU timeshares given to nodes launched in that tenant.

Easier tenant deletion

Tenant DataTaps no longer need to be manually deleted before deleting a tenant. Tenant deletion will now implicitly delete all DataTaps of that tenant. Deleting the DataTap by itself does not affect the storage pointed to by that DataTap. You must still delete jobs and persistent clusters and unassign tenant users before deleting a tenant.

1.1.3 - Tenant Administration

Tenant storage sandbox enforcement

The TenantStorage DataTap created for each tenant is now a non-editable DataTap that identifies a “sandbox” location in the tenant storage service, specific to that tenant. The Tenant Administrator may not create DataTaps that point to the tenant storage service outside of that sandbox.

Read-only support for DataTaps

DataTaps may be marked as read-only so that they reject any write or delete operations from clients accessing them through the dtap protocol.

DataTap compatibility with older versions of HDFS

The back-end DataTap HDFS client now has more capability to fall back and try earlier HDFS protocol versions, such as those used in older Hadoop distributions. (CDH 4.6/4.7 used as reference targets.)


1.1.4 - Operation

Enhanced performance for virtual node filesystems

Applications that make heavy use of the local filesystem within a virtual node should see significant performance improvements.

Metadata propagation to HDFS storage services

When referencing a DataTap that points to an HDFS storage service, use of the Hadoop filesystem API to read/write file metadata such as permissions, ownership information, block size, or replication factor will now execute that operation on the backing HDFS filesystem.
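For example, metadata commands issued through the Hadoop CLI against a DataTap path now take effect on the backing HDFS. The DataTap name and path below are illustrative only:

hadoop fs -chmod 640 dtap://MyHdfsTap/data/events.csv
hadoop fs -setrep 3 dtap://MyHdfsTap/data/events.csv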

Support for multi-node “edge node” services

An “edge node” service selected for a cluster (such as a BI or visualization tool) can now be a scale-out service formed of multiple nodes. The number of nodes may be fixed, or it may be user-selectable at cluster creation time. Whether an edge-node service has a fixed or variable number of nodes is determined by the App Store image used to instantiate the service.

Support for restarting cluster services

Individual nodes, or entire clusters, may now be restarted from a button in the web UI.

HDFS within virtual Hadoop clusters and support of NameNode HA

In a cluster deployed from the BlueData-provided App Store images, a complete HDFS service will now be instantiated within the cluster (if appropriate) and used as its default Hadoop filesystem. The NameNode of this service will be HA-protected if Cluster HA is enabled for that cluster.

App Store inventory change

Images for Splunk Hunk 6.3 (as a non-add-on image) and Spark 1.5 have been added to the App Store. The add-on image for Splunk Hunk 6.3 now requires CDH 5.4 rather than CDH 5.2.

The following App Store images have been updated: CDH 5.4, HDP 2.3, Spark 1.4, CentOS 6.7, and RHEL 6.7.

The following App Store images have been removed: CDH 5.2, HDP 2.2, and Spark 1.3.


1.2 - Resolved Issues

The following issues have been resolved in the listed builds of EPIC and EPIC Lite.

1.2.1 - Site Administration

Syslog logging consumes root disk space (HAATHI-9846)

While any and all logging will of course continue to consume root disk space, some EPIC-related logging is now split out of /var/log/messages and is instead recorded in individual log files under the /var/log/bluedata directory. EPIC log traffic has also been reduced somewhat and given a syslog-default log rotation behavior.

The handling of EPIC-related syslog messages can be configured by editing /etc/rsyslog.d/bds.conf and/or the main /etc/rsyslog.conf file. Log rotation can be configured by editing /etc/logrotate.d/bds and/or /etc/logrotate.conf. Note that these files only affect their local host, so if you wish to change log handling on all hosts then the relevant config files must be changed on all hosts.
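As an illustration, a logrotate stanza of the following shape would rotate these logs weekly and keep four compressed archives. This is a sketch only; the contents of the shipped /etc/logrotate.d/bds may differ:

/var/log/bluedata/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}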

On EC2, the EPIC Lite interface becomes unresponsive during App Store installation (HAATHI-10702)

EPIC Lite no longer auto-installs any App Store images (and therefore no longer suffers from this load-related issue). When first connecting to EPIC Lite, the user should install the App Store image(s) of their choice.

1.2.2 - Operation (CDH with Cloudera Manager)

Enabling Kerberos for Hadoop in Cloudera Manager fails with NoClassDefFoundError (HAATHI-11318)
Enabling Kerberos for Hadoop in Cloudera Manager disables the Impala service (HAATHI-10872)
Enabling Kerberos for Hadoop in Cloudera Manager disables services listed after Impala (HAATHI-10872)

These issues do not affect the latest version of the CDH 5.4 App Store image. If you currently have the older version of the image installed (where the App Store tile is marked Upgrade Available), then you must upgrade to the newer version in order for future deployed CDH clusters to have these fixes.

1.2.3 - Operation (HDP with Ambari)

For Ambari/HDP Hue console, copyFromLocal fails in Pig script (HAATHI-10770)

This issue does not affect the latest version of the HDP 2.3 App Store image. If you currently have the older version of the image installed (where the App Store tile is marked Upgrade Available), then you must upgrade to the newer version in order for future deployed HDP clusters to have these fixes.


1.3 - Known Issues & Workarounds

This section lists the known issues that are present in the listed versions of EPIC and EPIC Lite and methods to work around or recover from these issues.

1.3.1 - Installation

A failed upgrade from 2.0 to 2.1 may not roll back properly if platform HA is enabled (HAATHI-11558)

If platform HA is enabled, and the upgrade from 2.0 to 2.1 fails (for whatever reason), then rollback to 2.0 may fail and leave EPIC in a permanent Lockdown state (displaying a message to contact BlueData support).

Recovery: Contact BlueData support for assistance in repairing EPIC and successfully upgrading.

1.3.2 - Site Administration

DataNode may fail to start in the case of simultaneous Worker/Controller host reboot (HAATHI-11512)

If Kerberos-protected local HDFS is in use, then the DataNode on a Worker host may fail to come up properly in a situation where that Worker and the Controller host are simultaneously rebooted, due to an inability to contact the Kerberos KDC.

Recovery: Log into the Worker host as root and restart the DataNode service (service bds-apache-hdfs-datanode restart). A full Worker host reboot would also serve.

DataNode decommission must be followed by worker deletion (DOC-15)

Although DataNode decommissioning (for the local HDFS service) and worker deletion are two separate actions in the web UI, the intent is for worker deletion to always be performed somewhat promptly after a DataNode is decommissioned. For the time span between the decommission and the worker deletion, the Services dashboard will report a failed DataNode on that worker host, and the tracking of available tenant storage space may be incorrect.

Workaround: None, other than to promptly follow up DataNode decommission with worker deletion.

Manual Hadoop commands from a non-root shell on the Controller host must use sudo (DOC-13)

Since EPIC now supports installation as a non-root user, Site Administrators may more commonly need to execute Hadoop commands from a non-root shell on the EPIC Controller host. You must use sudo to run such commands successfully as non-root. (This is true regardless of whether EPIC was installed as root.)

Workaround: As an example, sudo -nE -u hdfs hadoop fs -ls / will list the root directory of the local HDFS, if run as a user that has sudo privileges.

Virtual node launch can time out immediately after adding a Worker host (HAATHI-11158)

All currently installed App Store images will begin installation on a newly added Worker host when that host is added to the EPIC platform. This installation happens in the background after the Worker host addition completes, and subsequent launches of virtual nodes on that Worker host will be serialized behind the image installation. If the Worker host has slow disk I/O (particularly when using VMs as hosts) and many images are installed, it is possible that the virtual node launch will time out and cause cluster creation/expansion to fail.

Workaround: This issue only arises immediately after Worker host addition. Either waiting several minutes before new virtual node creation or re-trying the failed operation should be successful. If your image library is large enough and/or host disks are slow enough that you encounter this issue, a Site Administrator can lower the chances of tenant members encountering the issue by leaving the system in Lockdown for several minutes after adding a Worker.

Executing “service network restart” on a host makes its nodes inaccessible (HAATHI-11001)

A root user running “service network restart” in the host OS shell of any EPIC host will destroy necessary components of the virtual network used by virtual cluster nodes. Any virtual node assigned to that host will then become inaccessible through the network.

Recovery: On the EPIC Controller host, execute these commands to restart the management service and rebuild the virtual network (ideally when the system is in Lockdown mode with no management operations in progress):

/sbin/stop bds-controller
/sbin/start bds-controller

After platform HA failover under load, HA Status is amber (DOC-10)

If the HA Status service indicator is showing an amber (warning) color, platform HA may not be able to protect against further failures. A known cause of this status is when the Pacemaker service fails during platform HA failovers under high load. This condition can be identified by executing the following command in a root shell on the current active EPIC Controller host:

pcs status

If the output of this command shows any Failed actions, then a Pacemaker failure is the cause of the platform HA warning status.

Recovery: In the root shell of the Controller host, execute the following command to restore Pacemaker:

crm_resource -P


Uploading a new system storage keytab via the EPIC interface can cause issues after platform HA has been enabled (HAATHI-10683)

Uploading a keytab via the EPIC interface will only upload a keytab to the current Controller host, even if platform HA is enabled. The keytab will not be automatically uploaded to the Shadow Controller host. This is not an issue for DataTap keytabs; however, a system storage keytab must be present on the current Controller host whenever a new tenant or a new temporary cluster filesystem is created. Therefore, uploading a new system storage keytab followed by a failover to the Shadow Controller will make such operations vulnerable to failure.

Workaround: If uploading a new system storage keytab, and platform HA has previously been enabled, the new keytab should also be manually transferred to the /srv/bluedata/keytab/site_admin directory of the current Shadow Controller host.
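For example, the copy might look like the following (a sketch only; the keytab filename and the Shadow Controller hostname are placeholders):

scp /srv/bluedata/keytab/site_admin/<keytab-file> root@<shadow-controller>:/srv/bluedata/keytab/site_admin/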

1.3.3 - Operation (General)

Interaction between node “root disk” size and node storage is confusing (DOC-18)

A virtual node has a “root disk” size that is 30 GB by default, but which can be increased by using a flavor with a larger specified disk size. For the 2.1 release, not all of this “root disk” size is placed in the node storage, and therefore not all of it counts against any node storage quota.

Workaround: Subtract 20 GB from a node's “root disk” size to determine how much space it will take up in node storage.
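For example, under this rule a node using the default 30 GB root disk consumes 10 GB of node storage, while a node whose flavor specifies a 50 GB root disk consumes 30 GB.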

Cluster resource usage is misleading with standby/arbiter nodes (HAATHI-10476)

When HBase or YARN Resource Manager HA is active for a cluster, two of the cluster's Worker nodes will take on the Standby and Arbiter roles. While these nodes are allocated from among the worker node count, they will use the designated Master node flavor rather than the Worker node flavor. An attempt to manually determine the cluster's resource usage as Worker count * Worker flavor + Master flavor will lead to an incorrect result in this case.

Workaround: Free and available resources are still tracked correctly, so the issue is primarily that the display of the cluster's characteristics is confusing or misleading. The only workaround is to be aware of the fact that the Standby and Arbiter nodes will use the Master flavor.
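Concretely, assuming a single Master node, the corrected manual estimate for such an HA-enabled cluster is:

(Worker count − 2) * Worker flavor + 3 * Master flavor

since two of the Workers consume the Master flavor in addition to the Master itself.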

In a VirtualBox host, external network access from virtual nodes is slow (HAATHI-11243)

For EPIC Lite installed in VirtualBox, virtual node access to external networks can be noticeably slower than expected. There is currently no workaround for this issue.


HBase client configuration uses VM hostnames (HAATHI-10037)
Spark worker UI is not accessible from master page (HAATHI-10060)
Some URLs in the ResourceManager dashboard use hostnames instead of IP addresses (HAATHI-10155)

Each virtual node created by EPIC is assigned a host name. These host names are only known to other virtual nodes within the EPIC platform. Client computers outside the EPIC platform may wish to follow web links to services within the virtual nodes, but those clients cannot resolve links that are based on the virtual node host names instead of IP addresses.

Workaround: Users must configure the hosts file on the client computer that will be accessing the virtual nodes in a given cluster.

To update the /etc/hosts file on a Linux system:

1. After creating a cluster, navigate to the Cluster Management screen. For each cluster, there will be a purple Hosts File Info icon that provides the template for setting up the /etc/hosts file.

2. Copy the data from this template to your /etc/hosts file (a hypothetical illustration of the entry format follows this procedure). After a cluster is deleted, delete these entries from the /etc/hosts file as well.
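The template entries follow the standard hosts-file format of an IP address followed by the hostnames that should resolve to it. The addresses and names below are purely illustrative:

10.32.1.11   bluedata-21.demo.local bluedata-21
10.32.1.12   bluedata-22.demo.local bluedata-22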

To update the /etc/hosts file on a Windows 8/8.1/10 system:

1. Click Start and then type Notepad in the Search box.
2. Right-click the Notepad icon that appears and then select Run as Administrator.
3. Follow the Windows 7 procedure below, starting at Step 3.

To update the /etc/hosts file on a Windows 7 system:

1. Click Start>All Programs>Accessories.
2. Right-click Notepad and select Run as administrator.
3. If applicable, click Continue in the Windows needs your permission UAC window.
4. In Notepad, click File>Open.
5. In the File Name field, type C:\Windows\System32\Drivers\etc\hosts, and then click Open.

To update the /etc/hosts file on a Windows NT/2000/XP system:

1. Click Start>All Programs>Accessories>Notepad.
2. In Notepad, click File>Open.
3. In the File Name field, type C:\Windows\System32\Drivers\etc\hosts, and then click Open.

The C:\Windows\System32\Drivers\etc\hosts file is a hidden file, and you will therefore need to enable the hidden files folder option in order to open it.

NFS DataTaps are not accessible when using EPIC Lite in VirtualBox (HAATHI-10637)

When using EPIC Lite deployed in a VirtualBox VM, DataTaps for NFS storage services may be in an error state (red name) or incorrectly appear to contain no files.

Recovery: Add a bridged network in the Settings of the VirtualBox VM that is hosting EPIC Lite.

Hive jobs that use DataTap paths may fail with a SemanticException error (HAATHI-10733)

When Hive creates a table, the location where the table metadata is stored comes from the Hive configuration parameter fs.defaultFS by default (which will point to the cluster filesystem). If a Hive job references DataTap paths outside of the filesystem where the table metadata is stored, then the job will fail with a SemanticException error, because Hive enforces that all data sources come from the same filesystem.

Workaround: Explicitly set the table metadata location to a path on the same DataTap that you will use for the job inputs and/or outputs, using the LOCATION clause when creating the table. For example, if you intend to use the TenantStorage DataTap, you would set the table metadata location to some path on that DataTap such as:

CREATE TABLE docs (c1 INT, c2 STRING) LOCATION 'dtap://TenantStorage/hive-table-docs'
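Subsequent load operations and queries can then reference paths on that same DataTap; for example (the input path here is hypothetical):

LOAD DATA INPATH 'dtap://TenantStorage/input/docs.txt' INTO TABLE docs;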


1.3.4 - Operation (CDH with Cloudera Manager)

Cloudera Manager reports incorrect values for a node's resources (DOC-9)

Cloudera Manager accesses the Linux /proc filesystem to determine the characteristics of the node(s) it is managing. Because container technology is used to implement virtual nodes, this filesystem reports information about the host rather than about the individual node, causing Cloudera Manager to report inflated values for a node's CPU count, memory, and disk.

Workaround: Use the EPIC web interface to see a node's virtual hardware configuration (flavor).

Cloudera Manager issues after Master node is restarted (HAATHI-11289)

If the Master virtual node running Cloudera Manager is restarted (usually because the underlying host is restarted), Cloudera Manager and the services it controls may not properly resume. The symptom is an unresponsive Cloudera Manager dashboard, or a working dashboard which shows that necessary services are not running.

Recovery: If the Cloudera Manager dashboard is not responding, log into the Master virtual node and execute these commands:

sudo service cloudera-scm-server-db restart
sudo service cloudera-scm-server restart

Then restart the CDH cluster from the Cloudera Manager UI. You may also need to do such a cluster restart if the dashboard was responding but certain services were not started.

Hue wizard on CDH warns that no Job Server is running for Spark (HAATHI-10737)

For Cloudera virtual clusters, the Hue Quick Start Wizard shows a Spark Editor warning that The app won't work without a running Job Server.

Workaround: This message is expected. Spark jobs can still be run on Cloudera virtual clusters if Spark support was selected at cluster creation time, as documented in the Running Applications Guide.

Note: If the dashboard shows that HDFS services such as a name node, secondary name node, and data nodes are running in the cluster, you may stop these services. They are not needed, and while they are harmless they do consume some resources.


1.3.5 - Operation (HDP with Ambari)

HDP 2.3 / Ambari cluster services did not come back up after host reboot (HAATHI-10482, HAATHI-11481, HAATHI-11554)

After rebooting an EPIC Worker host, HDP 2.3 nodes assigned to that host may indicate (in EPIC or Ambari dashboards) that a service is not running.

Recovery: Wait 5 minutes to give the service time to be restarted. If it still appears to be down, go to the Ambari interface and start the service using the Service Action menu in the top right corner of the page.

Ambari dashboard does not show YARN or Flume service metrics (HAATHI-10667)

The Ambari YARN and Flume summary dashboards of an HDP 2.2 + Ambari cluster will display No data for service metrics.

Workaround: Some metrics are shown on the top-level Ambari dashboard. Some other metrics can be seen on the cluster's Ganglia dashboard: in the <Cluster> screen of the EPIC interface, select the Charts tab, and then click the Ganglia Dashboard link at the bottom of the page. However, there is not currently a complete workaround for this issue.

1.3.6 - Operation (Spark)

Zeppelin tutorial example fails with file not found exception (HAATHI-11562)

The first attempt to run the Zeppelin tutorial example may fail with a message claiming that the file /data/bank-full.csv does not exist. The file does exist; this behavior appears to be a Zeppelin issue.

Workaround: Run the example again, with different query values. Usually this will get past the error. If not, then copy the csv file to a location on a DataTap, change the example to reference that path, and try again.

Thrift server logs report “No data or no sasl data in the stream” (HAATHI-11563)

You may observe the “No data or no sasl data in the stream” message in the thrift server logs in the $SPARK_HOME/logs/ directory; however, this does not indicate an issue with the thrift server functionality. You may ignore this message.

Workaround: None required.

Spark shell reports “Jar not found” on startup (HAATHI-11564)

You may observe the “Jar not found at file:/opt/bluedata/bluedata-dtap.jar:/opt/bluedata/mysql-connector.jar” message on starting the Spark shell; however, the file does exist and the Spark shell should function properly. You may ignore this message.

Workaround: None required.


Spark applications will wait indefinitely if no free vCPUs (DOC-19)

This is a general Spark behavior, but it is worth some emphasis in an environment where various virtual hardware resources (possibly in small amounts) can be quickly provisioned for use with Spark: a Spark application will be stuck in the Waiting state if all vCPUs in the cluster are already considered to be in-use (by the Spark framework and other running Spark applications).

Workaround: You can always get more vCPUs by increasing the cluster Worker count. If this is not desirable, however, you will need to control the consumption of vCPUs by Spark applications. Two examples of such control (see the sketch after this list):

• You can restrict the number of vCPUs that any future Spark application can consume by assigning a value to the spark.cores.max property in $SPARK_HOME/conf/spark-defaults.conf and restarting the Spark master (sudo service spark-master restart).

• By default in Spark 1.5, the thrift server is configured to use 2 vCPUs on the Spark master node. You can reduce this to 1 vCPU by editing the total-executor-cores argument value in the /etc/init.d/hive-thriftserver script, and then restarting the thrift server (sudo service hive-thriftserver restart).
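For example, capping every future Spark application at four cores would add the following line to $SPARK_HOME/conf/spark-defaults.conf, followed by a Spark master restart (the value 4 is illustrative; choose a cap that fits your cluster):

spark.cores.max    4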


EPIC and EPIC Lite Release Notes 2.1 (02/2016)

This book or parts thereof may not be reproduced in any form without the written permission of the publishers. Printed in the United States of America. Copyright 2016 by BlueData Software, Inc. All rights reserved.

Contact Information:

BlueData Software, Inc.
3979 Freedom Circle, Suite 850
Santa Clara, California 95054
Email: [email protected]
Website: www.bluedata.com