MANILA* AND SAHARA*: CROSSING THE DESERT TO THE BIG DATA OASIS
Ethan Gafford, Red Hat
Jeff Applewhite, NetApp
Malini Bhandaru, Intel (covering for Weiting Chen)

Transcript of 20151027 sahara + manila final

Page 1


Page 2

AGENDA
• Introduction
• Sahara Overview
• Manila Overview
• The goal for Sahara and Manila integration
• The approaches
  • Manila HDFS Driver
  • Manila NFS Share Mount
  • Manila + NetApp NFS Connector for Hadoop
• Conclusion
• Q&A

Intel | NetApp | Red Hat

Page 3

Sahara: The Problem
• Hadoop* (and Spark*, Storm*…) clusters are difficult to configure
• Commodity hardware is cheap but requires frequent (costly) maintenance
• Reliable hardware is expensive, and a fixed-size cluster will cause contention
• Demand for data processing varies over time within an organization
• Bare-metal clusters go down, and can be a single point of failure
• Hadoop dev is very difficult without a real cluster

TL;DR: Data processing clusters are harder to provision and maintain than they should be, and it hurts.

Page 4

Sahara: The Solution
Put it in a cloud! Then have easy-to-use, standardized interfaces:

● To create clusters (reliably and repeatedly)
● To scale clusters
● To run data processing jobs
● On any popular data processing framework
● With sensible defaults that just work
● And sophisticated configuration management for expert users

That's OpenStack* Sahara.

Page 5

Sahara: The API

Page 6

Sahara: Architecture

Page 7

Manila Overview

Page 8

Manila Overview

Page 9

Manila Share and Access APIs

Operation | CLI Command | Description
Create | manila create | Create a Manila share of specified size; optional name, availability zone, share type, share network, source snapshot
Delete | manila delete | Delete an existing Manila share; manila force-delete may be required if the share is in an error state
Edit | manila metadata | Set or unset metadata on a Manila share
List | manila list | List all Manila shares
Show | manila show | Show details about a Manila share

Operation | CLI Command | Description
Allow | manila access-allow | Allow access to the specified share for the specified access type and value (IP address, IP network in CIDR notation, or Windows user name)
Deny | manila access-deny | Deny access to the specified share for the specified access type and value (IP address, IP network in CIDR notation, or Windows user name)
List | manila access-list | List all access rules for a Manila share

Page 10

Manila & Sahara
NetApp driver enabled*

Page 11

The Goal for Sahara and Manila Integration
To support as many storage backends and protocols in Sahara as possible

Page 12

Sahara Data Processing Model in Kilo*

PATTERN 1: Internal HDFS in the same node. Compute and data reside together in the same instance in your Hadoop cluster.

PATTERN 2: Internal HDFS in different nodes. Compute and data reside in different instances. This is an elastic way to manage Hadoop clusters.

PATTERN 3: Swift*. In order to persist data, Sahara supports Swift to stream the data directly.

Page 13

Sahara Data Processing Model in Liberty* and the Future

PATTERN 4: External HDFS via Manila*. Sahara can support external HDFS by using the HDFS driver in Manila.

PATTERN 5: Local storage with a diverse storage backend in Manila. Use local storage in Hadoop and remote-mount any type of file storage in Manila (e.g. GlusterFS, via the extensible NFS driver).

PATTERN 6: NFS. The NetApp* Hadoop NFS Connector can bring NFS capability into Hadoop. This feature will be implemented in Mitaka.

Page 14

Manila HDFS Driver
Use the Manila HDFS Driver as external storage in Sahara

Page 15

Use Case: Manila HDFS Driver

Use Case
● Use external HDFS either on the same node as the compute service or in a physical cluster

Rationale for Use
● Use the Manila HDFS driver to connect with HDFS
● Manila helps to create the HDFS share

The Advantages
● Use an existing HDFS cluster
● Centralized management of HDFS via Manila

Limitations
● Only supports non-secured HDFS, due to account management issues between OpenStack and Hadoop

Reference: https://blueprints.launchpad.net/manila/+spec/hdfs-driver

Page 16

Enable the HDFS Driver in Manila

Step 1: Set up the Manila configuration
• Edit /etc/manila/manila.conf
• Make sure the login username and password are correct
• The Manila service uses this user to log in to HDFS and create the share folder for each individual user

Step 2: Restart the Manila service

manila.conf example:

    share_driver = manila.share.drivers.hdfs.hdfs_native.HDFSNativeShareDriver
    # IP address of the HDFS namenode; only a single namenode is supported now
    hdfs_namenode_ip = <namenode IP>
    # Port of the HDFS namenode service
    hdfs_namenode_port = <namenode port>
    # HDFS namenode SSH port
    hdfs_ssh_port = <SSH port>
    # HDFS namenode SSH login name
    hdfs_ssh_name = <SSH login name>
    # HDFS namenode SSH login password; not necessary if hdfs_ssh_private_key is configured
    hdfs_ssh_pw = <SSH login password>
    # Path to the private key used to SSH into the HDFS namenode
    hdfs_ssh_private_key = <path to private key>
    …

Reference: http://docs.openstack.org/developer/manila/devref/hdfs_native_driver.html

Page 17

Add External HDFS as a Data Source in Sahara
• Make sure the user account "hdfs" has been set up on the HDFS side
• Sahara uses the "hdfs" user to access external HDFS by default; you can still set up your own user account in Sahara as well
• Add the external HDFS location as a data source in Sahara

Limitation: no user account setup is needed, since currently only non-secured HDFS is supported

Page 18

NFS Share Mounting
Binary storage and input/output data from Manila-provisioned NFS shares

Page 19

The Feature
• Mount Manila NFS shares to:
  • All nodes in the cluster
  • Specific node groups (NN, etc.)
• Currently NFS-only
  • Extensible to other share types
• API (see below)
  • Path and access defaults shown
  • Only the id field is needed
• Intended for non-EDP users
  • EDP users can use auto-mounted shares

    "shares": [
      {
        "id": "uuid",
        "path": "/mnt/uuid",
        "access_level": "rw"
      }
    ]
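The defaulting behavior of the shares field can be sketched as a small helper. This is illustrative only, under the defaults shown above; the function name is not part of the Sahara API:

```python
def share_spec(share_id, path=None, access_level="rw"):
    """Build one entry of the cluster 'shares' field.

    Only the share id is required; the mount path defaults to
    /mnt/<share id> and access_level defaults to "rw".
    """
    return {
        "id": share_id,
        "path": path or "/mnt/%s" % share_id,
        "access_level": access_level,
    }
```

For example, `share_spec("uuid")` yields exactly the defaulted entry shown in the API example above.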

Page 20

Use Case: Binary Data Storage
• "Job binaries": *.jar, *.pig, etc.
  • Comparatively small size
  • Initial location irrelevant to performance
• Previous storage options in Sahara:
  • Swift (still available)
  • Sahara DB (as blobs in a SQL table)
• Rationales for NFS storage:
  • Version control directly on the storage FS
  • Long-term storage for use by transient clusters
  • HDFS clusters on separate networks can route to a common repository
  • Read-only access control from clusters is useful in this case

Page 21: 20151027 sahara + manila final

Gluster Node Gluster Node Gluster Node

Manila*Share

Compute2Compute1 Compute3

VM1 VM2 VM3 VM4

Tenant B

VM5 VM6

Any Drivers

Use Case: Input / Output DataPrevious options in Sahara

● Cluster-internal HDFS● External HDFS● Swift

Rationales for use● Standard FS access to data● Convenient in many cases

Data copy necessary ● Similar to built-in hadoop fs -put operation● Irrelevant in heavily reduced output or small

input case● In large input case, network transfer is a

consideration

Reference: https://blueprints.launchpad.net/sahara/+spec/manila-as-a-data-source

Tenant A

LocalLocal LocalLocalLocalLocal

Step1

Gluster-Volume Gluster-Volume Gluster-Volume

Use GlusterFS as an example

Step2

Step3

21Intel NetApp RedHat

Page 22

Workflow: NFS Binary Storage and Input Data
1. Create a Manila NFS share
2. Place the binary file on the share at /absolute/path/to/binary.jar
3. Create a Sahara job binary object with the path reference manila://share_uuid/absolute/path/to/binary.jar
4. Utilize the job binary in a job template (per normal)
5. Create a Sahara data source with the path reference manila://share_uuid/absolute/path/to/input_dir
6. Run a job from the template using the data source
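The manila:// references in steps 3 and 5 follow a simple scheme: the scheme prefix, the share UUID, then the absolute path on the share. A minimal sketch (the helper name is illustrative, not Sahara code):

```python
def manila_url(share_id, absolute_path):
    """Build a Sahara manila:// reference for a file on a Manila share.

    The path must be absolute on the share, as in the workflow above.
    """
    if not absolute_path.startswith("/"):
        raise ValueError("path on the share must be absolute")
    return "manila://" + share_id + absolute_path
```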

Page 23

Automatic Mounting
• An API field is necessary to mount shares for non-EDP users
• Sahara's EDP API mounts needed shares to a long-standing cluster when a job references any data source or binary on that share
• Uses defaults for permissions (rw) and path (/mnt/share_uuid/)

Page 24

Automatic Mounting: Under the Hood

All frameworks (universal flow, per cluster node): check that the required shares are mounted. If not:
1) Install nfs-common (Debian*) or nfs-utils (Red Hat) if not present
2) Get the remote path for the share UUID from Manila
3) Manila: access-allow for each required IP in the cluster (if access does not exist)
4) mount -t nfs %(access_arg)s %(remote_path)s %(local_path)s

All frameworks (universal flow):
• Job binaries: translate manila://uuid/absolute/path to /local_path/absolute/path
• Data sources: translate manila://uuid/absolute/path to file:///local_path/absolute/path

Hadoop (w/ Oozie):
• Job binaries: hadoop fs -copyFromLocal into the workflow directory; referenced as filesystem paths in the workflow
• Data sources: use the file URL in the Oozie workflow document (as a named job parameter or positional argument)

Spark:
• Job binaries: referenced by local filesystem path in the spark-submit call
• Data sources: use the file URL in the spark-submit call (as a positional argument)

Storm:
• Job binaries: referenced as filesystem paths in the storm jar call
• Data sources: use the file URL in the storm jar call (as a positional argument)
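The universal translation step can be sketched as follows, assuming the default /mnt/<share uuid> mount path described earlier (these helpers are illustrative, not Sahara's actual code):

```python
def to_local_path(manila_ref, mount_root="/mnt"):
    """Translate manila://uuid/absolute/path to the node-local mount path."""
    prefix = "manila://"
    if not manila_ref.startswith(prefix):
        raise ValueError("not a manila:// reference")
    share_id, _, rest = manila_ref[len(prefix):].partition("/")
    return "%s/%s/%s" % (mount_root, share_id, rest)

def to_file_url(manila_ref):
    """Data sources get the same local path, expressed as a file:// URL."""
    return "file://" + to_local_path(manila_ref)
```

Job binaries would use the plain local path; data sources the file:// form consumed by Oozie, spark-submit, or storm jar.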

Page 25

Screenshots

Pages 26 to 29: (screenshots)

Page 30

NetApp Hadoop NFS Connector
Future proposal: use the NetApp Hadoop NFS Connector in Sahara

Page 31

NetApp NFS Connector: Architecture Overview
● NFS client written in Java
● Implements the Hadoop filesystem API
● No changes to the Hadoop framework
● No changes to user programs
● Eliminates copying data into HDFS
● Optimized performance for NFS access

Page 32: 20151027 sahara + manila final

NFS Node NFS Node NFS Node

ManilaShare

Compute2Compute1 Compute3

VM1 VM2 VM3 VM4 VM5 VM6

NFS Driver

Sahara + Manila + NetApp NFS Connector

How to use1. Use Manila to expose the NFS share 2. NetApp Hadoop NFS Connector as

“interface” to shared data

The Advantages● NFS is one of the most common storage

protocols used in IT● A direct way to communicate and process data

instead of using HDFS

Reference: https://blueprints.launchpad.net/sahara/+spec/nfs-as-a-data-source

NetApp NFS

Driver

NetApp NFS

Driver

NetApp NFS

Driver

NetApp NFS

Driver

NetApp NFS

Driver

NetApp NFS

Driver

Step1

NFS Folder NFS Folder NFS Folder

Step2

Step3

32Intel NetApp RedHat

Tenant BTenant AUse Case● NFS protocol to access data for Hadoop

Page 33

NetApp NFS Connector
● Deployment choices:
  ○ NFS (v3)
  ○ HDFS + NFS
● Open source
● Snapshot, FlexClone, SnapMirror, and Manila disaster recovery (Mitaka)

Page 34

NetApp Hadoop NFS Plugin
Use the NetApp NFS Connector to run Hadoop on your existing data:

    $ hadoop jar <path-to-examples-jar> terasort nfs://<nfs-server-hostname>:2049/tera/in /tera/out
    $ hadoop jar <path-to-examples-jar> terasort nfs://<nfs-server-hostname>:2049/tera/in nfs://<nfs-server-hostname>:2049/tera/out

References:
1. http://www.netapp.com/us/solutions/big-data/nfs-connector-hadoop.aspx
2. https://github.com/NetApp/NetApp-Hadoop-NFS-Connector

Page 35

Summary

Sahara and Manila: Access the Big Data Oasis

● The choices:
  a) Manila HDFS Driver
  b) Manila NFS Share Mount (https://www.netapp.com/us/media/tr-4464.pdf)
  c) NetApp NFS Connector for Hadoop (https://github.com/NetApp/NetApp-Hadoop-NFS-Connector)

Page 36

For more information: http://netapp.github.io

Page 37

Participating in the Intel Passport Program?

Are you playing? Be sure to get your Passport stamp for attending this session! See me or my helper in the back at the end!

Not playing yet? What are you waiting for? See me or my helper in the back at the end and we can get you started!

Don't forget to return your stamped passport to the Intel booth #H3 to enter our raffle drawing! 3 stamps = 1 raffle ticket.

Page 38

THANK YOU!