Copyright © Solace Systems, Inc.

http://www.solacesystems.com

Solace JMS Integration with Cloudera CDH v5.4

Document Version 0.2

November 2015

This document is an integration guide for using Solace JMS as a JMS Broker to connect Cloudera CDH to the enterprise message bus.

Cloudera CDH is a Hadoop solution that offers unified batch processing, interactive SQL, and interactive search. Along with Hadoop's core elements of scalable storage and distributed computing, CDH delivers enterprise capabilities such as security through role-based access controls and fully integrated solutions under a single user interface.

The Solace message router supports persistent and non-persistent JMS messaging with high throughput and low, consistent latency. Thanks to very high capacity and built-in virtualization, each Solace message router can replace dozens of software-based JMS brokers in multi-tenant deployments. Since JMS is a standard API, client applications connect to Solace like any other JMS broker, so companies whose applications are struggling with performance or reliability issues can easily overcome them by upgrading to Solace’s hardware.


Table of Contents

1 Overview
    1.1 Related Documentation
2 Why Solace
    Superior Performance
    Robustness
    Simple Architecture
    Simple Operations
    Cost Savings
3 Integration with Apache Flume
4 Integrating with Spark Streaming
5 Working with Solace High Availability (HA) and Disaster Recovery (DR)
6 Manageability
7 Security
    7.1 Authentication
    7.2 Authorization
    7.3 Encryption


1 Overview

This document outlines the high-level architecture of the Cloudera integration with Solace via Solace Java Message Service (JMS). Its goal is to give the reader an understanding of the integration interfaces and technologies, as well as the benefits of these integration points.

The target audience is architects using Hadoop v2 who have general knowledge of Flume, Spark and JMS. As such, this document focuses on describing how the integration works and leaves the mechanics to the individual integration documents. For detailed steps on these integration points, please refer to [Flume-REF] and [Spark-REF].

The high-level advantage of this integration is that Solace exposes a wide range of application integration points through its existing Enterprise System Bus, allowing Cloudera CDH to connect with traditional batch-type interfaces as well as real-time streaming interfaces. This eases the effort of exposing enterprise-wide data to Big Data.

[Figure: the Solace Message Bus, with VMRs in private and public cloud, connecting web and mobile, mainframe, app servers, data stores and warehouses, the Cloudera stack, and the Internet of Things]


This document is divided into the following sections to cover the Solace JMS integration with Cloudera:

o Integration with Apache Flume

o Integrating with Spark Streaming

o Working with Solace High Availability and Disaster Recovery

o Manageability

o Security

1.1 Related Documentation

These documents contain information related to the feature defined in this document.

Document ID | Document Title | Document Source

[Solace-Portal] | Solace Developer Portal | http://dev.solacesystems.com

[Solace-JMS-REF] | Solace JMS Messaging API Developer Guide | http://dev.solacesystems.com/docs/solace-jms-api-developer-guide

[Solace-JMS-API] | Solace JMS API Online Reference Documentation | http://dev.solacesystems.com/docs/solace-jms-api-online-reference

[Solace-FG] | Solace Messaging Platform – Feature Guide | http://dev.solacesystems.com/docs/messaging-platform-feature-guide

[Solace-FP] | Solace Messaging Platform – Feature Provisioning | http://dev.solacesystems.com/docs/messaging-platform-feature-provisioning

[Solace-CLI] | Solace Appliance Command Line Interface Reference | http://dev.solacesystems.com/docs/cli-reference

[Cloudera-REF] | Cloudera Enterprise product information | http://www.cloudera.com/content/cloudera/en/products-and-services/cloudera-enterprise.html

[Flume-REF] | Solace JMS Integration with Flume | http://dev.solacesystems.com/integration-guides/flume/

[Spark-REF] | Solace JMS Integration with Spark | http://dev.solacesystems.com/integration-guides/spark-streaming/

Table 1 - Related Documents


2 Why Solace

Solace technology efficiently moves information between all kinds of applications, users and devices, anywhere in the world, over all kinds of networks. Solace makes its state-of-the-art data movement capabilities available via hardware and software “message routers” that can meet the needs of any application or deployment environment. Solace’s unique solution offers unmatched capacity, performance, robustness and TCO so our customers can focus on seizing business opportunities instead of building and maintaining complex data distribution infrastructure.

Superior Performance

Solace’s hardware and software messaging middleware products can cost-effectively meet the performance needs of any application. Feature parity and interoperability let companies start small and scale to support higher volume or more demanding requirements over time, while purpose-built appliances offer 50-100x higher performance than any other technology for customers or applications that require extremely high capacity or low latency.

Robustness

Solace offers high availability (HA) and disaster recovery (DR) without the need for third-party products, and fast failover times no other solution can match. Distributing data via dedicated TCP connections ensures an orderly, well-behaved system under load, and patented techniques ensure that the performance of publishers and high-speed consumers is never impacted by slow consumers.

Simple Architecture

Modern enterprises run applications that demand many kinds of data movement such as persistent messaging, web streaming, WAN distribution and cloud-based communications. By supporting all kinds of data movement with a unified platform that can be deployed as a small-footprint software broker or high-capacity rack-mounted appliance, Solace lets architects design an end-to-end infrastructure that is easy to build applications for, integrate with existing technologies, secure and scale.

Simple Operations

Solace’s solution features a shared administration framework for all kinds of data movement, deployment models and network environments, so it is easy for IT staff to deploy, monitor, manage and upgrade their Solace-based messaging environment.

Cost Savings

Solace reduces expenses with high-capacity hardware, flexible software, and the ability to deploy the right solution for each problem. Solace’s support for many kinds of messaging lets you replace multiple messaging products with just one, and built-in HA, DR, WAN and Web functionality eliminates the need for third-party products.


3 Integration with Apache Flume

Flume is a very flexible bridge application that runs in a JVM. It handles events in a data flow that passes data from a source through a channel to an egress sink.

o Source – In this case the source will be the default JMS source object that is distributed with Flume. This object creates a transacted session that passes batches of messages through to the data flow.

o Channel – A memory channel that presents data to the egress sink.

o Sink – The egress interface that exports data into HDFS.

Besides providing a variety of application interfaces, the Solace message router provides an aggregation and queuing point for Flume, as well as the High Availability required for enterprise messaging.
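The transacted, batched hand-off described above can be pictured with a small toy model. This is only a sketch under stated assumptions: the class name `TransactedBatchSketch`, the method `transferBatch`, and the capacity and batch-size parameters are hypothetical and are not part of Flume or the Solace JMS API. It illustrates the general idea that a batch is removed from the broker queue only once the channel has accepted all of it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model (hypothetical names) of a transacted batch hand-off from a broker queue to a channel. */
public class TransactedBatchSketch {

    /**
     * Moves up to batchSize messages from the broker queue into the channel as one unit.
     * If the channel cannot hold the whole batch, nothing is removed from the queue,
     * which models a rollback. Returns the number of messages committed.
     */
    static int transferBatch(Deque<String> brokerQueue, List<String> channel,
                             int channelCapacity, int batchSize) {
        // Read the batch without removing it, as an uncommitted transaction would.
        List<String> batch = new ArrayList<>();
        for (String message : brokerQueue) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(message);
        }
        // Rollback: the channel is too full, so the messages stay on the broker queue.
        if (channel.size() + batch.size() > channelCapacity) {
            return 0;
        }
        // Commit: hand the batch to the channel, then remove it from the broker queue.
        channel.addAll(batch);
        for (int i = 0; i < batch.size(); i++) {
            brokerQueue.removeFirst();
        }
        return batch.size();
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>(List.of("m1", "m2", "m3", "m4", "m5"));
        List<String> channel = new ArrayList<>();
        System.out.println(transferBatch(queue, channel, 4, 3)); // first batch fits in the channel
        System.out.println(transferBatch(queue, channel, 4, 3)); // second batch would overflow it
        System.out.println(queue);                               // uncommitted messages remain queued
    }
}
```

In this model a message is never lost between the queue and the channel: it is either still on the broker queue or already in the channel, which is the property that makes the router a safe queuing point in front of Flume.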

[Figure: app servers publish to a Flume queue hosted on redundant message routers; two Flume instances, each with a JMSSource, a Channel and an HDFSSink, write to HDFS]

In the diagram above, redundant instances of Flume bind to a single queue that is hosted on a pair of Solace message routers. The redundant message routers coordinate how messages from the queue are delivered to the Flume instances. Delivery can be either active/standby, where only one Flume instance receives messages at a time, for fault tolerance, or round-robin across the instances for load balancing. In this messaging pattern a pair of Solace message router appliances can deliver 450K messages per second for 500-byte messages and tolerate a failure of a single message router appliance without degradation in performance.
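The difference between the two delivery modes can be sketched as follows. This is an illustrative model only: `QueueDeliverySketch`, `Mode` and `assign` are hypothetical names, and the logic is a simplification that assumes at least one bound consumer, not the router's actual scheduling algorithm.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical names) of active/standby versus round-robin delivery from one queue. */
public class QueueDeliverySketch {

    enum Mode { ACTIVE_STANDBY, ROUND_ROBIN }

    /** Returns, for each message in order, the name of the consumer that receives it. */
    static List<String> assign(List<String> boundConsumers, int messageCount, Mode mode) {
        List<String> deliveredTo = new ArrayList<>();
        for (int i = 0; i < messageCount; i++) {
            // Active/standby: the first bound consumer receives everything, the rest wait.
            // Round-robin: messages rotate across all bound consumers.
            int index = (mode == Mode.ACTIVE_STANDBY) ? 0 : i % boundConsumers.size();
            deliveredTo.add(boundConsumers.get(index));
        }
        return deliveredTo;
    }

    public static void main(String[] args) {
        List<String> flumeInstances = List.of("flume-1", "flume-2");
        System.out.println(assign(flumeInstances, 4, Mode.ACTIVE_STANDBY));
        System.out.println(assign(flumeInstances, 4, Mode.ROUND_ROBIN));
        // If flume-1 fails, the standby is the only bound consumer and takes over.
        System.out.println(assign(List.of("flume-2"), 2, Mode.ACTIVE_STANDBY));
    }
}
```

In both modes each message goes to exactly one Flume instance; the modes differ only in whether the second instance is idle until a failure or shares the load continuously.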

The sessions connecting the Flume JMS source to the Solace message router can be authenticated via Active

Directory or Kerberos and the sessions can also be secured via TLS.

For a full description of steps to integrate and integration options please see [Flume-REF].


4 Integrating with Spark Streaming

Spark is the stream processing engine within the Cloudera stack. The Solace message router also integrates with Spark via a JMS interface. This interface is built on a custom Java receiver to provide flexible, very low latency message flow from Solace into Spark.

[Figure: app servers and redundant message routers]

In the diagram above, instances of Spark can be dynamically added to a distribution group within the Solace message router, called a deliver-to-one group, so that each message is delivered once into the group. The group can be elastically expanded and contracted to provide the resources needed to handle the current workload. In this messaging pattern a pair of Solace message router appliances can deliver 26 million messages per second for 100-byte messages and tolerate a failure of a single message router appliance without degradation in performance. For message sizes bigger than 100 bytes, a single Solace message router appliance will saturate four 10 Gbps network links.
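To put the quoted message rates in network terms, the payload bandwidth they imply can be worked out directly. The sketch below is plain arithmetic on the figures stated in this document (450K messages per second at 500 bytes for the Flume pattern, 26 million messages per second at 100 bytes for the Spark pattern). The class and method names are hypothetical, decimal gigabits are assumed, and protocol overhead is ignored.

```java
/** Back-of-the-envelope payload bandwidth for the message rates quoted in this document. */
public class ThroughputSketch {

    /** Payload bandwidth in gigabits per second (decimal units, protocol overhead ignored). */
    static double payloadGbps(long messagesPerSecond, int payloadBytes) {
        return messagesPerSecond * (double) payloadBytes * 8 / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // Flume pattern: 450,000 msg/s x 500 bytes x 8 bits = 1.8 Gbps of payload.
        System.out.printf("Flume pattern: %.1f Gbps%n", payloadGbps(450_000, 500));
        // Spark pattern: 26,000,000 msg/s x 100 bytes x 8 bits = 20.8 Gbps of payload.
        System.out.printf("Spark pattern: %.1f Gbps%n", payloadGbps(26_000_000, 100));
    }
}
```

For comparison, four 10 Gbps links provide 40 Gbps in total, so these payload figures give a rough sense of how message size and rate trade off against link capacity.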

The sessions connecting the Spark receiver to the Solace message router can be authenticated via Active Directory or

Kerberos and secured via TLS.

For a full description of steps to integrate and integration options please see [Spark-REF].


5 Working with Solace High Availability (HA) and Disaster Recovery (DR)

As implied above, Solace supports message router redundancy, which eliminates a single point of failure in the Solace Messaging Platform by allowing a network administrator to define two Solace routers as a redundant pair. If one of the routers is taken out of service or fails, the other router automatically takes over responsibility for the clients typically served by the out-of-service router.

The redundancy feature is largely transparent to clients and other Solace routers in the network. Only the two routers that are paired as mates require explicit configuration to take advantage of the feature.

Similarly, no configuration is needed on a client host system to take advantage of router redundancy. The only visible impact on clients during a redundancy failover is that messages are not delivered for a short period of time and the clients are forced to reconnect.

Beyond router redundancy, Solace message routers also support full synchronous and asynchronous disaster recovery solutions, which replicate queue status and messages to a disaster recovery data center. In case of a data center failure, the DR site can be activated with the message state (messages and acknowledgements) up to date.

[Figure: disaster recovery bridge]

For a full description of redundancy and DR solutions, see [Solace-FG].


6 Manageability

Solace supports an always-on monitoring model that includes a very rich set of statistics from Layer 2 to Layer 5 of the OSI networking stack. Because Solace message router appliances separate the data plane and the control plane onto different hardware, this monitoring can occur across the platform without any effect on messaging data rates or latency.

These statistics can be gathered or inspected at a high level, such as per appliance, or in detail, such as per client or per queue.

Through an element management protocol (SEMP), Solace exposes these statistics and configuration objects in its management platform, SolAdmin. Solace also publishes the schema for third-party integrations.

Solace message routers also have a wide range of configurable threshold events that can be received either in-band via messaging or out-of-band via syslog push.

For a full description of monitoring solutions, see [Solace-FG].


7 Security

Solace message routers were purpose-built as a messaging platform for Enterprise Messaging Bus architectures. For this reason the systems were built with security in mind.

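[Figure: security controls between app servers, the message router and Active Directory]

Callouts on the publishing side:

· Authentication via LDAP or Kerberos

· TLSv1.2 secure session

· Publish ACL: controls messages on bus

· Connect ACL: controls publisher location

Callouts on the subscribing side:

· Authentication via LDAP or Kerberos

· TLSv1.2 secure session

· Subscribe ACL: controls messages from bus

· Connect ACL: controls publisher location

Callout on the Active Directory link:

· Secure connection to AD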

7.1 Authentication

Each client connection is individually authenticated as it connects to the message router. Each connection and disconnection, including client-username and location, is logged for auditability. Authentication can be done via:

o Internal Database – An internal encrypted database can be used to store username/password pairs.

o RADIUS – The client username and password are sent to an external RADIUS server for authentication.

o LDAP (Active Directory) – The client username and password are sent to an external LDAP server for authentication, as well as for accessing JMS objects.

o Client Certificates – A client certificate authentication scheme allows a client to prove its identity to the Solace router by providing a valid X509v3 client certificate from a recognized Certificate Authority (CA).

o Kerberos – A Kerberos authentication scheme allows clients that have been granted a valid Kerberos ticket to connect to a Solace router.

7.2 Authorization

Solace message routers support Publish, Subscribe and Connect Access Control Lists (ACLs). These ACLs can be stored on the Solace message router and associated with the client connection via the client-username or the LDAP group the connection belongs to.

Beyond ACLs, Solace message routers can also control which clients can produce persistent messages to, and consume them from, a queue.

7.3 Encryption

Solace message routers support TLS encryption for:

o Client connections

o Message router to message router connections

o LDAP connections

Solace message routers support the following cipher suites:

ECDHE-RSA-AES256-GCM-SHA384
ECDHE-RSA-AES256-SHA384
ECDHE-RSA-AES256-SHA
AES256-GCM-SHA384
AES256-SHA256
AES256-SHA
ECDHE-RSA-DES-CBC3-SHA
DES-CBC3-SHA
ECDHE-RSA-AES128-GCM-SHA256
ECDHE-RSA-AES128-SHA256
ECDHE-RSA-AES128-SHA
AES128-GCM-SHA256
AES128-SHA256
AES128-SHA
RC4-SHA
RC4-MD5
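When reviewing a supported-suite list like the one above against a site security policy, a simple name-based filter can help. The sketch below flags suites whose names indicate RC4, 3DES (DES-CBC3) or MD5. RC4 is prohibited for TLS by RFC 7465, but the choice of which names to flag is an assumption of this example rather than Solace guidance, and `CipherSuiteAuditSketch` and `legacySuites` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Name-based check (hypothetical helper) for legacy algorithms in a cipher suite list. */
public class CipherSuiteAuditSketch {

    /** Returns the suites whose names contain RC4 or DES-CBC3, or end in -MD5. */
    static List<String> legacySuites(List<String> suites) {
        List<String> flagged = new ArrayList<>();
        for (String suite : suites) {
            if (suite.contains("RC4") || suite.contains("DES-CBC3") || suite.endsWith("-MD5")) {
                flagged.add(suite);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // The suite names listed in this document.
        List<String> supported = List.of(
            "ECDHE-RSA-AES256-GCM-SHA384", "ECDHE-RSA-AES256-SHA384", "ECDHE-RSA-AES256-SHA",
            "AES256-GCM-SHA384", "AES256-SHA256", "AES256-SHA",
            "ECDHE-RSA-DES-CBC3-SHA", "DES-CBC3-SHA",
            "ECDHE-RSA-AES128-GCM-SHA256", "ECDHE-RSA-AES128-SHA256", "ECDHE-RSA-AES128-SHA",
            "AES128-GCM-SHA256", "AES128-SHA256", "AES128-SHA",
            "RC4-SHA", "RC4-MD5");
        // Prints the subset a policy review might want to disable.
        System.out.println(legacySuites(supported));
    }
}
```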

In the next release of the Solace message router, the client will be able to connect and authenticate via a secure

connection then drop down to a normal TCP connection to send and receive messages.

For a full description of client authentication, authorization and transport security see [Solace-FG].