A closer look at Hue: how to interface with Hadoop


Description

The various ways to interface with Hadoop (Thrift, REST, JobTracker plugins...) and how to build a drag-and-drop Oozie workflow editor.

Transcript

Page 1: A closer look at hue: how to interface with Hadoop

Hue: A closer look at Hue

Page 2

What's on the Menu

● Hue Architecture
  ○ Many interfaces to implement
  ○ How do I list HDFS files, how do I submit a job...?
  ○ SDK

● Hue UI: Dynamic Workflow Editor
  ○ Why improve the user experience?
  ○ How can we improve the user experience?
  ○ Design considerations
  ○ Design and code deep dive

Page 3

View from 30 000 feet

Page 4

Ecosystem

Page 5

Integrate with the Web

● HTTP, stateless (async queries)
● Frontend / Backend (e.g. different servers, pagination)
● Resources (e.g. img, js, callbacks, css, json)
● Browsers, multiple technologies
● DB (SQLite, MySQL, PostgreSQL...)
● i18n
● ...

More on UI later

Page 6

Integrate Users

● Auth
  ○ Standard
  ○ LDAP
  ○ PAM
  ○ SPNEGO
  ○ Custom (OAuth, cookie...)

Page 7

Integrate HDFS

● Interfaces
  ○ Thrift (old)
    ■ NameNode (NN)
  ○ REST
    ■ WebHDFS
    ■ HttpFS (HA, new bugs)

● Uploads to HDFS

  class HDFStemporaryUploadedFile(object):
  class HDFSfileUploadHandler(FileUploadHandler):
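Both REST options speak the same WebHDFS protocol, which is why Hue can switch between them. As a minimal sketch, assuming a NameNode at the placeholder address `http://namenode:50070` (HttpFS usually listens on port 14000 instead), the helpers below build a LISTSTATUS URL and parse the JSON it returns; the function names are illustrative, not Hue's own API.

```python
import json
from urllib.parse import quote, urlencode


def webhdfs_url(base, path, op, params=None):
    """Build a WebHDFS/HttpFS REST URL such as .../webhdfs/v1/user/hue?op=LISTSTATUS."""
    query = urlencode({"op": op, **(params or {})})
    # quote() keeps '/' intact and escapes anything unsafe in the HDFS path
    return f"{base.rstrip('/')}/webhdfs/v1{quote(path)}?{query}"


def list_names(liststatus_json):
    """Extract (name, type) pairs from the body of a LISTSTATUS response."""
    statuses = json.loads(liststatus_json)["FileStatuses"]["FileStatus"]
    return [(status["pathSuffix"], status["type"]) for status in statuses]
```

Uploads use `op=CREATE`, which in WebHDFS is a two-step call: the NameNode answers with a 307 redirect to a DataNode, and the file body goes to that second URL. Streaming chunks there as they arrive, rather than buffering the whole file in the web server, is presumably what an upload handler such as `HDFSfileUploadHandler` is for.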

Page 8

Integrate Hive

● Beeswax: embedded Hive CLI
● Concurrent executions

● Beeswax / Hive Server 2 Thrift interfaces
● Hue models, HQL, Impala, DDL

service TCLIService {
  TExecuteStatementResp ExecuteStatement(1:TExecuteStatementReq req);
  TGetOperationStatusResp GetOperationStatus(1:TGetOperationStatusReq req);
  ...
}

service BeeswaxService {
  QueryHandle query(1:Query query) throws(1:BeeswaxException error),
  QueryHandle executeAndWait(1:Query query, 2:LogContextId clientCtx) throws(1:BeeswaxException error),
  ...
}
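Splitting execution from status checks is what lets a stateless web front end submit a query and come back later. Below is a minimal polling sketch, assuming a hypothetical client object with `execute_statement` and `get_operation_status` methods (real Thrift-generated clients take request structs instead); the state names are borrowed from HiveServer2's `TOperationState` enum.

```python
import time


class FakeHS2Client:
    """Hypothetical stand-in for a Thrift TCLIService client, replaying canned states."""

    def __init__(self, states):
        self._states = iter(states)

    def execute_statement(self, statement):
        return {"operation_handle": "op-1", "statement": statement}

    def get_operation_status(self, handle):
        return next(self._states)


def execute_and_wait(client, statement, poll_interval=0.0, timeout=60.0):
    """Submit a statement, then poll its status until it reaches a terminal state."""
    handle = client.execute_statement(statement)["operation_handle"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = client.get_operation_status(handle)
        if state == "FINISHED_STATE":
            return handle  # results can now be fetched with this handle
        if state in ("ERROR_STATE", "CANCELED_STATE", "CLOSED_STATE"):
            raise RuntimeError(f"query ended in {state}")
        time.sleep(poll_interval)
    raise TimeoutError(statement)
```

In a web app the loop would not live in one request: the page would store the handle and ask for the status on each browser poll, which is the "async queries" point from the web integration slide.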

Page 9

Integrate Hive

Moving to pluggable interfaces (diagram):

● DBMS: SQL API, backed by HS2 or Beeswax
● Table: implemented by BTable or HS2Table
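The idea behind the pluggable layer is that views code against one interface while the backend is chosen by configuration. A sketch with Python's `abc` module follows, where only the class names `Table`, `BTable` and `HS2Table` come from the slide; the method, the raw metadata shapes and the `get_table` factory are hypothetical.

```python
from abc import ABC, abstractmethod


class Table(ABC):
    """Common table interface that the UI codes against."""

    def __init__(self, name, raw_columns):
        self.name = name
        self._raw_columns = raw_columns

    @abstractmethod
    def column_names(self):
        """Return the column names, whatever the backend's metadata format."""


class BTable(Table):
    """Beeswax-backed table (hypothetical raw shape: list of dicts)."""

    def column_names(self):
        return [col["name"] for col in self._raw_columns]


class HS2Table(Table):
    """HiveServer2-backed table (hypothetical raw shape: list of (name, type) tuples)."""

    def column_names(self):
        return [col[0] for col in self._raw_columns]


def get_table(server_type, name, raw_columns):
    """Pick the implementation from configuration; callers only see a Table."""
    implementations = {"beeswax": BTable, "hiveserver2": HS2Table}
    return implementations[server_type](name, raw_columns)
```

Adding Impala, which reuses the same interfaces, would then mean registering another entry in the factory rather than touching the views.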

Page 10

Integrate Impala

● New app
● Same Beeswax / Hive Server 2 interfaces
● One more moving target...

Page 11

Integrate Jobs

● List, access, kill
● a.k.a. JobBrowser

● JobTracker Thrift plugin, in mapred-site.xml:

<property>
  <name>jobtracker.thrift.address</name>
  <value>0.0.0.0:9290</value>
</property>
<property>
  <name>mapred.jobtracker.plugins</name>
  <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
</property>

More Thrift

service Jobtracker extends common.HadoopServiceBase {
  ThriftJobInProgress getJob(10: common.RequestContext ctx, 1: ThriftJobID jobID) throws(1: JobNotFoundException err),
  ThriftJobList getRunningJobs(10: common.RequestContext ctx),
  ...
}

Page 12

Integrate Jobs

● Submit jobs (MR, Hive, Java, Pig...)
● Manage workflows
● Schedule workflows

● REST (GET, PUT, POST)
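These operations correspond to Oozie's web services API: `POST /oozie/v1/jobs` with a Hadoop-style `<configuration>` XML body submits a job (and, with `action=start`, starts it), `PUT /oozie/v1/job/<id>?action=kill` (or `suspend`, `resume`) manages it, and `GET /oozie/v1/job/<id>?show=info` reads its status. The helpers below are an illustrative sketch; the base URL is a placeholder and the function names are not Hue's.

```python
import xml.etree.ElementTree as ET


def job_configuration(properties):
    """Serialize properties into the <configuration> XML that Oozie accepts on submit."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def oozie_request(base, operation, job_id=None):
    """Map an operation to the (HTTP method, URL) pair on the Oozie v1 API."""
    base = base.rstrip("/") + "/v1"
    if operation == "submit":
        return "POST", f"{base}/jobs?action=start"
    if operation == "info":
        return "GET", f"{base}/job/{job_id}?show=info"
    if operation in ("kill", "suspend", "resume"):
        return "PUT", f"{base}/job/{job_id}?action={operation}"
    raise ValueError(f"unsupported operation: {operation}")
```

A submission would send `job_configuration({"user.name": "hue", "oozie.wf.application.path": "hdfs://namenode:8020/user/hue/apps/wordcount"})` with `Content-Type: application/xml;charset=UTF-8`; the application path is a placeholder.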

Page 13

Integrate Shell

● Pig
● HBase
● Sqoop 2

● Spawning server
● Greenlets
● popen / pty / tty
● IO (HTTP, DB...)
● setuid
● CSS / JS / POST
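The core of such a shell is a child process attached to a pseudo-terminal, so interactive tools like the Pig or HBase shells behave as if a user were typing. Here is a minimal, blocking sketch with the standard `pty` and `subprocess` modules (Unix only); the real Spawning server multiplexes many such processes with greenlets and drops privileges with setuid, neither of which is shown here.

```python
import os
import pty
import subprocess


def run_in_pty(argv):
    """Run argv attached to a pseudo-terminal and return everything it printed."""
    master, slave = pty.openpty()
    proc = subprocess.Popen(argv, stdin=slave, stdout=slave, stderr=slave, close_fds=True)
    os.close(slave)  # only the child holds the terminal side open now
    chunks = []
    while True:
        try:
            data = os.read(master, 1024)
        except OSError:  # Linux raises EIO once the child's side is closed
            break
        if not data:  # other platforms report EOF with an empty read
            break
        chunks.append(data)
    os.close(master)
    proc.wait()
    return b"".join(chunks).decode(errors="replace")
```

Because the child writes to a terminal, its output arrives with terminal line endings (`\r\n`), one of the details a browser-facing shell has to clean up.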

Page 14

Integrate YARN

● JobBrowser MR2, Oozie
● No JobTracker: 4 more REST APIs
● Finished MR jobs move to the History Server, missing logs...
● MR1/MR2 APIs not 100% compatible (like the Beeswax/HiveServer2 and Beeswax UI/Impala switches)
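The deck does not name the four REST APIs; in YARN the ResourceManager, NodeManager, MapReduce ApplicationMaster and JobHistory Server each expose one, which is presumably what is meant. The sketch below shows the routing problem JobBrowser faces: a running MapReduce job is reachable through its ApplicationMaster (proxied by the ResourceManager), while a finished one is only on the History Server. Hostnames are placeholders; 8088 and 19888 are the usual default web ports.

```python
def job_id_from_app_id(app_id):
    """A YARN application_<clusterTimestamp>_<seq> maps to MapReduce job_<clusterTimestamp>_<seq>."""
    prefix, cluster_ts, seq = app_id.split("_")
    if prefix != "application":
        raise ValueError(f"not a YARN application id: {app_id}")
    return f"job_{cluster_ts}_{seq}"


def mr_job_url(app_id, state,
               rm="http://resourcemanager:8088",
               history="http://historyserver:19888"):
    """Route to the ApplicationMaster while the job runs, to the History Server afterwards."""
    job_id = job_id_from_app_id(app_id)
    if state == "RUNNING":
        return f"{rm}/proxy/{app_id}/ws/v1/mapreduce/jobs/{job_id}"
    return f"{history}/ws/v1/history/mapreduce/jobs/{job_id}"
```

Under MR1 a single JobTracker answered both questions, which is why this switch did not exist before.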

Page 15

Integrate security

● 'hue' superuser
  ○ JT
  ○ Shell: setuid root:hue

● 'hue' proxy user / doAs
  ○ HDFS
  ○ Oozie

<property>
  <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
  <value>*</value>
</property>

● One 'hue' Kerberos ticket

● Hive Server 2?
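With proxy users, Hue authenticates once as `hue` and names the real end user on each call. For WebHDFS this is the `doas` query parameter; the HDFS side also needs matching `hadoop.proxyuser.hue.hosts` and `hadoop.proxyuser.hue.groups` entries in core-site.xml, analogous to the Oozie settings above. A sketch with a placeholder NameNode address, using simple authentication (with Kerberos, SPNEGO replaces `user.name`):

```python
from urllib.parse import urlencode


def proxied_webhdfs_url(base, path, op, end_user, service_user="hue"):
    """Authenticate as the service user, act on behalf of end_user via WebHDFS 'doas'."""
    query = urlencode({"op": op, "user.name": service_user, "doas": end_user})
    return f"{base.rstrip('/')}/webhdfs/v1{path}?{query}"
```

HDFS then checks permissions as `alice`, not `hue`, so one service identity (and one Kerberos ticket) is enough for every Hue user.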

Page 16

SDK: Integrate Developers

● Set of raw libs
  libs/ (hadoop, jobtracker, webhdfs, yarn, liboozie, rest, thrift)

● Hue models
  apps/ (jobbrowser, oozie, ...)

Page 18

CloudDemo example

Single click:

● HTTP
● HDFS
● Oozie
● JT

Page 19

After the Interfaces...

... now the dynamic UI (Oozie App use case)

Page 20

Why Improve User Experience

● Users like things that are easy to use
● Intuition and ease of use

Page 21

How to Improve User Experience

● How can we do this for Oozie?
  ○ Hue users are not engineers
  ○ Most users are not familiar with shortcuts and command lines
  ○ Windowing systems have taught us that drag and drop is good

Drag and drop everything in a workflow!

Page 22

Old Hue Windowing System

Page 23

Fundamentals of Front End Design

● Behavior
  ○ JavaScript
  ○ Knockout JS
  ○ jQuery

● Presentation
  ○ CSS
  ○ Bootstrap

● Content
  ○ HTML (templates)

● MV*
  ○ MVC
  ○ MVP
  ○ MVVM

Page 24

Design Constraints

● Existing backend from Hue 2.1
  ○ Need to be able to easily migrate from Hue 2.1 to Hue 2.2

● Knockout JS and jQuery already chosen
  ○ Rudimentary templating
  ○ Subscription-based bindings
  ○ Observables for arrays and JavaScript literals only
  ○ Event delegation

● Existing UI from Hue 2.1
  ○ Provides basic node movement through form submission (reloads the page)
  ○ Not dynamic

Page 25

Other Design Considerations

● Serializing should be trivial
● Basic API
  ○ Save a workflow
  ○ Validate a node
  ○ Read a workflow

● Difference in representation between the Hue 2.1 backend and the KnockoutJS way of doing things

● New nodes need an ID

Page 26

Design - High Level Components

● Left out
  ○ Many event bindings and custom events
  ○ Views

Page 27

Purpose of the Node Model

● Provides defaults for data:

var NodeModel = ModelModule($);
$.extend(NodeModel.prototype, {
  id: 0,
  name: '',
  description: '',
  node_type: '',
  workflow: 0,
  // Note: an array on the prototype is shared by every instance
  // until an instance assigns its own child_links.
  child_links: []
});

● Sent over the wire
● Mimics Django models
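Since the node model mimics a Django model on the server, a rough Python counterpart helps show what goes over the wire. This sketch assumes only the field names from the slide; `field(default_factory=list)` gives each node its own `child_links` list, which avoids the shared-array pitfall of putting `[]` on a JavaScript prototype.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class NodeModel:
    """Python mirror of the client-side defaults; field names follow the slide."""
    id: object = 0  # integer from the server, or a generated string such as "mapreduce:1"
    name: str = ""
    description: str = ""
    node_type: str = ""
    workflow: int = 0
    child_links: list = field(default_factory=list)  # a fresh list for every node


def to_wire(node):
    """Serialize a node to the JSON payload exchanged with the backend."""
    return json.dumps(asdict(node), sort_keys=True)
```

Keeping the model this flat is what makes "serializing should be trivial" achievable: `asdict` plus `json.dumps` is the whole serializer.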

Page 28

Model - ModelView Separation

● ModelViews should be the "shield" and Models the source of truth.

● Models are more serializable if they do not carry extraneous data.

● Subscribed updates through KnockoutJS:

// For every mapped field, push changes of the ModelView's observable
// back into the plain model as a plain JavaScript value.
$.each(mapping, function(key, value) {
  if (ko.isObservable(self[key])) {
    self[key].subscribe(function(newValue) {
      model[key] = ko.mapping.toJS(newValue);
    });
  }
});
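The subscription code keeps the plain model in sync with the observables, so only the model needs to be serialized. The same one-way flow can be sketched in Python without Knockout; `Observable` and `NodeModelView` are hypothetical stand-ins, not Knockout's API.

```python
class Observable:
    """Tiny stand-in for a Knockout observable: holds a value and notifies subscribers."""

    def __init__(self, value):
        self._value = value
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def get(self):
        return self._value

    def set(self, value):
        self._value = value
        for callback in self._subscribers:
            callback(value)


class NodeModelView:
    """Wraps a plain dict model: the UI writes to observables, the model stays plain data."""

    def __init__(self, model):
        self.model = model
        self.fields = {key: Observable(value) for key, value in model.items()}
        for key, observable in self.fields.items():
            # The default argument binds the current key; a bare closure would
            # see only the last key of the loop.
            observable.subscribe(lambda value, key=key: self.model.__setitem__(key, value))
```

The ModelView acts as the "shield": the UI never touches `model` directly, yet the model always holds the latest values.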

Page 29

Purpose of the Registry

● Construction optimization
● Constant-time node lookup
● Looking towards the future and storage
● Simple start:

var self = this;
self.nodes = {};
module.prototype.initialize.apply(self, arguments);
return self;
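A registry is essentially a dictionary keyed by node ID, which gives the constant-time lookup listed above. The sketch assumes nodes are stored under their generated IDs; the `rekey` method, for swapping a temporary client-side ID for the one the server assigns on save, is a hypothetical addition not described in the deck.

```python
class Registry:
    """Maps node IDs to node objects so lookups never walk the workflow graph."""

    def __init__(self):
        self.nodes = {}

    def add(self, node_id, node):
        self.nodes[node_id] = node

    def get(self, node_id):
        return self.nodes.get(node_id)  # None when the ID is unknown

    def remove(self, node_id):
        return self.nodes.pop(node_id, None)

    def rekey(self, old_id, new_id):
        """Hypothetical: replace a temporary ID with a server-assigned one."""
        self.nodes[new_id] = self.nodes.pop(old_id)
```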

Page 30

Purpose of ID Generation

● Unique identifier for new nodes (e.g. mapreduce:1)
● Assists in creating parent-child relationships through links

var IdGeneratorModule = function($) {
  return function(options) {
    var self = this;
    $.extend(self, options);
    self.counter = 1;
    // Returns "prefix:N" when a prefix is set, otherwise just "N".
    self.nextId = function() {
      return ((self.prefix) ? self.prefix + ':' : '') + self.counter++;
    };
  };
};
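For comparison, here is the same counter in Python, as a sketch with the illustrative name `IdGenerator`. Like the JavaScript version, it returns `prefix:N` when a prefix is set and the bare number, as a string, otherwise.

```python
import itertools


class IdGenerator:
    """Python counterpart of IdGeneratorModule: yields 'prefix:1', 'prefix:2', ..."""

    def __init__(self, prefix=None):
        self.prefix = prefix
        self._counter = itertools.count(1)  # starts at 1, like self.counter = 1

    def next_id(self):
        n = next(self._counter)
        return f"{self.prefix}:{n}" if self.prefix else str(n)
```

With `g = IdGenerator("mapreduce")`, successive calls to `g.next_id()` return `"mapreduce:1"` and then `"mapreduce:2"`; one generator per node type keeps IDs unique without a server round trip.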

Page 31

Transpose to Show

● KnockoutJS supports 3 kinds of observables
  ○ Observables for literals
  ○ Observable arrays
  ○ Computed observables

● The DAG is received as a tree

● The DAG is represented as a list of lists when displayed... an MVVM restriction
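The slide does not show the transposition itself. One plausible way to turn the received graph into rows for a list-of-lists binding is to place each node one row below its deepest parent, which keeps an Oozie join below every branch of its fork. The node names and the `children` adjacency mapping in this sketch are assumptions.

```python
from collections import defaultdict


def to_rows(children, root):
    """Lay out a workflow DAG as rows: each node sits one row below its deepest parent."""
    # Count incoming edges for every node reachable from the root.
    indegree = defaultdict(int)
    stack, seen = [root], {root}
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            indegree[child] += 1
            if child not in seen:
                seen.add(child)
                stack.append(child)
    # Kahn's algorithm: a node is placed only after all of its parents.
    depth = {root: 0}
    ready = [root]
    while ready:
        node = ready.pop()
        for child in children.get(node, []):
            depth[child] = max(depth.get(child, 0), depth[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    rows = defaultdict(list)
    for node, d in depth.items():
        rows[d].append(node)
    return [sorted(rows[d]) for d in range(len(rows))]
```

For a fork with branches `a -> a2` and `b` meeting at a join, the rows come out as `[start]`, `[fork]`, `[a, b]`, `[a2]`, `[join]`, `[end]`: each inner list can back one observable array in the view.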

Page 32

Other Difficulties

● Decision node representation
● JSON.stringify does not include parent class members
● Memory consumption
● Cycles, cycles, cycles
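Two of these difficulties have direct Python analogues: `vars()` returns only instance attributes (class attributes are skipped, much as `JSON.stringify` skips prototype members), and `json.dumps` refuses cyclic object graphs. A common workaround, sketched here with a hypothetical `Node` class, is to serialize links as IDs instead of nested objects.

```python
import json


class Node:
    """Hypothetical in-memory node holding direct references to its children."""

    node_type = "action"  # class attribute: like a prototype member, not in vars(node)

    def __init__(self, node_id):
        self.id = node_id
        self.children = []


def to_wire(nodes):
    """Flatten object references into ID links so even a cyclic graph serializes."""
    return json.dumps([
        {
            "id": node.id,
            "node_type": node.node_type,  # copied explicitly, since vars() would miss it
            "child_links": [child.id for child in node.children],
        }
        for node in nodes
    ])
```

By contrast, for two nodes `a` and `b` that point at each other, `json.dumps(vars(a), default=vars)` raises `ValueError: Circular reference detected`.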

Page 33

Next steps

● Integrate
  ○ Pig, Hive Server 2
  ○ Oozie Bundles, SLA
  ○ Document model, "Editors", git
  ○ SDK revamp, language-agnostic, proxy app

● UX
  ○ Impala real-time UI
  ○ Redesign overall layout

● Sqoop 2, HBase? Mahout?...

Face of Hadoop/CDH