A closer look at Hue: how to interface with Hadoop
Uploaded by romain-rigaux
Hue: A Closer Look at Hue
● Hue Architecture
  ○ Many interfaces to implement
  ○ How do I list HDFS files, how do I submit a job...?
  ○ SDK
● Hue UI: Dynamic Workflow Editor
  ○ Why improve the user experience?
  ○ How can we improve the user experience?
  ○ Design Considerations
  ○ Design and Code Deep Dive
What's on the Menu
View from 30 000 feet
Ecosystem
Integrate with the Web
● HTTP, stateless (async queries)
● Frontend / Backend (e.g. different servers, pagination)
● Resources (e.g. img, js, callbacks, css, json)
● Browsers, multi techs
● DB (SQLite, MySQL, PostgreSQL...)
● i18n
● ...
More on UI later
Integrate Users
● Auth
  ○ Standard
  ○ LDAP
  ○ PAM
  ○ SPNEGO
  ○ Custom (OAuth, Cookie...)
Integrate HDFS
● Interfaces
  ○ Thrift (old)
    ■ NN
  ○ REST
    ■ WebHdfs
    ■ HttpFs (HA, new bugs)
● Uploads to HDFS
class HDFStemporaryUploadedFile(object): ...
class HDFSfileUploadHandler(FileUploadHandler): ...
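For the REST side, a minimal sketch of how a "list HDFS files" call could be addressed through WebHDFS. It only builds the URL and does not contact a cluster. The host name, port and user are placeholder assumptions, and `webHdfsUrl` is a hypothetical helper, not part of Hue. HttpFs exposes the same REST API on its own host and port, so the same builder would apply.

```javascript
// Sketch only: builds WebHDFS REST URLs, nothing is sent over the network.
// Host, port and user below are placeholders, not values from the talk.
function webHdfsUrl(baseUrl, path, op, params) {
  // WebHDFS exposes file system paths under the /webhdfs/v1 prefix.
  const url = new URL('/webhdfs/v1' + path, baseUrl);
  url.searchParams.set('op', op);
  for (const [key, value] of Object.entries(params || {})) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// LISTSTATUS returns the directory listing as JSON when issued as a GET.
const listUrl = webHdfsUrl('http://namenode.example.com:50070', '/user/demo',
                           'LISTSTATUS', { 'user.name': 'demo' });
console.log(listUrl);
```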
Integrate Hive
● Beeswax: embedded Hive CLI
● Concurrent executions
● Beeswax / Hive Server 2 Thrift interfaces
● Hue models, HQL, Impala, DDL

service TCLIService {
  TExecuteStatementResp ExecuteStatement(1:TExecuteStatementReq req);
  TGetOperationStatusResp GetOperationStatus(1:TGetOperationStatusReq req);
  ...
}

service BeeswaxService {
  QueryHandle query(1:Query query) throws(1:BeeswaxException error),
  QueryHandle executeAndWait(1:Query query, 2:LogContextId clientCtx) throws(1:BeeswaxException error),
  ...
}
Integrate Hive
Moving to Pluggable interfaces
[Diagram labels: DBMS (SQL API); HS2 and Beeswax; Table; BTable and HS2Table]
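One way to read "pluggable interfaces" is a single facade that application code talks to, with the Beeswax and Hive Server 2 clients swapped in behind it. The sketch below illustrates that idea only. All class and method names are hypothetical, and the backends return stub objects instead of calling Thrift.

```javascript
// Hypothetical names throughout; this is not Hue's actual API.
class BeeswaxBackend {
  execute(hql) { return { backend: 'beeswax', statement: hql }; }
}

class HiveServer2Backend {
  execute(hql) { return { backend: 'hs2', statement: hql }; }
}

const BACKENDS = { beeswax: BeeswaxBackend, hiveserver2: HiveServer2Backend };

// Facade the application codes against; the backend is chosen by name.
class Dbms {
  constructor(backendName) {
    const Backend = BACKENDS[backendName];
    if (!Backend) throw new Error('Unknown backend: ' + backendName);
    this.backend = new Backend();
  }

  executeStatement(hql) {
    return this.backend.execute(hql);
  }
}

const viaHs2 = new Dbms('hiveserver2').executeStatement('SHOW TABLES');
console.log(viaHs2.backend);
```

Callers only see `executeStatement`, so switching a deployment from one server to the other becomes a configuration change rather than a code change.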
Integrate Impala
● New app
● Same Beeswax/Hive Server 2 interfaces
● One more moving target...
Integrate Jobs
● List, access, kill
● aka JobBrowser
● JobTracker Thrift Plugin

mapred-site.xml:
<property>
  <name>jobtracker.thrift.address</name>
  <value>0.0.0.0:9290</value>
</property>
<property>
  <name>mapred.jobtracker.plugins</name>
  <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
</property>
More Thrift
service Jobtracker extends common.HadoopServiceBase {
  ThriftJobInProgress getJob(10: common.RequestContext ctx, 1: ThriftJobID jobID)
      throws(1: JobNotFoundException err),
  ThriftJobList getRunningJobs(10: common.RequestContext ctx),
  ...
}
Integrate Jobs
● Submit jobs (MR, Hive, Java, Pig...)
● Manage workflows
● Schedule workflows
● REST (GET, PUT, POST)
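The three verbs map naturally onto Oozie's web services API: GET to list, POST to submit, PUT to manage a job. The sketch below describes such calls as plain objects without sending them. The host and port are placeholder assumptions, and the function names are hypothetical rather than taken from Hue's liboozie.

```javascript
// Sketch: describes Oozie REST calls as plain objects instead of sending them.
const OOZIE_URL = 'http://oozie.example.com:11000/oozie';  // placeholder host

// GET: list workflow jobs.
function listWorkflows() {
  return { method: 'GET', url: OOZIE_URL + '/v1/jobs?jobtype=wf' };
}

// POST: submit a job described by an XML configuration and start it.
function submitWorkflow(configXml) {
  return {
    method: 'POST',
    url: OOZIE_URL + '/v1/jobs?action=start',
    headers: { 'Content-Type': 'application/xml;charset=UTF-8' },
    body: configXml
  };
}

// PUT: change the state of an existing job, here by killing it.
function killJob(jobId) {
  return {
    method: 'PUT',
    url: OOZIE_URL + '/v1/job/' + encodeURIComponent(jobId) + '?action=kill'
  };
}

console.log(killJob('0000001-130101000000000-oozie-oozi-W').url);
```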
Integrate Shell
● Pig
● HBase
● Sqoop 2
● Spawning Server
● Greenlets
● popen/pty/tty
● IO (HTTP, DB...)
● setuid
● css/js/POST
Integrate YARN
● JobBrowser MR2, Oozie
● No JT, 4 more REST APIs
● MR to History Server, missing logs...
● MR1/2 API not 100% compatible (like Beeswax/HiveServer2, Beeswax UI/Impala switches)
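The "MR to History Server" point can be illustrated with a small routing sketch: while a MapReduce job runs, its details are served by the application master behind the ResourceManager proxy, and once it finishes they move to the job history server. Hosts and ports are placeholder assumptions, and `mrJobUrl` is a hypothetical helper, not Hue's JobBrowser code.

```javascript
// Placeholder hosts; only URLs are built, nothing is requested.
const RM_URL = 'http://rm.example.com:8088';
const HISTORY_URL = 'http://history.example.com:19888';

// A YARN application id and its MapReduce job id share the same numeric suffix.
function toJobId(applicationId) {
  return applicationId.replace(/^application_/, 'job_');
}

// Pick the REST endpoint that can answer for this job right now.
function mrJobUrl(app) {
  const jobId = toJobId(app.id);
  if (app.state === 'RUNNING') {
    // Running job: MR application master REST API, reached via the RM proxy.
    return RM_URL + '/proxy/' + app.id + '/ws/v1/mapreduce/jobs/' + jobId;
  }
  // Finished job: the application master is gone, ask the history server.
  return HISTORY_URL + '/ws/v1/history/mapreduce/jobs/' + jobId;
}

console.log(mrJobUrl({ id: 'application_1_0001', state: 'RUNNING' }));
console.log(mrJobUrl({ id: 'application_1_0001', state: 'FINISHED' }));
```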
Integrate security
● 'hue' superuser
  ○ JT, Shell setuid root:hue
● 'hue' Proxy User / doAs
  ○ HDFS
  ○ Oozie
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
  <value>*</value>
</property>
● One 'hue' Kerberos ticket
● Hive Server 2 ?
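With a proxy user, the 'hue' service account authenticates and names the end user it acts for in a request parameter: `doas` for WebHDFS/HttpFs and `doAs` for Oozie. A minimal sketch of adding that parameter follows. The URLs and the user "alice" are placeholders, and `asEndUser` is a hypothetical helper.

```javascript
// Impersonation parameter per service (note the different capitalization).
const IMPERSONATION_PARAM = { hdfs: 'doas', oozie: 'doAs' };

// Returns the URL with the end user appended as the impersonated user.
function asEndUser(service, urlString, endUser) {
  const param = IMPERSONATION_PARAM[service];
  if (!param) throw new Error('Unknown service: ' + service);
  const url = new URL(urlString);
  url.searchParams.set(param, endUser);
  return url.toString();
}

const hdfsCall = asEndUser(
  'hdfs',
  'http://namenode.example.com:50070/webhdfs/v1/user/alice?op=LISTSTATUS&user.name=hue',
  'alice');
const oozieCall = asEndUser(
  'oozie',
  'http://oozie.example.com:11000/oozie/v1/jobs?jobtype=wf&user.name=hue',
  'alice');
console.log(hdfsCall);
console.log(oozieCall);
```

The server accepts such a request only if its proxy-user settings, like the Oozie properties above, allow 'hue' to impersonate users from that host and group.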
SDK: Integrate Developers
● Set of raw libs
libs
  /hadoop
  /jobtracker
  /webhdfs
  /yarn
  /liboozie
  /rest
  /thrift
● Hue models
apps/
  /jobbrowser
  /oozie
  /...
SDK: Integrate Developers
$ ./build/env/bin/hue create_desktop_app clouddemo
● Custom: views/model/templates
● Reuse Hue libs
http://cloudera.github.com/hue/docs-2.1.0/sdk/sdk.html#fast-guide-to-creating-a-new-hue-application
CloudDemo example
Single click:
● HTTP
● HDFS
● Oozie
● JT
After the Interfaces...
... now the dynamic UI (Oozie App use case)
● Users like things that are easy to use
● Intuition and ease of use
Why Improve User Experience
● How can we do this for Oozie?
  ○ Hue users are not engineers
  ○ Most users are not familiar with shortcuts and command lines
  ○ Windowing systems have taught us drag and drop is good
Drag and drop everything in a Workflow!
How to Improve User Experience
Old Hue Windowing System
● Behavior
  ○ JavaScript
  ○ Knockout JS
  ○ jQuery
● Presentation
  ○ CSS
  ○ Bootstrap
● Content
  ○ HTML (Templates)
● MV*
  ○ MVC
  ○ MVP
  ○ MVVM
Fundamentals of Front End Design
● Existing backend from Hue 2.1
  ○ Need to be able to easily migrate from Hue 2.1 to Hue 2.2
● Knockout JS and jQuery already chosen
  ○ Rudimentary templating
  ○ Subscription based bindings
  ○ Observables for arrays and JavaScript literals only
  ○ Event delegation
● Existing UI from Hue 2.1
  ○ Provides basic node movement through form submission (reloads the page)
  ○ Not dynamic
Design Constraints
● Serializing should be trivial
● Basic API
  ○ Save a workflow
  ○ Validate a node
  ○ Read a workflow
● Difference in representation between Hue 2.1 backend and the KnockoutJS way of doing things
● New nodes need an ID
Other Design Considerations
● Left out
  ○ Many event bindings and custom events
  ○ Views left out
Design - High Level Components
● Provides defaults for data:
var NodeModel = ModelModule($);
$.extend(NodeModel.prototype, {
  id: 0,
  name: '',
  description: '',
  node_type: '',
  workflow: 0,
  child_links: []
});
● Sent over the wire
● Mimics Django models
Purpose of the Node Model
● ModelViews should be the "shield" and Models the source of truth.
● Models are more serializable if they do not carry extraneous data.
● Subscribed update through KnockoutJS:
$.each(mapping, function(key, value) {
  if (ko.isObservable(self[key])) {
    // Write every change of the observable back to the plain model.
    self[key].subscribe(function(value) {
      model[key] = ko.mapping.toJS(value);
    });
  }
});
Model - ModelView Separation
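To see the subscription pattern in isolation, here is a self-contained sketch that replaces Knockout with a tiny hand-written observable. The `observable` function is a simplified stand-in written for this example, not the Knockout API, and the model fields are made up. The point is only that the view model absorbs UI edits while the plain model stays serializable.

```javascript
// Minimal stand-in for an observable: call with no argument to read,
// with one argument to write and notify subscribers.
function observable(initial) {
  let current = initial;
  const subscribers = [];
  function accessor(value) {
    if (arguments.length === 0) return current;
    current = value;
    subscribers.forEach(function (fn) { fn(current); });
  }
  accessor.subscribe = function (fn) { subscribers.push(fn); };
  return accessor;
}

// Plain model: the source of truth, free of UI state.
const model = { id: 'mapreduce:1', name: 'wordcount' };

// View model: one observable per field, each writing changes back.
const viewModel = {};
Object.keys(model).forEach(function (key) {
  viewModel[key] = observable(model[key]);
  viewModel[key].subscribe(function (value) { model[key] = value; });
});

viewModel.name('wordcount-v2');      // e.g. the user renames the node
console.log(JSON.stringify(model));  // the plain model saw the change
```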
● Construction optimization
● Constant time node lookup
● Looking towards the future and storage
● Simple start:
var self = this;
self.nodes = {};
module.prototype.initialize.apply(self, arguments);
return self;
Purpose of the Registry
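A registry along these lines can be little more than an object keyed by node id, which is what makes lookup constant time. The sketch below is an assumption about the shape of such a registry. The method names `add`, `get`, `remove` and `contains` are hypothetical and not taken from the Hue source.

```javascript
// Hypothetical registry: nodes are stored in an object keyed by node id.
function Registry() {
  // No prototype, so ids such as 'constructor' cannot collide with built-ins.
  this.nodes = Object.create(null);
}

Registry.prototype.add = function (id, node) { this.nodes[id] = node; };
Registry.prototype.get = function (id) { return this.nodes[id]; };
Registry.prototype.remove = function (id) { delete this.nodes[id]; };
Registry.prototype.contains = function (id) { return id in this.nodes; };

const registry = new Registry();
registry.add('mapreduce:1', { name: 'wordcount' });
console.log(registry.get('mapreduce:1').name);
console.log(registry.contains('pig:2'));
```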
● Unique identifier for new nodes (e.g. mapreduce:1).
● Assists in creating parent-child relationships through links.
var IdGeneratorModule = function($) {
  return function(options) {
    var self = this;
    $.extend(self, options);
    self.counter = 1;
    self.nextId = function() {
      return ((self.prefix) ? self.prefix + ':' : '') + self.counter++;
    };
  };
};
Purpose of ID Generation
● KnockoutJS supports 3 kinds of observables
  ○ Observables for literals
  ○ Observable arrays
  ○ Computed Observables
● DAG received is represented as a tree
● DAG represented as a list of lists when we display... MVVM restriction
Transpose to Show
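One plausible reading of this step is turning the parent-to-children structure into rows, one list of nodes per depth, so that each row can be bound to an observable array. The sketch below does that by placing every node at the depth of its longest path from the root, which keeps a join below all of its branches. The node shape (`id`, `children`) and the function `toRows` are assumptions for illustration, and the input must be acyclic.

```javascript
// Converts a DAG given as nested children into a list of rows (list of lists).
function toRows(root) {
  const depthOf = new Map();  // node id -> deepest level seen so far

  (function visit(node, depth) {
    // Skip if this node was already placed at this depth or deeper.
    if (depthOf.has(node.id) && depthOf.get(node.id) >= depth) return;
    depthOf.set(node.id, depth);
    (node.children || []).forEach(function (child) { visit(child, depth + 1); });
  })(root, 0);

  const rows = [];
  depthOf.forEach(function (depth, id) {
    (rows[depth] = rows[depth] || []).push(id);
  });
  return rows;
}

// start -> fork -> (pig, hive) -> join -> end
const end = { id: 'end', children: [] };
const join = { id: 'join', children: [end] };
const pig = { id: 'pig', children: [join] };
const hive = { id: 'hive', children: [join] };
const fork = { id: 'fork', children: [pig, hive] };
const start = { id: 'start', children: [fork] };

const rows = toRows(start);
console.log(JSON.stringify(rows));
```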
● Decision node representation
● JSON.stringify does not include parent class members
● Memory consumption
● Cycles, cycles, cycles
Other Difficulties
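Two of these difficulties are easy to reproduce in plain JavaScript: `JSON.stringify` serializes only an object's own properties, so defaults placed on a prototype are silently dropped, and it throws on circular references such as parent/child links. The names below are made up for the demonstration, and `toPlain` is just one possible workaround.

```javascript
function NodeModel(attrs) {
  Object.assign(this, attrs);  // own properties
}
// Defaults live on the prototype, as in the node model shown earlier.
NodeModel.prototype.node_type = '';
NodeModel.prototype.description = '';

const node = new NodeModel({ id: 1, name: 'wordcount' });
const ownOnly = JSON.stringify(node);  // prototype defaults are missing
console.log(ownOnly);

// One workaround: copy every enumerable property, inherited ones included.
function toPlain(obj) {
  const plain = {};
  for (const key in obj) {
    if (typeof obj[key] !== 'function') plain[key] = obj[key];
  }
  return plain;
}
const plainNode = toPlain(node);
console.log(JSON.stringify(plainNode));

// Cycles: a child that points back to its parent cannot be stringified.
const parentNode = { id: 'fork:1', children: [] };
parentNode.children.push({ id: 'pig:2', parent: parentNode });
let cycleError = null;
try {
  JSON.stringify(parentNode);
} catch (e) {
  cycleError = e;
}
console.log(cycleError instanceof TypeError);
```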
Next steps
● Integrate
  ○ Pig, Hive Server 2
  ○ Oozie Bundles, SLA
  ○ Document model, "Editors", git
  ○ SDK revamp, language agnostic, proxy app
● UX
  ○ Impala real time UI
  ○ Redesign overall layout
● Sqoop 2, HBase? Mahout?...
Face of Hadoop/CDH