Open up interactive big data analysis for your enterprise

40
OPEN UP INTERACTIVE BIG DATA ANALYSIS FOR YOUR ENTERPRISE Enrico Berti Budapest DW Forum, Jun 4, 2014

description

Open up interactive big data analysis for your enterprise Hadoop brings many data crunching possibilities but also comes with a lot of complexity: the ecosystem is large and continuously changing, interactions happens on the command line, interfaces are built for engineers… This talk describes how Hue can be integrated with existing Hadoop deployments with minimal changes/disturbances. Enrico covers details on how Hue can leverage the existing authentication system and security model of your company. This talk describes through an interactive demo and dialog based on open source Hue how users can get started with Hadoop. We will detail how one can start or use an existing Hadoop cluster to setup Hue. The best practices about how to integrate your company directory and security will be shared. Moreover, the underlying technical details about interact with the ecosystem. The presentation will continue with real life analytics business use cases. It will show how data can be imported and loaded into the cluster for then being queried interactively with SQL or a search dashboard. All through your Web Browser! To sum-up, attendees of this talk will learn how Hadoop can be made more accessible and why Hue is the ideal gateway for quickly getting started or using the platform more efficiently.

Transcript of Open up interactive big data analysis for your enterprise

Page 1: Open up interactive big data analysis for your enterprise

OPEN UP INTERACTIVE BIG DATA ANALYSIS FOR YOUR ENTERPRISE

Enrico BertiBudapest DW Forum, Jun 4, 2014

Page 2: Open up interactive big data analysis for your enterprise

GOALOF HUE

WEB INTERFACE FOR ANALYZING DATA WITH APACHE HADOOP  !

SIMPLIFY AND INTEGRATEFREE AND OPEN SOURCE !

—> OPEN UP BIG DATA

Page 3: Open up interactive big data analysis for your enterprise

VIEW FROM30K FEET

Hadoop Web Server You, your colleagues and even that friend that uses IE9 ;)

Page 4: Open up interactive big data analysis for your enterprise

OPEN SOURCE

~3350 COMMITS  38 CONTRIBUTORS698 STARS245 FORKS !

github.com/cloudera/hue

Page 5: Open up interactive big data analysis for your enterprise

THE CORETEAM PLAYERS

Join  us  at  team.gethue.com

Romain  Rigaux Enrico  Ber9Chang Abraham  ElmahrekAmstel

Page 6: Open up interactive big data analysis for your enterprise

TALKS

Meetups  and  events  in  NYC,  Paris,  LA,  Tokyo,  SF,  Stockholm,  Vienna,  San  Jose,  Singapore,  Budapest… Coming  up  in  London,  West  coast

AROUNDTHE WORLD

RETREATS

Nov  13  Koh  Chang,  Thailand  May  14  Curaçao,  Netherlands  An9lles  Nov  14  Goa,  India

Page 8: Open up interactive big data analysis for your enterprise

HISTORY

HUE 1

Desktop-­‐like  in  a  browser,  did  its  job  but  preWy  slow,  memory  leaks  and  not  very  IE  friendly  but  definitely  advanced  for  its  9me  (2009-­‐2010).

Page 9: Open up interactive big data analysis for your enterprise

HISTORY

HUE 2

The  first  flat  structure  port,  with  TwiWer  Bootstrap  all  over  the  place.

HUE 2.5

New  apps,  improved  the  UX  adding  new  nice  func9onali9es  like  autocomplete  and  drag  &  drop.

Page 10: Open up interactive big data analysis for your enterprise

HISTORY

HUE 3 ALPHA

Proposed  design,  didn’t  make  it.

Page 11: Open up interactive big data analysis for your enterprise

HISTORY

HUE 3.5

New  UI,  several  new  apps,  the  most  user  friendly  features  to  date.

Page 12: Open up interactive big data analysis for your enterprise

HISTORY

HUE 3.6+

Where  we  are  now,  a  brand  new  way  to  search  and  explore  your  data.

Page 13: Open up interactive big data analysis for your enterprise

WHICH VERSION TO USE?

2500+  commits  later,  new  UI,  interac9ve  search  /  SQL,  dashboards...

1-­‐2  years  old,  use  only  if  you  depend  on  Hive  <  0.12  

HUE 2.X HUE 3.X

Page 14: Open up interactive big data analysis for your enterprise

WHICH DISTRIBUTION?

Advanced  preview The  most  stable  and  cross  component  checked

Very  latest

GITHUB CDH / CM TARBALL

HACKER ADVANCED USER NORMAL USER

Page 15: Open up interactive big data analysis for your enterprise

WHERE TO PUT HUE? IN ONE MACHINE

Page 16: Open up interactive big data analysis for your enterprise

WHERE TO PUT HUE? OUTSIDE THE CLUSTER

Page 17: Open up interactive big data analysis for your enterprise

WHERE TO PUT HUE? INSIDE THE CLUSTER

Page 18: Open up interactive big data analysis for your enterprise

Python  2.4  2.6That’s  it  if  using  a  packaged  version.  If  building  from  the  source,  here  are  the  extra  packages

SERVER CLIENT

Web  BrowserIE  9+,  FF  10+,  Chrome,  Safari

WHAT DO YOU NEED?

Hi  there,  I’m  “just”  a  web  server.

Page 19: Open up interactive big data analysis for your enterprise

HOW DOES THE HUE SERVICE LOOK LIKE?

Process  serving  pages  and  also  static  content

1 SERVER 1 DB

For  cookies,  saved  queries,  workflows,  …

Hi  there,  I’m  “just”  a  web  server.

Page 20: Open up interactive big data analysis for your enterprise

HOW TO CONFIGURE HUE

HUE.INI

Similar  to  core-­‐site.xml  but  with  .INI  syntax  !

Where?  

/etc/hue/conf/hue.ini

or  

$HUE_HOME/desktop/conf/

pseudo-distributed.ini

[desktop] [[database]] # Database engine is typically one of: # postgresql_psycopg2, mysql, or sqlite3 engine=sqlite3 ## host= ## port= ## user= ## password= name=desktop/desktop.db

Page 21: Open up interactive big data analysis for your enterprise

AUTHENTICATION

Login/Password  in  a  Database  (SQLite,  MySQL,  …)

SIMPLE ENTERPRISE

LDAP  (most  used),  OAuth,  OpenID,  SAML

Page 22: Open up interactive big data analysis for your enterprise

DB BACKEND

Page 23: Open up interactive big data analysis for your enterprise

LDAP BACKEND

Integrate  your  employees:  LDAP  How  to  guide

Page 24: Open up interactive big data analysis for your enterprise

USERS

Can  give  and  revoke  permissions  to  single  users  or  group  of  users

ADMIN USER

Regular  user  +  permissions

Page 25: Open up interactive big data analysis for your enterprise

LIST OF GROUPS AND PERMISSIONS

A  permission  can:  - allow  access  to  one  app  (e.g.  Hive  Editor)  

- modify  data  from  the  app  (e.g  drop  Hive  Tables  or  edit  cells  in  HBase  Browser)

CONFIGURE APPSAND PERMISSIONS

A  list  of  permissions

Page 26: Open up interactive big data analysis for your enterprise

PERMISSIONS IN ACTION

User  ‘test’  belonging  to  the  group  ‘hiveonly’  that  has  just  the  ‘hive’  permissions

CONFIGURE APPSAND PERMISSIONS

Page 27: Open up interactive big data analysis for your enterprise

HOW HUE INTERACTSWITH HADOOP

YARN

JobTracker

Oozie

Hue Plugins

LDAP SAML

Pig

HDFS HiveServer2

Hive Metastore

Cloudera Impala

Solr

HBase

Sqoop2

Zookeeper

Page 28: Open up interactive big data analysis for your enterprise

RCP CALLS TO ALLTHE HADOOP COMPONENTS

HDFS EXAMPLE

WebHDFS REST

DN

DN

DN

DN

NN

hWp://localhost:50070/webhdfs/v1/<PATH>?op=LISTSTATUS

Page 29: Open up interactive big data analysis for your enterprise

HOW

List  all  the  host/port  of  Hadoop  APIs  in  the  hue.ini  !

For  example  here  HBase  and  Hive.

RCP CALLS TO ALLTHE HADOOP COMPONENTS

Full  list

[hbase] # Comma-separated list of HBase Thrift servers for # clusters in the format of '(name|host:port)'. hbase_clusters=(Cluster|localhost:9090) ![beeswax] hive_server_host=host-abc hive_server_port=10000

Page 30: Open up interactive big data analysis for your enterprise

HTTPS SSL DB SSL WITH HIVESERVER2

READ MORE … AUDITING

SECURITYFEATURES

KERBEROS

Page 31: Open up interactive big data analysis for your enterprise

2  Hue  instances  

HA  proxy  

Mul9  DB  

Performances:  like  a  website,  mostly  RPC  calls

HIGH AVAILABILITY

HOW

Page 32: Open up interactive big data analysis for your enterprise

Impala,  Hive  integra9on,  Spark  (Shark  too)  

Interac9ve  SQL  editor    

Integra9on  with  MapReduce,  Metastore,  HDFS

SQL

WHAT

Page 33: Open up interactive big data analysis for your enterprise

Solr  &  Cloud  integra9on  

Custom  interac9ve  dashboards  

Drag  &  drop  widgets  (charts,  9meline…)

SEARCH

WHAT

Page 34: Open up interactive big data analysis for your enterprise

Simple  custom  query  language  Supports  HBase  filter  language  Supports  selec9on  &  Copy  +  Paste,  gracefully  degrades  in  IE  Autocomplete  Help  Menu  

Row$Key$

Scan$Length$

Prefix$Scan$

Column/Family$Filters$

Thri=$Filterstring$

Searchbar(Syntax(Breakdown(

HBASE BROWSER

WHAT

Page 35: Open up interactive big data analysis for your enterprise

DEMO TIME

Page 37: Open up interactive big data analysis for your enterprise

ROADMAPNEXT 6 MONTHS

Sentry  

Search,  Spark,  SQL  

More  dashboards!  

Oozie  v2  

Inter  component  integra9ons  (HBase  <-­‐>  Search,  create  index  wizards,  document  permissions),  Hadoop  Web  apps  SDK  

Your  idea  here.

WHAT

Page 38: Open up interactive big data analysis for your enterprise

CONFIGURATIONS ARE HARD…

…GIVE CLOUDERA MANAGER A TRY!

vimeo.com/91805055

Page 39: Open up interactive big data analysis for your enterprise

MISSEDSOMETHING?

learn.gethue.com

Page 40: Open up interactive big data analysis for your enterprise

TWITTER

@gethue

USER GROUP

hue-­‐user@

WEBSITE

hWp://gethue.com

LEARN

hWp://learn.gethue.com

THANK YOU!