Hortonworks and Red Hat Webinar - Part 2

36
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved Q&A box is available for your questions Webinar will be recorded for future viewing Thank you for joining! We’ll get started soon

description

Learn more about creating reference architectures that optimize the delivery the Hortonworks Data Platform. You will hear more about Hive, JBoss Data Virtualization Security, and you will also see in action how to combine sentiment data from Hadoop with data from traditional relational sources.

Transcript of Hortonworks and Red Hat Webinar - Part 2

Page 1: Hortonworks and Red Hat Webinar - Part 2

Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Q&A box is available for your questions

Webinar will be recorded for future viewing

Thank you for joining!

We’ll get started soon…

Page 2: Hortonworks and Red Hat Webinar - Part 2

Page 2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Evolving your data into strategic asset …using HDP and Red Hat JBoss Data Virtualization

We do Hadoop.

Page 3: Hortonworks and Red Hat Webinar - Part 2

Page 3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Your speakers…

Raghu Thiagarajan, Director, Partner Product Management, Hortonworks

Kimberly Palko, Principal Product Manager, Red Hat

Kenny Peeples, Principal Technical Marketing Manager, Red Hat

Page 4: Hortonworks and Red Hat Webinar - Part 2

Page 4 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Clickstream Capture and analyze website visitors’ data trails and optimize your website

Sensors Discover patterns in data streaming automatically from remote sensors and machines

Server Logs Research logs to diagnose process failures and prevent security breaches

New types of data Hadoop Value:

Sentiment Understand how your customers feel about your brand and products – right now

Geographic Analyze location-based data to manage operations where they occur

Unstructured Understand patterns in files across millions of web pages, emails, and documents

Page 5: Hortonworks and Red Hat Webinar - Part 2

Page 5 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

HDP 2.1: Enterprise Hadoop

HDP 2.1 Hortonworks Data Platform

   

Provision,  Manage  &  Monitor  

 Ambari  

Zookeeper  

Scheduling    

Oozie  

Data  Workflow,  Lifecycle  &  Governance  

 Falcon  Sqoop  Flume  NFS  

WebHDFS   YARN  :  Data  OperaFng  System  

DATA    MANAGEMENT  

SECURITY  DATA    ACCESS  GOVERNANCE  &  INTEGRATION  

AuthenFcaFon  AuthorizaFon  AccounFng  

Data  ProtecFon    

Storage:  HDFS  Resources:  YARN  Access:  Hive,  …    Pipeline:  Falcon  Cluster:  Knox  

OPERATIONS  

Script    Pig      

Search    

Solr      

SQL    

Hive/Tez,  HCatalog  

   

NoSQL    

HBase  Accumulo  

   

Stream      

Storm  

     

Others    

In-­‐Memory  AnalyNcs,    ISV  engines  

1   °   °   °   °   °   °   °   °   °  

°   °   °   °   °   °   °   °   °   °  

°   °   °   °   °   °   °   °   °   °  

°  

°  

N  

HDFS    (Hadoop  Distributed  File  System)  

Batch    

Map  Reduce      

Deployment  Choice  Linux Windows On-Premise Cloud

Page 6: Hortonworks and Red Hat Webinar - Part 2

Page 6 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

OPERATIONAL  TOOLS  

DEV  &  DATA  TOOLS  

INFRASTRUCTURE  

HDP is deeply integrated in the data center SO

UR

CES

EXISTING  Systems  

Clickstream   Web  &Social   GeolocaFon   Sensor  &  Machine  

Server  Logs   Unstructured  

DAT

A S

YSTE

M

RDBMS   EDW   MPP  HANA

APPLICAT

IONS  

BusinessObjects BI

HDP 2.1

Gov

erna

nce

&

Inte

grat

ion

Secu

rity

Ope

ratio

ns

Data Access

Data Management

YARN

•  Enables millions of JBoss developers to quickly build applications with Hadoop

•  Simplifies deployment of Hadoop on OpenStack

•  Develops and deploys Apache Hadoop as integrated components of the open modern data architecture

Page 7: Hortonworks and Red Hat Webinar - Part 2

Page 7 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Modern Data Architecture + Red Hat Data Virtualization Extract and Refine

•  Easily combine data from multiple sources without moving or copying data •  Use any reporting or analytical tool

Application Database Server

AMBARI

MAPREDUCE

YARN

HDFS

REST

DATA REFINEMENT

HIVE PIG CUSTOM

HTTP

STREAM

LOAD

SQOOP

FLUME

WebHDFS

NFS

SOURCE DATA

- Sensor Logs - Clickstream - Flat Files - Unstructured - Sentiment - Customer - Inventory

DBs

JMS Queue’s

Files Files Files

INTERACTIVE

HIVE Server2

Page 8: Hortonworks and Red Hat Webinar - Part 2

Page 8 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Hive creates Structure on Raw Sentiment Data

Page 9: Hortonworks and Red Hat Webinar - Part 2

Page 9 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Hive: Interactive SQL-IN-Hadoop

Stinger Initiative – DELIVERED Next generation SQL based interactive query in Hadoop Speed

Improve Hive query performance has increased by 100X to allow for interactive query times (seconds)

Scale The only SQL interface to Hadoop designed for queries that scale from TB to PB

SQL Support broadest range of SQL semantics for analytic applications running against Hadoop

Gov

erna

nce

&

Inte

grat

ion

Secu

rity

Ope

ratio

ns

Data Access

Data Management

HDP 2.1

An Open Community at its finest: Apache Hive Contribution

1,672 Jira Tickets Closed

145 Developers

44 Companies

~330,000 Lines Of Code Added… (2.5x)

Apache  YARN  

   

Apache  MapReduce    

1   °   °   °  

°   °   °   °  

°   °   °   °  

°  

°  

N  

HDFS    (Hadoop  Distributed  File  System)  

   

Apache    Tez    

Apache  Hive  SQL  

Business  AnalyFcs   Custom  Apps   SFnger  Project  

SFnger  Phase  1:  

•  Base  OpNmizaNons  •  SQL  Types  •  SQL  AnalyNc  FuncNons  •  ORCFile  Modern  File  Format  

SFnger  Phase  2:  •  SQL  Types  •  SQL  AnalyNc  FuncNons  •  Advanced  OpNmizaNons  •  Performance  Boosts  via  YARN  

Delivered

 

SFnger  Phase  3  •  Hive  on  Apache  Tez  •  Query  Service  (always  on)  •  Buffer  Cache  •  Cost  Based  OpNmizer  (OpNq)  

 

13 Months

Page 10: Hortonworks and Red Hat Webinar - Part 2

Page 10 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Federation: Regulatory requirements drive geo-specific clusters

Consume Compose Connect

DV  Dashboard  to  analyze  the  aggregated  data  by  User  Role  

JBoss Data Virtualization

Store

Web

Catalog

AMBARI

MAPREDUCE

YARN

HDFS REST

DATA REFINEMENT HIVE PIG CUSTOM

HTTP

STREAM

LOAD SQOOP FLUME

WebHDFS NFS

AMBARI

MAPREDUCE

YARN

HDFS REST

DATA REFINEMENT HIVE PIG CUSTOM

HTTP

STREAM

LOAD SQOOP FLUME

WebHDFS NFS

SOURCE  1:  Hive/Hadoop  in  the  HDP  contains  US  Region  Data  

SOURCE  2:  Hive/Hadoop  in  the  HDP  contains  EU  Region  Data  

INTERACTIVE

HIVE Server2

INTERACTIVE

HIVE Server2

Refine Explore

Page 11: Hortonworks and Red Hat Webinar - Part 2

Page 11 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Customer data with confidentiality requirements

Page 12: Hortonworks and Red Hat Webinar - Part 2

Page 12 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Red Hat JBoss Data Virtualization and Hortonworks HDP Kimberly Palko

Page 13: Hortonworks and Red Hat Webinar - Part 2

Page 13 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Engineering Collaboration Benefits Integration with JBoss Data Virtualization

Enable agile Big Data Hadoop integration with existing enterprise assets and maximize universal data utilization to enable self-service analytics

Integration with multiple Red Hat JBoss Middleware product family

Enables millions of JBoss developers to quickly build applications with Hadoop

Integration with Red Hat Storage Enables Hadoop to use Red Hat Storage secure resilient storage pool for data applications

Integration with Red Hat Enterprise Linux OpenStack Platform

Simplifies automated deployment of Hadoop on OpenStack

Integrated with Red Hat Enterprise Linux and OpenJDK

Develop and deploy Apache Hadoop as an integrated component for multiple deployment scenarios

Page 14: Hortonworks and Red Hat Webinar - Part 2

Page 14 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Data Supply and Integration Solution

Data Virtualization sits in front of multiple data sources and ü  allows them to be treated a single source ü  delivering the desired data

ü  in the required form

ü  at the right time

ü  to any application and/or user. THINK VIRTUAL MACHINE FOR DATA

Page 15: Hortonworks and Red Hat Webinar - Part 2

Page 15 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Easy Access to Big Data

•  Reporting tool accesses the data virtualization server via rich SQL dialect

•  The data virtualization server translates rich SQL dialect to HiveQL

•  Hive translates HiveQL to MapReduce

•  MapReduce runs MR job on big data

MapReduce

HDFS

Hive

Analytical Reporting

Tool

Data Virtualization

Server

Hadoop

Big Data

Page 16: Hortonworks and Red Hat Webinar - Part 2

Page 16 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Different Users Different Views of Big Data

•  Logical tables with different forms of aggregation

•  Logical tables containing extra derived data

•  Logical tables with filtered data •  All reports/users share the same

specifications

MapReduce

HDFS

Hive

Page 17: Hortonworks and Red Hat Webinar - Part 2

Page 17 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Caching For Faster Performance – Virtual View

Cached View 1

View 1

Query 2 Query 1

Virtual Database (VDB)

•  Same cached view for multiple queries

•  Refreshed automatically or manually •  Cache repository can be any

supported data source

Page 18: Hortonworks and Red Hat Webinar - Part 2

Page 18 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Caching for Faster Performance – Result Set

View 1

Query 1 Query 1

Virtual Database (VDB) Result Set Cache

•  Results for a single query are cached after first execution

•  Each unique query has its own cache

Page 19: Hortonworks and Red Hat Webinar - Part 2

Page 19 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Demonstration Combining sentiment data from Hadoop

with data from traditional relational sources

Kenny Peeples

Page 20: Hortonworks and Red Hat Webinar - Part 2

Page 20 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Use Case 1 - Overview

Objective: -Determine if sentiment data from the first week of the Iron Man 3 movie is a predictor of sales Problem: -Cannot utilize social data and sentiment analysis with sales management system Solution: -Leverage JBoss Data Virtualization to mashup Sentiment analysis data with ticket and merchandise sales data on MySQL into a single view of the data.

Consume Compose Connect

Excel  Powerview  and  DV  Dashboard  to  analyze  the  aggregated  data  

JBoss Data Virtualization

Hive

SOURCE  1:  Hive/Hadoop  contains  twi[er  data  including  

senFment  

SOURCE  2:  MySQL  data  that  includes  Fcket  and  merchandise  sales  

Page 21: Hortonworks and Red Hat Webinar - Part 2

Page 21 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Use Case 1 - Architecture

DATA

   SYSTEM  

TRADITIONAL  REPOSITORIES  

RDBMS   EDW   MPP  

APPLICAT

IONS  

Business    AnalyFcs  

Custom  ApplicaFons  

Packaged  ApplicaFons  

VIRTUAL  DATA  MART  

Page 22: Hortonworks and Red Hat Webinar - Part 2

Page 22 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Use Case 1 - Resources

•  GUIDE How to guide: https://github.com/DataVirtualizationByExample/HortonworksUseCase1 Tutorial: http://hortonworks.com/hadoop-tutorial/evolving-data-stratagic-asset-using-hdp-red-hat-jboss-data-virtualization/ •  VIDEOS: http://vimeo.com/user16928011/hortonworksusecase1short http://vimeo.com/user16928011/hortonworksusecase2short •  SOURCE: https://github.com/DataVirtualizationByExample/HortonworksUseCase1  

Page 23: Hortonworks and Red Hat Webinar - Part 2

Page 23 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

JBoss Data Virtualization Security Kimberly Palko

Page 24: Hortonworks and Red Hat Webinar - Part 2

Page 24 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Role based access control

Roles • Define roles based on

organization hierarchy

Users •  External authentication via

Kerberos, LDAP, etc.

VDB •  Assign users and groups to a

virtual data base

Page 25: Hortonworks and Red Hat Webinar - Part 2

Page 25 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Authentication

Kerberos From client to the virtual data base New in Data Virtualization 6.1: Kerberos authentication to the data source

Login Modules LDAP (MS Active Directory, OpenLDAP, etc.), any JAAS based security domain

REST and Web Services WS-UsernameToken HTTP Basic authentication

SAML SAML authentication for web client applications

Page 26: Hortonworks and Red Hat Webinar - Part 2

Page 26 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Audit Logging via Dashboard

Page 27: Hortonworks and Red Hat Webinar - Part 2

Page 27 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Row and Column Masking

-  Row based masking Ex: keyed off geographic marker - Column masking to a constant, null, or a SQL statement Example: change all but the Last 4 digits in a credit card number to stars concat('****', substring(column, length(column)-4))

Page 28: Hortonworks and Red Hat Webinar - Part 2

Page 28 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Summary of Security Capabilities •  Authentication

– Kerberos, LDAP, WS-UsernameToken, HTTP Basic, SAML •  Authorization

– Virtual data views, Role based access control •  Administration

– Centralized management of VDB privileges •  Audit

– Centralized audit logging and dashboard •  Protection

– Row and column masking – SSL encryption (ODBC and JDBC)

Page 29: Hortonworks and Red Hat Webinar - Part 2

Page 29 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Demonstration Geographically Distributed Hadoop Clusters with Data Virtualization -

Securing Data by User Role Kenny Peeples

Page 30: Hortonworks and Red Hat Webinar - Part 2

Page 30 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Use Case 2 - Overview

Objective: -Secure data according to Role for row level security and Column Masking

Problem: -Cannot hide region data from region specific users

Solution: -Leverage JBoss Data Virtualization to provide Row Level Security and Masking of columns

Page 31: Hortonworks and Red Hat Webinar - Part 2

Page 31 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Use Case 2 - Architecture

DATA

   SYSTEM  

APPLICAT

IONS  

Business    AnalyFcs  

Custom  ApplicaFons  

Packaged  ApplicaFons  

VIRTUAL  DATA  MART  

Page 32: Hortonworks and Red Hat Webinar - Part 2

Page 32 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Use Case 2 - Resources

•  GUIDE How to guide: https://github.com/DataVirtualizationByExample/HortonworksUseCase2 Tutorial: Available soon •  VIDEOS: http://vimeo.com/user16928011/hortonworksusecase2short http://vimeo.com/user16928011/hortonworksusecase2short •  SOURCE: https://github.com/DataVirtualizationByExample/HortonworksUseCase2  

Page 33: Hortonworks and Red Hat Webinar - Part 2

Page 33 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Benefits of JBoss Data Virtualization with Hortonworks HDP 2.1 •  Combines new data in Hadoop with data in

traditional data sources without moving or copying data

•  Gives access to a variety of BI and analytics tools

•  Provides caching for faster access to data •  Provides consistent security policy across

multiple data sources

Page 34: Hortonworks and Red Hat Webinar - Part 2

Page 34 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Thank you! Hortonworks and Red Hat JBoss Data Virtualization

Page 35: Hortonworks and Red Hat Webinar - Part 2

Page 35 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Next Steps...

Download the Hortonworks Sandbox

Learn Hadoop

Build Your Analytic App

Try Hadoop 2

More about Red Hat & Hortonworks http://hortonworks.com/partner/redhat

Contact us: [email protected]

Page 36: Hortonworks and Red Hat Webinar - Part 2

Page 36 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Don’t Forget to Register for our Next Webinar!

September 17th, 10 AM PST Red Hat JBoss Data Virtualization and Hortonworks Data Platform

http://info.hortonworks.com/RedHatSeries_Hortonworks.html