MapR 5.2 Product Update

© 2016 MapR Technologies

5.2 Product Update

MapR Product Mgmt & Product Marketing

Aug 17, 2016

Today’s Presenters

Sameer Nori, Sr. Product Marketing Manager

Prashant Rathi, Sr. Product Manager

Ian Downard, Technical Marketing Engineer

Balaji Mohanam, Product Manager


Today’s Agenda

• Recent Product Announcements

• The Spyglass Initiative & Demo

• MapR Ecosystem Pack (MEP)

• Spark and Drill updates


The MapR Converged Data Platform


Recent Product Announcements

• Quick Start Solution focused on Risk Management for Financial Services – July 2016

• Enterprise-Grade Spark Distribution – June 2016

• Quick Start Migration Service – May 2016

• Stream Processing On-Demand Training (ODT) – Apr 2016

• Apache Drill 1.6 – Mar 2016


Four Big Themes in the 5.2 Release

Major new features
• MapR-DB
  – JSON Table replication
  – Binary Elastic Search v2.x support
  – Drill DB JSON improvements
• Streams
  – Performant Spark Streaming
  – Stream Admin APIs

Easier Management
• Spyglass: deep visibility across cluster ops
  – Deep visibility: search across metrics and logs
  – Full control: customizable, shareable dashboards
  – Extensible
• Various Graphical Installer improvements

Community Innovation
• MapR Eco Pack 1.0
  – Supportability and Stability
  – Currency and Commitment to SLA
  – Easy deployment and upgrade

Customer requested features
• POSIX: HardLink and StatFS feature
• Fast Failover for client
• Fuse Client performance
• Rack Reliability for data placement enhancement
• File Client Impersonation enhancements


5.2 Ecosystem Support

These are the only component version changes in MEP 1.0 as of the 5.2 release date, and all of them have already been released for 5.1.

Component  Eco on 5.1 today:  Eco on 5.1 today:              MEP 1.0 on 5.2
           released with 5.1  subsequently released for 5.1
Drill      1.4                1.6                            1.6
Spark      1.5.2              1.6.1                          1.6.1
Impala     2.2.0              2.5                            2.5
Storm      0.10.0             0.10.1                         0.10.1
Mahout     0.11.2             0.12.2                         0.12.2


4 Reasons to Step Up to MapR 5.2

1. New features in the MapR Converged Data Platform

2. Ecosystem updates

3. Continuing quality improvements

4. End-of-maintenance for prior releases


Project Spyglass


MapR Vision: Maximizing User/Operator Productivity

Deep Visibility

Easy Management

Full Control


The MapR Spyglass Initiative

• New approach for increasing user and administrator productivity
  – Comprehensive, open, extensible
• Simplifies the management of growing big data deployments
• Starts with upcoming release
  – Phase 1 – MapR Monitoring
  – Initial focus on operational visibility
• Helps community innovate faster
  – Extensive use of open source visualization and dashboarding tools


Spyglass Initiative Phase 1 - MapR Monitoring

Empower administrators with cluster monitoring capabilities, including metric and log collection from nodes, services, and jobs, with dashboards to display information in a useful way.

Converged, Customizable, Extensible


MapR Monitoring Architecture

[Diagram. Stages: Collection, Aggregation & Storage, Visualization. Labeled elements: Data Sources, Node Environmentals (CPU, Mem, I/O), Service Daemons (YARN, Drill, Hive, etc.), Log Shippers, Metrics Collectors, MapR Control System, Alerting, Future, …]


Project Spyglass – Monitoring All You Care About

Node/Infrastructure Monitoring
• Global Aggregates (Average, Min, Max) charts (e.g. CPU, Disk utilization)
• Per-node charts (e.g. I/O Throughput by disk)
• MFS read/writes and throughput
• DB puts, gets, scans and cache metrics

Cluster Space Utilization Monitoring
• Cluster-wide storage utilization
• Storage Utilization Trend
• Utilization per volume and per accountable entity (data, volume, snapshot and total size)

YARN/MR Application Monitoring
• Global YARN trend graphs
• Containers - Pending, Active
• vCores & RAM - Allocated & Used
• Per Queue charts - containers, vCores, RAM

Service Daemon Monitoring
• Per-service charts (CPU Usage by type, Memory)
• Centralized, searchable logs
• MapR core and ecosystem services (includes YARN, Drill and Spark)
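The "Global Aggregates (Average, Min, Max)" and per-node charts listed above come down to reducing per-node samples into cluster-wide and per-node summaries. A minimal sketch in plain Python, assuming hypothetical node names and CPU readings; nothing here is taken from MapR Monitoring itself:

```python
from statistics import mean

# Hypothetical per-node CPU utilization samples (percent); illustrative only.
cpu_by_node = {
    "node1": [35.0, 40.0, 45.0],
    "node2": [60.0, 70.0, 80.0],
    "node3": [10.0, 20.0, 30.0],
}

def global_aggregates(samples_by_node):
    """Reduce all per-node samples to cluster-wide average, min, and max."""
    all_samples = [v for samples in samples_by_node.values() for v in samples]
    return {
        "avg": mean(all_samples),
        "min": min(all_samples),
        "max": max(all_samples),
    }

def per_node_average(samples_by_node):
    """Per-node view: one average per node, as a per-node chart would plot."""
    return {node: mean(samples) for node, samples in samples_by_node.items()}

cluster = global_aggregates(cpu_by_node)
nodes = per_node_average(cpu_by_node)
```

The global view answers "is the cluster busy?", while the per-node view shows which node is responsible; a dashboard would typically present both side by side.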



Customizable Dashboards for Visualizing Metrics

Log Analytics


Destination to Learn and Collaborate

Blog about topics and ideas

Share code snippets and dashboards

View demos, tutorials, and videos

Engage in use case discussion/development


Dashboards are defined with JSON and are easy to export and import in Grafana and Kibana

Extend/Integrate using REST API
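Because a dashboard is just a JSON document, sharing one amounts to writing it to a file and loading it elsewhere. The sketch below illustrates that round trip in Python. The dashboard fields are a hypothetical, heavily trimmed example, and the payload shape is an assumption based on Grafana's dashboard-save call (POST /api/dashboards/db); check it against the Grafana version you run. No request is actually sent.

```python
import json
import os
import tempfile

# Hypothetical, heavily trimmed dashboard definition; a real Grafana export
# contains many more fields (panels, templating, time range, ...).
dashboard = {
    "id": None,
    "title": "Cluster CPU overview (example)",
    "tags": ["spyglass", "example"],
}

def export_dashboard(definition, path):
    """Write a dashboard definition to a JSON file so it can be shared."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(definition, fh, indent=2, sort_keys=True)

def build_import_payload(path, overwrite=False):
    """Load an exported file and wrap it in the body shape assumed for
    Grafana's dashboard-save REST call (POST /api/dashboards/db)."""
    with open(path, encoding="utf-8") as fh:
        definition = json.load(fh)
    return {"dashboard": definition, "overwrite": overwrite}

with tempfile.TemporaryDirectory() as tmp:
    exported = os.path.join(tmp, "cluster_cpu.json")
    export_dashboard(dashboard, exported)
    payload = build_import_payload(exported, overwrite=True)
```

Keeping exported files under version control is one simple way to share and review dashboard changes.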

The Exchange


Dashboards can be viewed on mobile devices.


Summary

● Data collection and storage infrastructure (packaged and supported)
  ○ Collection/storage of metrics & logs across node, storage, services
● Visualization dashboard (Driven via community)
  ○ Sample dashboards for Grafana & Kibana

5.2 - Spyglass 1.0 GA

CUSTOMIZABLE, shareable and mobile-ready dashboards

CONVERGED monitoring with deep search

EXTENSIBLE and easy to integrate with REST API


MapR Ecosystem Pack (MEP)


What is the MapR Ecosystem Pack (MEP)?

• What is the “MapR Ecosystem”?
  – A selected set of stable and popular components from the Hadoop Ecosystem that we fully support on the MapR platform.
• What is the “Pack”?
  – A single repository of selected versions of these components fully tested to be interoperable.
  – Available via installer or package.
  – Delivered with a predictable cadence.


Where Does MEP Fit In?

• Extended Ecosystem: Community supported.
• MapR Ecosystem: Fully supported, updates tied to MapR core.
• MEP: Fully supported, updates follow MEP process.



An Example: Drill in MEP releases

[Timeline: August, September, October, November, December, January; releases MapR 5.2 and MapR 6.0]

An example of how this would look for Drill:

• MEP 1.0: Drill 1.6
• MEP 1.1: Drill 1.8
• MEP 2.0: Drill 1.9
• MEP 3.0: Drill 2.X

On our current release plan, MapR 5.2 will receive 3 different versions of Drill before updates cease.


MEP Can Be Installed Using the 5.2 Installer

Can select MapR and MEP version. Can manually select components.


Competitor Process Comparison

How our new process stacks up against the competition:

[Comparison table. Columns: MapR MEP Process, Cloudera, Hortonworks. Rows: Predictable Cadence; Required Component Upgrades; Updates independent of core release; Developer Previews; Support For Multiple Versions; Packaged updates]


Drill and Spark Updates


Drill Product Evolution

Drill 1.0 GA
• Drill GA

Drill 1.1
• Automatic Partitioning for Parquet Files
• Window Functions support
  – Aggregate Functions: AVG, COUNT, MAX, MIN, SUM
  – Ranking Functions: CUME_DIST, DENSE_RANK, PERCENT_RANK, RANK and ROW_NUMBER
• Hive impersonation
• SQL Union support
• Complex data enhancements, and more

Drill 1.2
• Native Parquet reader for Hive tables
• Hive partition pruning
• Multiple Hive versions support
• Hive 1.2.1 version support
• New analytical functions (LEAD, LAG, NTILE, etc.)
• Multiple window PARTITION BY clauses support
• DROP TABLE syntax
• Metadata caching
• Security support for web UI
• INT96 data type support

Drill 1.3/1.4
• Improved Tableau experience with faster LIMIT 0 queries
• Metadata (INFORMATION_SCHEMA) query speed-ups on Hive schemas/tables
• Robust partition pruning (more data types, large # of partitions)
• Optimized metadata cache
• Improved window functions resource usage and performance

Drill 1.5/1.6
• Enhanced stability & scale
  – New memory allocator
  – Improved uniform query load distribution via connection pooling
• Enhanced query performance
  – Early application of partition pruning in query planning
  – Hive tables query planning improvements
  – Row count based pruning for LIMIT N queries
• JDK 1.8 support

Drill 1.7
• Enhanced MaxDir/MinDir functions
• Access to Drill logs in the Web UI
• Addition of JDBC/ODBC client IP in Drill audit logs
• Monitoring via JMX
• Hive CHAR data type support
• Partition pruning enhancements
• Ability to return file names as part of queries

Release themes: ANSI SQL Window Functions; Enhanced Hive Compatibility; Query Performance & Scale; Drill on MapR-DB JSON tables; Easy Monitoring of deployments


Converging SQL and JSON with Apache Drill 1.6

• Flexible and operational analytics on NoSQL
  – MapR-DB plugin allows analysts to perform SQL queries directly on JSON data in MapR-DB tables
  – Pushdown capabilities provide optimal interactive experience
• Enhanced query performance
  – Provides better query performance via partition pruning, metadata caching and other optimizations
  – Delivers up to 10-60X performance gains in query planning compared to the previous releases of Drill
• Better memory management
  – Delivers greater stability and scale, which enables customers to run not only larger but also more SQL workloads on a MapR cluster
• Improved integration with visualization tools like Tableau
  – Introduces client impersonation for end-to-end security from the visualization tool to data in Hadoop
  – Enhanced SQL Window functions


Drill ANSI SQL Capabilities Directly on JSON

0: jdbc:drill:drillbit=10.10.103.32> SELECT * FROM mfs.yelp_maprdb.business LIMIT 1;
+-----+------------+-------------+------------+------+--------------+-------+----------+-----------+------+---------------+------+--------------+-------+-------+------+
| _id | attributes | business_id | categories | city | full_address | hours | latitude | longitude | name | neighborhoods | open | review_count | stars | state | type |
+-----+------------+-------------+------------+------+--------------+-------+----------+-----------+------+---------------+------+--------------+-------+-------+------+
| --1emggGHgoG6ipd_RMb-g | {"Accepts Credit Cards":true,"Parking":{"garage":false,"lot":true,"street":false,"valet":false,"validated":false},"Price Range":1.0,"Ambience":{},"Good For":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | --1emggGHgoG6ipd_RMb-g | ["Food","Convenience Stores"] | Las Vegas | 3280 S Decatur BlvdWestsideLas Vegas, NV 89102 | {"Friday":{},"Monday":{},"Saturday":{},"Sunday":{},"Thursday":{},"Tuesday":{},"Wednesday":{}} | 36.1305306 | -115.2072382 | Sinclair | ["Westside"] | true | 4.0 | 4.0 | NV | business |
+-----+------------+-------------+------------+------+--------------+-------+----------+-----------+------+---------------+------+--------------+-------+-------+------+

0: jdbc:drill:drillbit=10.10.103.32> SELECT count(*) FROM mfs.yelp_maprdb.business;
+---------+
| EXPR$0 |
+---------+
| 42153 |
+---------+
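The queries above are issued from the Drill shell, but the same SQL can also be submitted programmatically. A minimal Python sketch that builds a request for Drill's REST query endpoint (`/query.json`) without sending it. The host is simply the address shown in the shell prompt above, port 8047 is assumed to be the Drill web port in use, and an unsecured cluster is assumed; adjust all three for a real deployment.

```python
import json
import urllib.request

def build_drill_query_request(host, sql, port=8047):
    """Build (but do not send) an HTTP POST for Drill's REST query endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Host and table are taken from the shell session above; port is an assumption.
request = build_drill_query_request(
    "10.10.103.32",
    "SELECT count(*) FROM mfs.yelp_maprdb.business",
)
# Against a live Drillbit, the request could be sent with:
#   urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the example runnable without a cluster and makes the payload easy to inspect.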


Simplified Deployment with YARN (Drill 1.8)

● Drill as a long running application in YARN

● Key features
  ○ Client tool to launch Drill as YARN application
  ○ New Drill application master (AM)
  ○ CPU & memory controls
  ○ Add/remove nodes to cluster
  ○ Multiple Drill clusters

Drill Configuration w/YARN


Spark 2.0


What’s in Spark 2.0?

• Structured Streaming with Spark SQL
  – The ability to perform interactive queries against live streaming data.
  – Output can now be aggregated in a stream for continuous applications.
  – Pre-computation of analytics can occur continuously as the data is generated.
• Whole Stage Code-gen
  – Provided by the second-generation Tungsten engine.
  – Eliminates the need for multiple JVM calls by flattening SQL queries into a single function evaluated as bytecode at runtime.
• Dataset APIs
  – Run on the same engine as Spark SQL.
  – Allow access to data from a variety of data sources.
  – Can run database-like operations or accept custom code.


Spark 2.0: Structured Streaming with Spark SQL (Alpha)

// Alpha-stage API as presented; method names may differ from the released Spark 2.0 API.
val records = sqlContext.read.format("json").stream("hdfs://input")
val counts = records.groupBy("user").count()
counts.write
  .trigger(ProcessingTime("5 sec"))
  .outputMode(UpdateInPlace("user"))
  .format("jdbc")
  .startStream("mysql://...")

Repeated Queries

DB

User     Count
User 1   10
User 2   23
User 3   16
……..     ……..

Store only the processed output instead of every single record.

● Query executed repeatedly as and when the data arrives.
● Read the result from persistent storage, instead of processing the entire data set, resulting in faster access.
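The idea of storing only the processed output can be illustrated without Spark. The following plain-Python sketch (not Spark code, and using hypothetical records) folds each arriving batch into a small table of per-user counts, so a later read is a lookup rather than a rescan of every record:

```python
from collections import Counter

def update_counts(stored_counts, new_records):
    """Fold one batch of records into the stored per-user counts.

    Only the aggregated counts are kept; the raw records are not retained.
    """
    stored_counts.update(record["user"] for record in new_records)
    return stored_counts

# Stand-in for the result table in the database (hypothetical data).
result_table = Counter()

batches = [
    [{"user": "User 1"}, {"user": "User 2"}],
    [{"user": "User 1"}, {"user": "User 3"}, {"user": "User 1"}],
]

for batch in batches:  # each batch represents data arriving at a trigger
    update_counts(result_table, batch)

# Reading a result is a lookup in the stored output, not a rescan of all input.
user1_count = result_table["User 1"]
```

The stored state grows with the number of distinct users rather than with the number of records, which is what makes the repeated query cheap to serve.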


Spark 2.0 Whole Stage Code-gen: Planner

[Diagram: a query plan drawn twice, each with the operators ParquetRelation, Filter, Project, Broadcast Hash join, Project, TungstenAggregate, Exchange, plus a second ParquetRelation, Filter, Project branch. Two regions are labeled "Whole Stage Codegen".]
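To make the "flattening into one single function" idea concrete, the sketch below contrasts an operator-at-a-time pipeline with a hand-fused equivalent. It is an illustration in plain Python with hypothetical rows and a made-up query, not Spark's generated code (Spark emits Java source that is compiled to bytecode at runtime).

```python
# Hypothetical rows; the query is: SELECT sum(price * 2) WHERE price > 10
rows = [{"price": 5}, {"price": 12}, {"price": 30}]

# Operator-at-a-time: each operator is a separate iterator, and every row
# crosses several call boundaries (scan -> filter -> project -> aggregate).
def scan(data):
    for row in data:
        yield row

def filter_op(child, predicate):
    for row in child:
        if predicate(row):
            yield row

def project_op(child, expression):
    for row in child:
        yield expression(row)

def aggregate_sum(child):
    total = 0
    for value in child:
        total += value
    return total

def run_operator_pipeline(data):
    filtered = filter_op(scan(data), lambda r: r["price"] > 10)
    projected = project_op(filtered, lambda r: r["price"] * 2)
    return aggregate_sum(projected)

# Fused version: the same plan collapsed by hand into one function with a
# single loop, roughly the shape of code that whole-stage codegen produces.
def run_fused(data):
    total = 0
    for row in data:
        price = row["price"]
        if price > 10:
            total += price * 2
    return total

pipeline_result = run_operator_pipeline(rows)
fused_result = run_fused(rows)
```

Both functions compute the same answer; the fused form avoids per-row calls between operators and keeps intermediate values in local variables.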


Q & A

Engage with us!

1. Spyglass Initiative
   https://www.mapr.com/products/spyglass-initiative
   https://community.mapr.com/docs/DOC-1088

2. Ask Questions:
   – Ask Us Anything about Spyglass in the MapR Community from Mon (Aug 29th) to Fri (Sep 2nd)
   – https://community.mapr.com/
