© 2016 MapR Technologies
5.2 Product Update
MapR Product Mgmt & Product Marketing
Aug 17, 2016
Today’s Presenters
Sameer Nori, Sr. Product Marketing Manager
Prashant Rathi, Sr. Product Manager
Ian Downard, Technical Marketing Engineer
Balaji Mohanam, Product Manager
Today’s Agenda
• Recent Product Announcements
• The Spyglass Initiative & Demo
• MapR Ecosystem Pack (MEP)
• Spark and Drill updates
The MapR Converged Data Platform
Recent Product Announcements
• Quick Start Solution focused on Risk Management for Financial Services – Jul 2016
• Enterprise-Grade Spark Distribution – Jun 2016
• Quick Start Migration Service – May 2016
• Stream Processing On-Demand Training (ODT) – Apr 2016
• Apache Drill 1.6 – Mar 2016
Four Big Themes in the 5.2 Release
Major new features
• MapR-DB: JSON table replication, binary Elasticsearch v2.x support, Drill on MapR-DB JSON improvements
• Streams: performant Spark Streaming, Stream Admin APIs
Easier Management
• Spyglass: deep visibility across cluster ops
  – Deep visibility: search across metrics and logs
  – Full control: customizable, shareable dashboards
  – Extensible
• Various Graphical Installer improvements
Community Innovation
• MapR Eco Pack 1.0
  – Supportability and stability
  – Currency and commitment to SLA
  – Easy deployment and upgrade
Customer requested features
• POSIX: HardLink and StatFS feature
• Fast failover for client
• FUSE client performance
• Rack reliability for data placement enhancement
• File client impersonation enhancements
5.2 Ecosystem Support
These are the only component version changes in MEP 1.0 as of the 5.2 release date, and all of these versions were already available for 5.1.
Component | Eco on 5.1 today: released with 5.1 | Eco on 5.1 today: subsequently released for 5.1 | MEP 1.0 on 5.2
Drill | 1.4 | 1.6 | 1.6
Spark | 1.5.2 | 1.6.1 | 1.6.1
Impala | 2.2.0 | 2.5 | 2.5
Storm | 0.10.0 | 0.10.1 | 0.10.1
Mahout | 0.11.2 | 0.12.2 | 0.12.2
4 Reasons to Step Up to MapR 5.2
1. New features in the MapR Converged Data Platform
2. Ecosystem updates
3. Continuing quality improvements
4. End-of-maintenance for prior releases
Project Spyglass
MapR Vision: Maximizing User/Operator Productivity
Deep Visibility
Easy Management
Full Control
The MapR Spyglass Initiative
• New approach for increasing user and administrator productivity
  – Comprehensive, open, extensible
• Simplifies the management of growing big data deployments
• Starts with the upcoming release
  – Phase 1: MapR Monitoring
  – Initial focus on operational visibility
• Helps the community innovate faster
  – Extensive use of open source visualization and dashboarding tools
Spyglass Initiative Phase 1 - MapR Monitoring
Empower administrators with cluster monitoring capabilities, including metric and log collection from nodes, services, and jobs, with dashboards to display information in a useful way.
Converged • Customizable • Extensible
MapR Monitoring Architecture
Figure: Collection → Aggregation & Storage → Visualization, with Alerting labeled as Future.
• Data sources: node environmentals (CPU, Mem, I/O); service daemons (YARN, Drill, Hive, etc.); MapR Control System; …
• Collection: log shippers and metrics collectors
Project Spyglass – Monitoring All You Care About
Node/Infrastructure Monitoring
• Global aggregate (average, min, max) charts (e.g., CPU, disk utilization)
• Per-node charts (e.g., I/O throughput by disk)
• MFS reads/writes and throughput
• DB puts, gets, scans, and cache metrics
Cluster Space Utilization Monitoring
• Cluster-wide storage utilization
• Storage utilization trend
• Utilization per volume and per accountable entity (data, volume, snapshot, and total size)
YARN/MR Application Monitoring
• Global YARN trend graphs
• Containers: pending, active
• vCores & RAM: allocated & used
• Per-queue charts: containers, vCores, RAM
Service Daemon Monitoring
• Per-service charts (CPU usage by type, memory)
• Centralized, searchable logs
• MapR core and ecosystem services (including YARN, Drill, and Spark)
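The global aggregate charts collapse per-node readings into one cluster-wide average, minimum, and maximum per point in time. A minimal sketch of that reduction in plain Scala, assuming a hypothetical `NodeSample` record and made-up sample values; it does not model how Spyglass actually collects, stores, or queries metrics.

```scala
// Hypothetical per-node sample: one CPU utilization reading per node per
// timestamp (illustrative names, not a Spyglass API).
final case class NodeSample(node: String, timestamp: Long, cpuPercent: Double)

final case class Aggregate(avg: Double, min: Double, max: Double)

// Group samples by timestamp and reduce each group to one cluster-wide
// point: this is the series a "global aggregate" chart plots over time.
def globalAggregates(samples: Seq[NodeSample]): Map[Long, Aggregate] =
  samples.groupBy(_.timestamp).map { case (ts, group) =>
    val values = group.map(_.cpuPercent)
    ts -> Aggregate(values.sum / values.size, values.min, values.max)
  }

// Made-up readings from three nodes at two timestamps
val samples = Seq(
  NodeSample("node1", 1000L, 20.0),
  NodeSample("node2", 1000L, 40.0),
  NodeSample("node3", 1000L, 60.0),
  NodeSample("node1", 1060L, 10.0),
  NodeSample("node2", 1060L, 30.0)
)

val agg = globalAggregates(samples)
println(agg(1000L)) // Aggregate(40.0,20.0,60.0)
```

Per-node charts are the same data without the grouping step; the aggregate view trades per-node detail for a single line that stays readable as the cluster grows.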
Customizable Dashboards for Visualizing Metrics
Log Analytics
Destination to Learn and Collaborate
Blog about topics and ideas
Share code snippets and dashboards
View demos, tutorials, and videos
Engage in use case discussion/development
Dashboards are defined in JSON and are easy to export and import in Grafana and Kibana.
Extend and integrate using the REST API.
The Exchange
Dashboards can be viewed on mobile devices.
Summary
● Data collection and storage infrastructure (packaged and supported)
○ Collection/storage of metrics & logs across node, storage, services
● Visualization dashboards (community-driven)
○ Sample dashboards for Grafana & Kibana
5.2 - Spyglass 1.0 GA
CUSTOMIZABLE, shareable and mobile-ready dashboards
CONVERGED monitoring with deep search
EXTENSIBLE and easy to integrate with REST API
MapR Ecosystem Pack (MEP)
What is the MapR Ecosystem Pack (MEP)?
• What is the “MapR Ecosystem”?
  – A selected set of stable and popular components from the Hadoop ecosystem that we fully support on the MapR platform.
• What is the “Pack”?
  – A single repository of selected versions of these components, fully tested to be interoperable.
  – Available via the installer or as packages.
  – Delivered on a predictable cadence.
Where Does MEP Fit In?
Extended Ecosystem
MapR Ecosystem
MEP
Community supported.
Fully supported, updates tied to MapR core.
Fully supported, updates follow MEP process.
An Example: Drill in MEP Releases
An example of how this would look for Drill, on a timeline from August to January spanning MapR 5.2 and MapR 6.0:
• MEP 1.0: Drill 1.6
• MEP 1.1: Drill 1.8
• MEP 2.0: Drill 1.9
• MEP 3.0: Drill 2.X
On our current release plan, MapR 5.2 will receive three different versions of Drill before updates cease.
MEP Can Be Installed Using the 5.2 Installer
Can select MapR and MEP version. Can manually select components.
Competitor Process Comparison
How our new process stacks up against the competition, comparing MapR MEP, Cloudera, and Hortonworks on:
• Predictable cadence
• Required component upgrades
• Updates independent of core release
• Developer previews
• Support for multiple versions
• Packaged updates
Drill and Spark Updates
Drill Product Evolution
Drill 1.0 GA
• Drill GA
Drill 1.1
• Automatic partitioning for Parquet files
• Window functions support
  – Aggregate functions: AVG, COUNT, MAX, MIN, SUM
  – Ranking functions: CUME_DIST, DENSE_RANK, PERCENT_RANK, RANK, and ROW_NUMBER
• Hive impersonation
• SQL UNION support
• Complex data enhancements, and more
Drill 1.2
• Native Parquet reader for Hive tables
• Hive partition pruning
• Multiple Hive versions support
• Hive 1.2.1 version support
• New analytic functions (LEAD, LAG, NTILE, etc.)
• Multiple window PARTITION BY clauses support
• DROP TABLE syntax
• Metadata caching
• Security support for the Web UI
• INT96 data type support
Drill 1.3/1.4
• Improved Tableau experience with faster LIMIT 0 queries
• Metadata (INFORMATION_SCHEMA) query speedups on Hive schemas/tables
• Robust partition pruning (more data types, large number of partitions)
• Optimized metadata cache
• Improved window functions resource usage and performance
Drill 1.5/1.6
• Enhanced stability & scale
  – New memory allocator
  – Improved uniform query load distribution via connection pooling
• Enhanced query performance
  – Early application of partition pruning in query planning
  – Hive tables query planning improvements
  – Row count based pruning for LIMIT N queries
• JDK 1.8 support
Drill 1.7
• Enhanced MaxDir/MinDir functions
• Access to Drill logs in the Web UI
• Addition of JDBC/ODBC client IP in Drill audit logs
• Monitoring via JMX
• Hive CHAR data type support
• Partition pruning enhancements
• Ability to return file names as part of queries
Themes: ANSI SQL window functions · Enhanced Hive compatibility · Query performance & scale · Drill on MapR-DB JSON tables · Easy monitoring of deployments
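The ranking window functions listed for Drill 1.1 differ mainly in how they treat ties. A minimal plain-Scala sketch of their standard SQL semantics over a single partition ordered ascending by a hypothetical `score` column (roughly `RANK() OVER (ORDER BY score)` and its siblings); it illustrates the definitions only and is not Drill's implementation.

```scala
// One output row per input row, carrying every ranking function's value.
final case class Ranked(score: Int, rowNumber: Int, rank: Int, denseRank: Int,
                        percentRank: Double, cumeDist: Double)

// Single partition, ORDER BY score ascending.
def rankAll(scores: Seq[Int]): Seq[Ranked] = {
  val sorted = scores.sorted
  val n = sorted.size
  val distinct = sorted.distinct
  sorted.zipWithIndex.map { case (s, i) =>
    val rank = sorted.indexOf(s) + 1          // 1 + rows strictly before this value (ties share, gaps follow)
    val denseRank = distinct.indexOf(s) + 1   // position among distinct values (no gaps)
    val atOrBefore = sorted.count(_ <= s)     // rows ordered at or before this row, peers included
    Ranked(
      score = s,
      rowNumber = i + 1,                      // unique sequential number, ties broken arbitrarily
      rank = rank,
      denseRank = denseRank,
      percentRank = if (n == 1) 0.0 else (rank - 1).toDouble / (n - 1),
      cumeDist = atOrBefore.toDouble / n
    )
  }
}

val ranked = rankAll(Seq(30, 10, 20, 20))
ranked.foreach(println)
```

With the tie on 20, RANK yields 1, 2, 2, 4 while DENSE_RANK yields 1, 2, 2, 3; CUME_DIST gives both tied rows 0.75 because it counts peers.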
Converging SQL and JSON with Apache Drill 1.6
• Flexible and operational analytics on NoSQL
  – The MapR-DB plugin allows analysts to run SQL queries directly on JSON data in MapR-DB tables
  – Pushdown capabilities provide an optimal interactive experience
• Enhanced query performance
  – Better query performance via partition pruning, metadata caching, and other optimizations
  – Up to 10-60X performance gains in query planning compared to previous Drill releases
• Better memory management
  – Greater stability and scale, enabling customers to run more, and larger, SQL workloads on a MapR cluster
• Improved integration with visualization tools like Tableau
  – Client impersonation for end-to-end security from the visualization tool to data in Hadoop
  – Enhanced SQL window functions
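Partition pruning lets the planner skip whole partitions whose partition values cannot satisfy a query's filter, so those files are never read. A minimal plain-Scala sketch, assuming a hypothetical one-directory-per-year layout; Drill exposes directory levels as the implicit columns dir0, dir1, and so on, but the `Partition` type and the functions below are illustrative only.

```scala
// Hypothetical table laid out as one directory per year; dir0 mirrors
// Drill's implicit first-level directory column.
final case class Partition(dir0: String, files: Seq[String])

val table = Seq(
  Partition("2014", Seq("2014/a.parquet", "2014/b.parquet")),
  Partition("2015", Seq("2015/a.parquet")),
  Partition("2016", Seq("2016/a.parquet", "2016/b.parquet", "2016/c.parquet"))
)

// Without pruning: every file is scanned and the filter runs on each row.
def filesWithoutPruning(parts: Seq[Partition]): Seq[String] =
  parts.flatMap(_.files)

// With pruning: the predicate is evaluated on partition values at planning
// time, and only files from partitions that can match are scheduled.
def filesWithPruning(parts: Seq[Partition], keep: String => Boolean): Seq[String] =
  parts.filter(p => keep(p.dir0)).flatMap(_.files)

// Roughly corresponds to: ... WHERE dir0 >= '2015'
val pruned = filesWithPruning(table, _ >= "2015")
println(s"${filesWithoutPruning(table).size} files without pruning, ${pruned.size} with pruning")
```

Applying this step early in planning, as the Drill 1.5/1.6 notes describe, also shrinks the file list the rest of the planner has to reason about.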
Drill ANSI SQL Capabilities Directly on JSON
0: jdbc:drill:drillbit=10.10.103.32> SELECT * FROM mfs.yelp_maprdb.business LIMIT 1;
+-----+------------+-------------+------------+------+--------------+-------+----------+-----------+------+---------------+------+--------------+-------+-------+------+
| _id | attributes | business_id | categories | city | full_address | hours | latitude | longitude | name | neighborhoods | open | review_count | stars | state | type |
+-----+------------+-------------+------------+------+--------------+-------+----------+-----------+------+---------------+------+--------------+-------+-------+------+
| --1emggGHgoG6ipd_RMb-g | {"Accepts Credit Cards":true,"Parking":{"garage":false,"lot":true,"street":false,"valet":false,"validated":false},"Price Range":1.0,"Ambience":{},"Good For":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | --1emggGHgoG6ipd_RMb-g | ["Food","Convenience Stores"] | Las Vegas | 3280 S Decatur BlvdWestsideLas Vegas, NV 89102 | {"Friday":{},"Monday":{},"Saturday":{},"Sunday":{},"Thursday":{},"Tuesday":{},"Wednesday":{}} | 36.1305306 | -115.2072382 | Sinclair | ["Westside"] | true | 4.0 | 4.0 | NV | business |
+-----+------------+-------------+------------+------+--------------+-------+----------+-----------+------+---------------+------+--------------+-------+-------+------+
0: jdbc:drill:drillbit=10.10.103.32> SELECT count(*) FROM mfs.yelp_maprdb.business;
+---------+
| EXPR$0 |
+---------+
| 42153 |
+---------+
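Drill queries nested JSON fields without a predefined schema, using dot notation on a table alias (for example, `SELECT b.attributes.Parking.lot FROM mfs.yelp_maprdb.business b`). A minimal plain-Scala sketch of how such a path resolves against one document, using a few values from the sample row above modeled as nested maps; the `resolve` helper is hypothetical and only mimics the behavior that a missing field comes back empty (Drill reports it as NULL).

```scala
// A few fields of the sample business document, modeled as nested maps.
val business: Map[String, Any] = Map(
  "name" -> "Sinclair",
  "city" -> "Las Vegas",
  "attributes" -> Map(
    "Accepts Credit Cards" -> true,
    "Parking" -> Map("garage" -> false, "lot" -> true)
  )
)

// Walk a dotted path one key at a time; if a key is missing or an
// intermediate value is not a nested map, the result is None.
def resolve(doc: Map[String, Any], path: List[String]): Option[Any] = path match {
  case Nil => Some(doc)
  case key :: Nil => doc.get(key)
  case key :: rest => doc.get(key) match {
    case Some(child: Map[_, _]) => resolve(child.asInstanceOf[Map[String, Any]], rest)
    case _ => None
  }
}

println(resolve(business, List("attributes", "Parking", "lot"))) // Some(true)
println(resolve(business, List("attributes", "WiFi")))           // None
```

Because resolution happens per document at read time, records with different shapes can live in the same table, which is what lets the query above run without a schema declaration.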
Simplified Deployment with YARN (Drill 1.8)
● Drill as a long-running application in YARN
● Key features
  ○ Client tool to launch Drill as a YARN application
  ○ New Drill application master (AM)
  ○ CPU & memory controls
  ○ Add/remove nodes to cluster
  ○ Multiple Drill clusters
Drill Configuration w/YARN
Spark 2.0
What’s in Spark 2.0?
• Structured Streaming with Spark SQL
  – The ability to perform interactive queries against live streaming data
  – Output can now be aggregated in a stream for continuous applications
  – Analytics can be precomputed continuously as the data is generated
• Whole-stage code generation
  – Provided by the second-generation Tungsten engine
  – Removes per-row virtual function calls between operators by fusing each stage of a query plan into a single function that is compiled to JVM bytecode at runtime
• Dataset API
  – Runs on the same engine as Spark SQL
  – Allows access to data from a variety of data sources
  – Supports database-like operations as well as passing in custom code
Spark 2.0: Structured Streaming with Spark SQL (Alpha)
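// Spark 2.0 release API in spark-shell (spark predefined); the preview-era stream()/startStream()/UpdateInPlace calls are not in 2.0
import org.apache.spark.sql.streaming.ProcessingTime
import org.apache.spark.sql.types.{StringType, StructType}
// Streaming file sources require an explicit schema
val schema = new StructType().add("user", StringType)
val records = spark.readStream.format("json").schema(schema).load("hdfs://input")
val counts = records.groupBy("user").count()
counts.writeStream
  .trigger(ProcessingTime("5 seconds"))
  .outputMode("complete") // streaming aggregations use complete mode in 2.0
  .format("memory")       // 2.0 has no JDBC sink; writing to MySQL ("mysql://...") needs a ForeachWriter
  .queryName("user_counts")
  .start()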
Figure: repeated queries update a DB table of per-user counts
User | Count
User 1 | 10
User 2 | 23
User 3 | 16
…….. | ……..
Store only the processed output instead of every single record.
● The query is executed repeatedly as new data arrives.
● Results are read from persistent storage instead of reprocessing the entire data set, resulting in faster access.
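The repeated-query model above amounts to folding each newly arrived batch into a stored aggregate rather than recounting every record. A minimal plain-Scala sketch of that idea, assuming hypothetical per-trigger batches of user IDs; it shows the incremental-aggregation principle, not Spark's state management.

```scala
// Running per-user counts: the "DB" table on the slide.
type Counts = Map[String, Long]

// Fold one batch into the stored counts; only the new records are touched.
def applyBatch(state: Counts, batch: Seq[String]): Counts =
  batch.foldLeft(state) { (acc, user) =>
    acc.updated(user, acc.getOrElse(user, 0L) + 1L)
  }

// Hypothetical arrivals, one Seq per trigger interval
val batches = Seq(
  Seq("User 1", "User 2", "User 1"),
  Seq("User 3"),
  Seq("User 2", "User 2")
)

// Incremental: each trigger combines the previous result with one batch
val incremental = batches.foldLeft(Map.empty[String, Long])(applyBatch)

// Baseline that recounts the full history, for comparison
val recomputed = batches.flatten.groupBy(identity).map { case (u, xs) => u -> xs.size.toLong }

println(incremental.toSeq.sorted)
```

Both approaches produce the same counts; the incremental one keeps per-trigger work proportional to the batch size instead of the whole history.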
Spark 2.0 Whole Stage Code-gen: Planner
Figure: the same physical plan drawn twice, before and after whole-stage code generation. Two ParquetRelation → Filter → Project branches feed a Broadcast Hash Join, followed by Project → TungstenAggregate → Exchange; in the second drawing, the operators are grouped into Whole Stage Codegen regions.
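Whole-stage code generation replaces the classic iterator (Volcano) model, where every row crosses a virtual `next()` call at each operator, with one fused loop per stage. A simplified plain-Scala sketch comparing both styles for a scan → filter → project → sum pipeline; the operator classes are hypothetical stand-ins, not Spark's generated code, which Spark emits as Java source and compiles at runtime.

```scala
// Volcano-style execution: each operator is an object, and every row is
// pulled through one virtual next() call per operator.
trait Operator { def next(): Option[Long] }

final class Scan(data: Array[Long]) extends Operator {
  private var i = 0
  def next(): Option[Long] =
    if (i < data.length) { i += 1; Some(data(i - 1)) } else None
}

final class Filter(child: Operator, pred: Long => Boolean) extends Operator {
  def next(): Option[Long] = {
    var row = child.next()
    while (row.exists(v => !pred(v))) row = child.next() // skip non-matching rows
    row
  }
}

final class Project(child: Operator, f: Long => Long) extends Operator {
  def next(): Option[Long] = child.next().map(f)
}

// SUM aggregate pulling rows from the root of the operator tree
def sumVolcano(root: Operator): Long = {
  var total = 0L
  var row = root.next()
  while (row.isDefined) { total += row.get; row = root.next() }
  total
}

// Fused style: the same scan -> filter (v > 10) -> project (v * 2) -> sum
// pipeline as a single loop with no per-operator calls or boxing.
def sumFused(data: Array[Long]): Long = {
  var total = 0L
  var i = 0
  while (i < data.length) {
    val v = data(i)
    if (v > 10) total += v * 2
    i += 1
  }
  total
}

val data = Array.tabulate(100)(_.toLong)
val volcano = sumVolcano(new Project(new Filter(new Scan(data), _ > 10), _ * 2))
println(s"volcano=$volcano fused=${sumFused(data)}") // volcano=9790 fused=9790
```

Both compute the same result; the fused version keeps values in local variables and avoids the per-row `Option` allocation and virtual dispatch, which is the overhead whole-stage codegen targets.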
Q & A
Engage with us!
1. Spyglass Initiative
   https://www.mapr.com/products/spyglass-initiative
   https://community.mapr.com/docs/DOC-1088
2. Ask Questions
   – Ask Us Anything about Spyglass in the MapR Community from Mon (Aug 29th) to Fri (Sep 2nd)
   – https://community.mapr.com/