Taming Latency: Case Studies in MapReduce Data Analytics


Description

This session discusses how to achieve low latency in MapReduce data analysis, drawing on industrial and academic case studies. These illustrate improvements to MapReduce that squeeze latency out of the whole data processing stack, covering batch-mode MapReduce systems as well as stream processing systems. The session also introduces our BoltMR project efforts on this topic and discloses some interesting benchmark results.

After this session you will be able to:
Objective 1: Understand why low latency matters for many MapReduce-based big data analytics scenarios.
Objective 2: Learn the root causes of MapReduce latency, the obstacles to lowering it, and the various (im)mature solutions.
Objective 3: Understand the extent of MapReduce low latency that is needed for your own applications and which optimization techniques are potentially applicable.

Transcript of Taming Latency: Case Studies in MapReduce Data Analytics

Page 1: Taming Latency: Case Studies in MapReduce Data Analytics

1 © Copyright 2013 EMC Corporation. All rights reserved.

Taming latency: case studies in MapReduce data analytics
Simon Tao, EMC Labs China, Office of the CTO

Page 2: Taming Latency: Case Studies in MapReduce Data Analytics


Roadmap Information Disclaimer

EMC makes no representation and undertakes no obligations with regard to product planning information, anticipated product characteristics, performance specifications, or anticipated release dates (collectively, “Roadmap Information”).

Roadmap Information is provided by EMC as an accommodation to the recipient solely for purposes of discussion and without intending to be bound thereby.

Roadmap Information is EMC Restricted Confidential and is provided under the terms, conditions and restrictions defined in the EMC Non-Disclosure Agreement in place with your organization.

Page 3: Taming Latency: Case Studies in MapReduce Data Analytics


Agenda

Motivation

Combating latency in general

Latency-reducing approaches for MR

Case studies for MR with a focus on low latency

Summary

Page 4: Taming Latency: Case Studies in MapReduce Data Analytics


Introduction

What this presentation is about
– Approaches that improve performance by enhancing the existing MapReduce platform
– Focus on per-job latency in wall-clock time, among other performance metrics
– With case studies from both academia and industry

What it is not about
– Performance improvement by manipulating MapReduce framework tuning knobs

Page 5: Taming Latency: Case Studies in MapReduce Data Analytics


Low latency: Motivations

Faster decision-making
– Fraud detection, system monitoring, trending topic identification

Interactivity
– Targeted advertising, personalized news feeds, online recommendations

“Pay-as-you-go” services
– Economic advantage in the “pay-as-you-go” billing model

Page 6: Taming Latency: Case Studies in MapReduce Data Analytics


Sources of latency

Latency is everywhere
– Hardware infrastructure: processors, memory, storage I/O, network I/O
– Software infrastructure: OS kernel, JVM, server software
– Architectural design and system implementation
– Communication protocols: DNS, TCP

“Computer science is a thousand layers of abstraction.”
— A colleague of Shimon Schocken

I see latency... it's everywhere.

Page 7: Taming Latency: Case Studies in MapReduce Data Analytics


Combating latency by extrapolation

An approach to minimizing latency for systems in general
– Address every latency bottleneck in the system
– Minimize its latency contribution

Apply this latency-minimizing approach to MapReduce
– What are the layers in the MapReduce data processing stack?
– How can the latency contributions from them be mitigated?

Page 8: Taming Latency: Case Studies in MapReduce Data Analytics


MapReduce Recap: logical view

Map(k1, v1) → list(k2, v2)
– A user-defined Map function processes a key/value pair to generate a set of intermediate key/value pairs

Reduce(k2, list(v2)) → list(v2)
– The Reduce function merges all values associated with the same intermediate key

Simple, yet expressive
– Real-world applications: Word Count, Distributed Grep, Count of URL Access Frequency, Inverted Index, etc. (a minimal Word Count sketch follows below)
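To make the recap concrete, here is a minimal Word Count sketch against Hadoop's org.apache.hadoop.mapreduce API; the class names are illustrative and not from the original slides.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map(k1, v1) -> list(k2, v2): emit (word, 1) for every word in the input line.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer tokens = new StringTokenizer(value.toString());
    while (tokens.hasMoreTokens()) {
      word.set(tokens.nextToken());
      context.write(word, ONE);
    }
  }
}

// Reduce(k2, list(v2)) -> list(v2): sum the counts emitted for each word.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    context.write(key, new IntWritable(sum));
  }
}
```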

Page 9: Taming Latency: Case Studies in MapReduce Data Analytics


MapReduce illustrated

[Figure: word-count style data flow, with per-word counts emitted in the Map phase, grouped in the Shuffle phase, and summed in the Reduce phase]

Page 10: Taming Latency: Case Studies in MapReduce Data Analytics


MapReduce Recap: system view

Embarrassingly parallel
– Partitioned parallelism in both Map and Reduce phases

Distributed and scalable
– Computations distributed across a large cluster of commodity machines
– The master schedules tasks to workers

Fault tolerant
– Reschedule tasks in case of failure
– Materialize task output to disk

Performance optimized (a driver sketch wiring these up follows below)
– Combiner function
– Locality-aware scheduling
– Redundant (speculative) execution
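As an illustration of the combiner and redundant (speculative) execution mentioned above, here is a hedged word-count driver sketch against the Hadoop 2.x MapReduce API; it reuses the WordCountMapper/WordCountReducer classes from the earlier sketch, and the input/output paths come from the command line.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Redundant (speculative) execution: launch backup attempts for straggler tasks.
    conf.setBoolean("mapreduce.map.speculative", true);
    conf.setBoolean("mapreduce.reduce.speculative", true);

    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCountMapper.class);
    // Combiner: pre-aggregate map output locally to cut shuffle volume.
    job.setCombinerClass(WordCountReducer.class);
    job.setReducerClass(WordCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Locality-aware scheduling, by contrast, is handled by the framework itself and needs no application code.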

Page 11: Taming Latency: Case Studies in MapReduce Data Analytics


MR latency mitigation: a systematic way

Latency improvement opportunities exist across the whole MR processing stack
– Architectural design
  ▪ HOP
– Programming model
  ▪ S4
– Resource scheduling
  ▪ Delay scheduling
– Dataflow: processing and transmission
  ▪ Spark, Tenzing, Bolt MR, etc.
– Data persistence
  ▪ Stinger

Page 12: Taming Latency: Case Studies in MapReduce Data Analytics


Trade-offs

Latency is sometimes at odds with throughput
– Speculative execution
  ▪ Backup executions of “straggler” tasks decrease per-job latency at the expense of cluster throughput

Trade-off between latency and fault tolerance
– Naïve pipelining
  ▪ Direct output transmission from Mapper to Reducer alleviates a latency bottleneck, but hurts fault tolerance

Need to preserve other critical system characteristics
– Throughput, fault tolerance, scalability…

“Every good quality is noxious if unmixed.” — Ralph Waldo Emerson

Page 13: Taming Latency: Case Studies in MapReduce Data Analytics


Case Studies: approaches to mitigating latency in HOP, Tenzing, S4, Spark, Stinger and LUMOS

Page 14: Taming Latency: Case Studies in MapReduce Data Analytics


HOP: Hadoop Online Prototype

A pipelining version of Hadoop from UC Berkeley
– “MapReduce Online”, NSDI'10 paper
– Open sourced under the Apache License 2.0

In HOP’s modified MapReduce architecture, intermediate data is pipelined between operators

HOP preserves the programming interfaces and fault tolerance models of previous MapReduce frameworks

Page 15: Taming Latency: Case Studies in MapReduce Data Analytics


Stock Hadoop: a blocking architecture

Intermediate data produced by each Mapper is pulled by the Reducer in its entirety
– Simplified fault tolerance
  ▪ Data outputs are materialized before consumption
– Underutilized resources
  ▪ Completely decoupled execution between Mapper and Reducer

Page 16: Taming Latency: Case Studies in MapReduce Data Analytics


HOP: from blocking to pipelining

HOP offers a modified MapReduce architecture that allows data to be pipelined between operators
– Improved system utilization and reduced completion times through increased parallelism
– Extends the programming model beyond batch processing
  ▪ Online aggregation
    — Allows users to see “early returns” from a job as it is being computed
  ▪ Continuous queries
    — Enable applications such as event monitoring and stream processing

Page 17: Taming Latency: Case Studies in MapReduce Data Analytics


Decreasing latency in HOP

Challenge: latency backfire
– Increased job response time resulting from eager pipelining
  ▪ Eager pipelining prevents use of the “combiner” optimization
  ▪ The Reducer may be overloaded by sorting work shifted from the Mappers

Solution: adaptive load moving (a simplified sketch follows below)
1. Buffer the output in the Mapper, up to a threshold size
2. When the buffer fills, apply the combiner function, sort, and spill the output to disk
3. Pipeline spill files to reduce tasks adaptively
  ▪ Accumulated spill files may be further merged
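The adaptive load-moving steps above can be summarized in a simplified sketch; this is an illustrative reconstruction, not HOP's actual code, and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative, simplified map-side buffer: accumulate output records and, once the
// threshold is reached, combine, sort, and "spill" them so the spill can later be
// pipelined to reduce tasks.
public class AdaptiveMapOutputBuffer {
  private final int thresholdRecords;
  private final List<String> buffer = new ArrayList<>();
  private final List<List<String>> spills = new ArrayList<>();

  public AdaptiveMapOutputBuffer(int thresholdRecords) {
    this.thresholdRecords = thresholdRecords;
  }

  // Step 1: buffer the mapper output up to a threshold size.
  public void collect(String record) {
    buffer.add(record);
    if (buffer.size() >= thresholdRecords) {
      spill();
    }
  }

  // Step 2: on a filled buffer, apply a (placeholder) combiner, sort, and spill.
  private void spill() {
    List<String> combined = combine(buffer);   // stand-in for the combiner function
    Collections.sort(combined);                // map-side sort before spilling
    spills.add(combined);                      // stand-in for writing a spill file to disk
    buffer.clear();
  }

  // Step 3: spill files would be pipelined to reduce tasks adaptively (not shown);
  // accumulated spills may also be merged here before transmission.
  public List<List<String>> pendingSpills() {
    return spills;
  }

  private List<String> combine(List<String> records) {
    // A real combiner would pre-aggregate records that share a key.
    return new ArrayList<>(records);
  }
}
```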

Page 18: Taming Latency: Case Studies in MapReduce Data Analytics


Preserving fault tolerance in HOP

Challenges
– Reducer failure
  ▪ Makes fault tolerance difficult in a purely pipelined architecture
– Mapper failure
  ▪ Limits the Reducer’s ability to merge spill files

Solution
– Materialization
  ▪ Intermediate data is still materialized, retaining Hadoop’s fault tolerance
– Checkpointing
  ▪ The offset reached in the Mapper’s input split is bookkept
  ▪ Only Mapper output produced before that offset is merged by the Reducer

Page 19: Taming Latency: Case Studies in MapReduce Data Analytics


Performance evaluation of HOP

Initial performance results show that pipelining can reduce job completion times by up to 25% in some scenarios
– Word count on 10 GB of input data, with 20 map tasks and 20 reduce tasks
– CDF of Map and Reduce task completion times for blocking and pipelining, respectively
– Pipelining reduces the total job runtime by 19.7%

Page 20: Taming Latency: Case Studies in MapReduce Data Analytics


Tenzing: Hive the Google way

SQL query engine on top of MapReduce for ad hoc data analysis, from Google
– “Tenzing: A SQL Implementation On The MapReduce Framework”, VLDB'11 paper
– Key features:
  ▪ Strong SQL support
  ▪ Low latency, comparable with parallel databases
  ▪ Highly scalable and reliable, atop MapReduce
  ▪ Support for heterogeneous backend storage

Page 21: Taming Latency: Case Studies in MapReduce Data Analytics


Low latency approaches in Tenzing

MR execution enhancements
– Process pool (a conceptual sketch follows below)
  ▪ Master pool
  ▪ Worker pool
– Streaming and in-memory chaining
– Sort avoidance for certain hash-based operators
  ▪ Block shuffle
– Local execution

SQL query enhancements
– Metadata-aware query plan optimization
– Projection and filtering, aggregation, joins, etc.

Experimental query engine optimization
– LLVM-based query engine
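Tenzing's implementation is not public, but the process-pool idea (keeping master and worker processes warm so queries avoid per-task startup latency) can be approximated conceptually with a long-lived pool of workers; the thread-pool sketch below is purely illustrative and every name in it is hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Conceptual worker pool: tasks are handed to already-running workers instead of
// paying the cost of launching a fresh process for each task.
public class WorkerPoolSketch {
  private final ExecutorService workers;

  public WorkerPoolSketch(int poolSize) {
    // Workers are started once and reused across queries.
    this.workers = Executors.newFixedThreadPool(poolSize);
  }

  public <T> Future<T> submit(Callable<T> task) {
    return workers.submit(task);
  }

  public void shutdown() {
    workers.shutdown();
  }

  public static void main(String[] args) throws Exception {
    WorkerPoolSketch pool = new WorkerPoolSketch(4);
    // Each "query fragment" runs on a warm worker; no startup cost per task.
    Future<Integer> result = pool.submit(() -> 21 + 21);
    System.out.println("fragment result = " + result.get());
    pool.shutdown();
  }
}
```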

Page 22: Taming Latency: Case Studies in MapReduce Data Analytics


Tenzing performance

“Using this approach, we were able to bring down the latency of the execution of a Tenzing query itself to around 7 seconds.”

“There are other bottlenecks in the system however, such as computation of map splits, updating the metadata service, …, etc. which means the typical latency varies between 10 and 20 seconds currently.”

Page 23: Taming Latency: Case Studies in MapReduce Data Analytics


S4: Simple Scalable Streaming System

A research project for stream processing at Yahoo!
– Open sourced in September 2009; entered Apache Incubation in October 2011
– A general-purpose stream processing engine
  ▪ With a simple programming interface
  ▪ Distributed and scalable
  ▪ Partially fault-tolerant
– Designed for use cases different from batch-mode processing
  ▪ Infinite data streams
  ▪ Streams of events that flow into the system at varying data rates
  ▪ Real-time processing with low latency expected

Page 24: Taming Latency: Case Studies in MapReduce Data Analytics


S4 overview

Data abstraction
– Data are streams of key-value events, dispatched to and processed by Processing Elements

Design inspired by
– The Actors model
– The MapReduce model
  ▪ Key-value based data dispatching

[Figure: TopK stream processing example]

Page 25: Taming Latency: Case Studies in MapReduce Data Analytics


Low latency design in S4

Simple programming paradigm that operates on data streams in real time

Minimize latency by using local memory to avoid disk I/O bottlenecks
– Lossy failover: partially fault-tolerant

Pluggable architecture to select the network protocol for data communication
– The communication layer allows data to be sent without delivery guarantees, trading reliability for performance

Page 26: Taming Latency: Case Studies in MapReduce Data Analytics


Spark

Research project at UC Berkeley on big data analytics
– “Spark: Cluster Computing with Working Sets”, HotCloud'10

A parallel cluster computing framework
– Supports applications with working sets
  ▪ Iterative algorithms
  ▪ Interactive data analysis
– Retains the scalability and fault tolerance of MapReduce

Allows efficient, interactive analysis of large datasets on clusters, with a general-purpose programming language

Page 27: Taming Latency: Case Studies in MapReduce Data Analytics


Decreasing latency in Spark

In Spark, data can be cached in memory explicitly (a rough sketch follows below)
– The core data abstraction in Spark is the RDD, a read-only, partitioned collection of objects

Keeping the working set of data in memory can improve performance by an order of magnitude
– Outperforms Hadoop by 20x for iterative jobs
– Can be used interactively to search a 1 TB dataset with latencies of 5–7 seconds
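The HotCloud'10 paper presents Spark in Scala; as a rough equivalent, and to stay in Java for consistency with the other sketches, explicit in-memory caching through Spark's (later) Java API looks roughly like this, with an illustrative HDFS path:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CachedLogSearch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("cached-log-search");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // RDD: a read-only, partitioned collection of objects.
    // cache() keeps the working set in memory across queries, so repeated
    // interactive scans avoid re-reading the data from disk.
    JavaRDD<String> lines = sc.textFile("hdfs:///logs/example.log").cache();

    long errors = lines.filter(l -> l.contains("ERROR")).count();
    long warnings = lines.filter(l -> l.contains("WARN")).count();  // reuses cached data

    System.out.println("errors=" + errors + " warnings=" + warnings);
    sc.stop();
  }
}
```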

Page 28: Taming Latency: Case Studies in MapReduce Data Analytics


Fault tolerance in Spark

Lineage
– Lost partitions are recovered by ‘replaying’ the series of transformations used to build the RDD

Checkpointing (a small sketch follows below)
– To avoid time-consuming recovery, checkpointing to stable storage helps applications with
  ▪ Long lineage graphs
  ▪ Lineage composed of wide dependencies
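Continuing the hedged Spark sketch, an RDD with a long lineage can be checkpointed to stable storage through the Java API; the checkpoint directory and input path below are illustrative.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CheckpointExample {
  public static void main(String[] args) {
    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("checkpoint-example"));
    sc.setCheckpointDir("hdfs:///checkpoints");        // illustrative stable-storage path

    JavaRDD<String> data = sc.textFile("hdfs:///logs/example.log");
    // Imagine many chained transformations building a long lineage here.
    JavaRDD<String> derived = data.map(String::toUpperCase);
    derived.checkpoint();   // truncate the lineage: persist this RDD to stable storage
    derived.count();        // the next action materializes the checkpoint
    sc.stop();
  }
}
```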

Page 29: Taming Latency: Case Studies in MapReduce Data Analytics


Stinger Initiative

Enhance Hive with more SQL and improved performance to allow human-time use cases
– Announced in February 2013, led by Hortonworks
– A community collaboration, with resources from SAP, Microsoft, Facebook and Hortonworks

Page 30: Taming Latency: Case Studies in MapReduce Data Analytics


Making Apache Hive 100 Times Faster

Stinger’s improvements to Hive
– More SQL
  ▪ Analytics features, alignment with standard SQL, etc.
– Optimized query execution plans
  ▪ 45x performance increase for Hive in some early results
– Support for a new columnar file format
  ▪ ORCFile: more efficient and higher performance
– New runtime framework, Apache Tez

Page 31: Taming Latency: Case Studies in MapReduce Data Analytics


Accelerating data processing with Tez

In traditional MapReduce, one SQL query often results in multiple jobs, which eventually impacts performance
– Latency introduced by launching multiple jobs
– Extra overhead of materializing intermediate job outputs to the file system

Performance improvements from Tez
– With a generalized computing paradigm for DAG execution, Tez can express any SQL query as a single job
– The Tez AM, running atop YARN, supports container reuse

Page 32: Taming Latency: Case Studies in MapReduce Data Analytics


LUMOS Project

A real-time, interactive, self-service data cloud platform for big data analytics, from EMC Labs China

LUMOS – guiding data scientists to the big value of big data

Goal: Develop key building blocks for the big data cloud platform

Page 33: Taming Latency: Case Studies in MapReduce Data Analytics


Design principles

Real-time analytics
– Low-latency MapReduce data processing

Interactive analytics
– SQL query interface and visualization

Deep analytics
– Advanced and complex statistical analysis and data mining
– Predictive analytics

Self-service analytics
– Analytics as a service

Page 34: Taming Latency: Case Studies in MapReduce Data Analytics


Building Blocks in LUMOS

Data Processing
– BoltMR: flexible and high-performance MapReduce execution engine

Data Access
– SQL2MR: declarative query interface and optimizer for MapReduce

Data Service
– DMaaS: data mining analytics service and tools

Page 35: Taming Latency: Case Studies in MapReduce Data Analytics


Bolt MR

A flexible, low-latency and high-performance MapReduce implementation
– Improve overall performance
– Reduce latency
– Support alternative workload types
  ▪ Iterative
  ▪ Incremental
  ▪ Online aggregation and continuous query

Flickr credit: http://www.flickr.com/photos/blahflowers/4656725185/

Page 36: Taming Latency: Case Studies in MapReduce Data Analytics


Bolt MR – latency enhancement

Batch-mode MapReduce with enhancements on Hadoop:
– Enhanced task resource allocation
– Master/Worker pool
– Flexible data processing/transmission options

Page 37: Taming Latency: Case Studies in MapReduce Data Analytics


Bolt MR – Performance evaluation

• On container reuse and worker pool
• Lower latency is observed in all the conducted micro-benchmarks
• For jobs with small input, a substantial improvement ratio is observed

[Charts: job execution time in seconds for Job3 under the Normal, Container Reuse, Worker Pool, and Reuse + Pool configurations (the chart shows values of 242, 209, 63 and 32 seconds), plus per-task breakdowns of TaskInitializingTime and TaskProcessingTime]

Page 38: Taming Latency: Case Studies in MapReduce Data Analytics


SQL2MR

Problems
– Poor programmability and metadata management in MapReduce
  ▪ MapReduce applications are hard to program
  ▪ Data needs to be published in well-known schemas
– Poor performance of existing MapReduce query translation systems (e.g., Hive, Pig)
  ▪ Inefficiency (latency on the order of minutes) of sub-optimal MR jobs due to limited query optimization ability
  ▪ Poor SQL compatibility and limited language expressive power

Our solution
– An extensible and powerful SQL-like query language for complex analytics
– Cost-based query execution plan optimization for MR

Page 39: Taming Latency: Case Studies in MapReduce Data Analytics


Query Optimization for MapReduce-Based Big Data Analytics

A novel cost-based optimization framework that
– Learns from the wide spectrum of DB query optimization (>40 years!)
– Exploits usage and design properties of MapReduce frameworks

The SQL Query Processor takes a SQL query in and produces optimal MR jobs out:
– Query Parsing
– Plan Space Exploration: enumerate the alternative physical plans (i.e., MR jobs) for the input query
– Cost Estimation: estimate the execution costs of the physical plans and select the cheapest one (a toy sketch follows below)
– Schema Info & Statistics Maintenance: store and derive the logical and physical properties of both input and intermediate data

The result: efficient MapReduce jobs running non-invasively on existing and future Hadoop stacks
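As a toy illustration of "estimate the execution costs of physical plans and select the cheapest one", the sketch below enumerates candidate plans and picks the minimum-cost one; it is not SQL2MR code, and every name and cost figure in it is hypothetical.

```java
import java.util.Comparator;
import java.util.List;

public class CostBasedPlanChooser {

  // Hypothetical physical plan: a candidate set of MR jobs with an estimated cost.
  record PhysicalPlan(String description, double estimatedCost) {}

  // Select the cheapest plan among the enumerated alternatives.
  static PhysicalPlan choose(List<PhysicalPlan> candidates) {
    return candidates.stream()
        .min(Comparator.comparingDouble(PhysicalPlan::estimatedCost))
        .orElseThrow(() -> new IllegalArgumentException("no candidate plans"));
  }

  public static void main(String[] args) {
    // Illustrative alternatives for the same SQL query.
    List<PhysicalPlan> plans = List.of(
        new PhysicalPlan("join-then-aggregate (2 MR jobs)", 120.0),
        new PhysicalPlan("map-side join + combiner (1 MR job)", 45.0),
        new PhysicalPlan("repartition join (3 MR jobs)", 180.0));
    System.out.println("Chosen plan: " + choose(plans).description());
  }
}
```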

Page 40: Taming Latency: Case Studies in MapReduce Data Analytics


Optimizations from other research/engineering efforts

Delay Scheduling
– A scheduler that takes into account both fairness and data locality

Longest Approximate Time to End (LATE)
– Speculatively execute tasks based on finish-time estimation
– Launch speculative tasks on fast nodes

Direct I/O
– Read data from the local disk where applicable, avoiding the inter-process communication cost of going through HDFS

Low-level optimizations
– OS level: efficient data transfer with the sendfile system call (a Java sketch follows below)
– Instruction level: increased HDFS read/write efficiency via CRC32 support in the SSE4.2 instruction extensions of Intel Nehalem processors
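On the JVM, the sendfile-style zero-copy path is usually reached through FileChannel.transferTo, which can map to sendfile(2) on Linux; the file path and socket endpoint below are illustrative.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
  public static void main(String[] args) throws IOException {
    Path file = Path.of("/tmp/block.dat");   // illustrative data file
    try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ);
         SocketChannel out = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
      long position = 0;
      long remaining = in.size();
      // transferTo can use sendfile(2) on Linux, moving bytes kernel-to-kernel
      // without copying them through user-space buffers.
      while (remaining > 0) {
        long sent = in.transferTo(position, remaining, out);
        position += sent;
        remaining -= sent;
      }
    }
  }
}
```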

Page 41: Taming Latency: Case Studies in MapReduce Data Analytics


Quick summary

Latency improvement requires optimization across all layers of the MapReduce system:
– Query engine
  ▪ SQL query optimization (Tenzing, Stinger, SQL2MR)
  ▪ Code generation (Tenzing)
– Architectural design
  ▪ Pipelining (HOP)
– Programming model
  ▪ Streaming (S4)
– Resource scheduling
  ▪ Scheduling algorithm optimization (Delay Scheduling, LATE)
– Data processing and transmission
  ▪ In-memory processing (S4, Spark), process pools (Tenzing, Bolt MR), sort avoidance (Tenzing), more efficient system calls, etc.
– Data persistence
  ▪ Columnar storage (Stinger), Direct I/O

Page 42: Taming Latency: Case Studies in MapReduce Data Analytics


3 Ways to Cope with Latency Lags Bandwidth

“3 Ways to Cope with Latency Lags Bandwidth”, from David Patterson
– Caching
  ▪ Processor caches, file cache, disk cache
– Replication
  ▪ Send multiple requests to multiple copies and just use the quickest reply
– Prediction
  ▪ Branch prediction + prefetching

Corresponding latency-decreasing approaches in MapReduce
– In-memory cache in Spark
– Speculative execution in MapReduce
– Pipelining in HOP

Page 43: Taming Latency: Case Studies in MapReduce Data Analytics


Are We There Yet?

Identifying performance bottlenecks is an iterative process
– Mitigating the performance impact of one bottleneck can be followed by the discovery of the next one
– “These 3 already fully deployed, so must find next set of tricks to cope; hard!”
  — David Patterson

Page 44: Taming Latency: Case Studies in MapReduce Data Analytics