Keep Calm and React with Foresight: Strategies for Low-Latency and Energy-Efficient Elastic Data...

KEEP CALM AND REACT WITH FORESIGHT: STRATEGIES FOR LOW-LATENCY AND ENERGY-EFFICIENT ELASTIC DATA STREAM PROCESSING

Tiziano De Matteis, Gabriele Mencagli

University of Pisa, Italy

PPoPP 2016 - Barcelona



INTRODUCTION

Recent years have been characterized by an explosion of data streams generated by a variety of sources: social networks, sensors, stock markets...

Data Stream Processing (DaSP) applications: real-time processing of continuous data streams with stringent Quality of Service (QoS) requirements in a very dynamic environment. Requirements:

Parallelism to obtain performance

Elasticity to handle dynamicity

Cost effectiveness

Goal: propose latency-aware and energy-efficient scaling strategies with predictive capabilities.


BACKGROUND

Applications are expressed as graphs of operators (vertices) that communicate through streams (edges). We will focus on stateful operators.

In many contexts, the physical input stream conveys tuples belonging to multiple logical substreams. Examples come from network monitoring, financial applications, social networks, ...

This requires maintaining a separate state (e.g. a window) for each substream and applying the computation on a per-substream basis.


BACKGROUND

Parallelized partitioned stateful operator: each state partition is owned by one operator replica.

○ Splitter distributes tuples using a hash function that maps each key to a replica index in [1:n]

○ Merger collects the results from the replicas

This is the most widely used parallel schema, implemented in various DaSP frameworks (e.g. Storm).

[Figure: input stream → SPLITTER → REPLICA 1 ... REPLICA n → MERGER → output stream]
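The key-based distribution performed by the splitter can be illustrated with a minimal Python sketch. The function names `replica_for` and `split`, the use of CRC32 as the hash, and the sample quotes are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import zlib
from collections import defaultdict

def replica_for(key: str, n: int) -> int:
    """Map a key to a replica index in [1, n] using a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % n + 1

def split(tuples, n):
    """Route (key, value) tuples to per-replica lists, indexed by replica."""
    queues = defaultdict(list)
    for key, value in tuples:
        queues[replica_for(key, n)].append((key, value))
    return queues

# Hypothetical input: stock quotes keyed by symbol.
quotes = [("AAPL", 101.2), ("MSFT", 55.1), ("AAPL", 101.3), ("IBM", 130.0)]
queues = split(quotes, n=4)
```

Because the hash is deterministic, all tuples with the same key reach the same replica, so each replica can own the state of its keys without sharing it.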

Scaling strategies change the operator configuration (e.g. number of replicas, CPU frequency, ...) in order to face all the D-* challenges.

DYNAMICITY

○ the arrival rate;

○ the key frequency distribution;

○ the processing time per tuple.

[Figure: SYSTEM, CONTROLLER, disturbances]

MODEL PREDICTIVE CONTROL

Model Predictive Control (MPC) approach: actions are taken by using a model to predict the future system behavior over a limited prediction horizon h.

[Figure: Disturbance Forecaster, System Model, Optimizer, decision variables]

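The system is observed through its disturbances at each control step τ. Future values are estimated.

A system model is used to compare and evaluate alternative configurations.

An optimization problem is solved.

The result is a reconfiguration trajectory over the horizon: $U^h(\tau) = (u(\tau), u(\tau+1), \ldots, u(\tau+h-1))$

Only the first element, $u(\tau)$, is applied; the procedure is repeated at the next control step.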

ELASTIC OPERATOR

The parallel schema now incorporates the controller.

[Figure: splitter (S), replicas (R), merger (M) and CONTROLLER]

Measured disturbances (for step τ-1):

○ ($T_A$, $\sigma_A$): mean and standard deviation of the inter-arrival time per triggering tuple;

○ {$p_k$}: key frequency distribution;

○ {$t_k$}: computation time for each key.

Decision variables: u(τ) = number of replicas (n) and CPU frequency (f).

System models: used to predict the values of the QoS variables under a given configuration.

SYSTEM MODELS

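Latency (or, more formally, the response time): we use a Queueing Theory approach. For control step τ it is expressed as the sum of a waiting time and a processing time:

$R_Q(\tau) = W_Q(\tau) + T(\tau)$

To find $W_Q$ we model the operator as a G/G/1 queueing system (Kingman's formula):

$W_Q \approx \frac{\rho}{1-\rho} \cdot \frac{c_a^2 + c_s^2}{2} \cdot T_S$

where $\rho$ is the utilization, $c_a$ and $c_s$ are the coefficients of variation of the inter-arrival and service times, and $T_S$ is the mean service time.

A feedback mechanism is used to increase the precision.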
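The G/G/1 latency model can be sketched numerically with Kingman's approximation. This is a minimal illustration that treats the whole operator as a single queue; the function and parameter names (`kingman_waiting_time`, `t_a`, `sigma_a`, `t_s`, `sigma_s`) are assumptions, and the feedback correction mentioned in the slide is not modeled.

```python
def kingman_waiting_time(t_a, sigma_a, t_s, sigma_s):
    """Approximate mean waiting time in a G/G/1 queue (Kingman's formula).

    t_a, sigma_a: mean and standard deviation of the inter-arrival time
    t_s, sigma_s: mean and standard deviation of the service time
    """
    rho = t_s / t_a                      # utilization factor
    if rho >= 1.0:
        return float("inf")              # the queue is unstable
    c_a = sigma_a / t_a                  # coefficient of variation, arrivals
    c_s = sigma_s / t_s                  # coefficient of variation, service
    return (rho / (1.0 - rho)) * ((c_a ** 2 + c_s ** 2) / 2.0) * t_s

def response_time(t_a, sigma_a, t_s, sigma_s):
    """Latency = waiting time + processing time."""
    return kingman_waiting_time(t_a, sigma_a, t_s, sigma_s) + t_s

# Exponential arrivals and service (c_a = c_s = 1), utilization 0.5.
latency = response_time(t_a=2.0, sigma_a=2.0, t_s=1.0, sigma_s=1.0)
```

With exponential inter-arrival and service times the approximation coincides with the exact M/M/1 result, which makes this case a convenient sanity check.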

SYSTEM MODELS

Power: owing to the infinite nature of DaSP computations, minimizing the instantaneous power is the main way to reduce energy consumption.

Power at step τ is proportional to the number of replicas, the CPU frequency and the square of the supply voltage (which depends on f).

Rationale: computation time is inversely proportional to frequency. That is, by halving the frequency we double the computation time but use less than half the power.

This model is used to compare different operator configurations.
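The proportionality above can be turned into a small comparison of configurations. The sketch below assumes a purely illustrative linear voltage/frequency relation (`voltage_of`); real voltage levels depend on the processor and are not given in the slides, so only the relative ordering of the results is meaningful.

```python
def voltage_of(f_ghz):
    """Hypothetical supply voltage (volts) as a function of frequency (GHz)."""
    return 0.6 + 0.2 * f_ghz

def relative_power(n, f_ghz):
    """Power up to a constant factor: replicas * frequency * voltage^2."""
    v = voltage_of(f_ghz)
    return n * f_ghz * v * v

p_full = relative_power(n=4, f_ghz=2.0)   # 4 replicas at full frequency
p_half = relative_power(n=4, f_ghz=1.0)   # same replicas at half frequency
saving = 1.0 - p_half / p_full            # fraction of power saved
```

Under this assumed voltage curve, halving the frequency cuts power by more than half, because the voltage term drops together with the frequency.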

OPTIMIZATION

The MPC-based strategies solve at each step the optimization problem:
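A minimal sketch of one such control step, assuming the objective sums a QoS cost, a resource cost and a switching cost over the horizon (the terms named in the evaluation slides). The toy latency model, the weights, the quadratic switching penalty and the exhaustive search are all assumptions for illustration, not the authors' formulation or solver.

```python
from itertools import product

def predicted_latency(n, rate, mu=100.0):
    """Toy latency model (assumption): n replicas, each serving mu tuples/s."""
    capacity = n * mu
    return 1.0 / (capacity - rate) if capacity > rate else float("inf")

def mpc_step(current_n, forecast_rates, candidates, threshold,
             qos_weight=100.0, resource_weight=1.0, switch_weight=1.0):
    """Return the cheapest trajectory of replica counts over the horizon."""
    best_cost, best_traj = float("inf"), None
    for traj in product(candidates, repeat=len(forecast_rates)):
        cost, prev = 0.0, current_n
        for n, rate in zip(traj, forecast_rates):
            if predicted_latency(n, rate) > threshold:
                cost += qos_weight                   # QoS violation penalty
            cost += resource_weight * n              # resource cost (cores)
            cost += switch_weight * (n - prev) ** 2  # switching cost
            prev = n
        if cost < best_cost:
            best_cost, best_traj = cost, traj
    return best_traj

# Horizon h = 2: the arrival rate is forecast to jump from 150 to 350 tuples/s.
plan = mpc_step(2, [150.0, 350.0], range(1, 5), threshold=0.05)
next_n = plan[0]  # only the first action of the trajectory is applied
no_switch_plan = mpc_step(2, [150.0, 350.0], range(1, 5), threshold=0.05,
                          switch_weight=0.0)
```

In this toy setting the switching cost makes the controller ramp up one step early (2 → 3 → 4 replicas) instead of jumping from 2 to 4 at the last moment, which is the smoothing effect a switching term is meant to provide.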

EXPERIMENTS

Our control strategies have been evaluated on a high-frequency trading (HFT) application running on a multicore.

[Figure: Source → HFT operator (S, R ... R, M, CONTROLLER) → Consumer; input: financial quotes]

Two different datasets (2836 symbols):

○ a real one (trading day, accelerated 100x)

○ a synthetic one (random walk arrival rate)

All the dynamicity factors must be handled.

Computation: fitting on aggregated quotes.

Window of 1000 tuples, slide of 25 tuples.
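The per-key, count-based sliding window can be sketched as follows. The class name `KeyedSlidingWindow` is hypothetical, and the sketch assumes that a window triggers every `slide` tuples of the same key even before it is full; the actual application may handle the warm-up phase differently.

```python
from collections import defaultdict, deque

class KeyedSlidingWindow:
    """Count-based sliding window kept separately for each key."""

    def __init__(self, length, slide):
        self.slide = slide
        # One bounded buffer per key: old tuples are evicted automatically.
        self.windows = defaultdict(lambda: deque(maxlen=length))
        self.since_trigger = defaultdict(int)

    def insert(self, key, value):
        """Add a tuple; return the window content if this tuple triggers."""
        window = self.windows[key]
        window.append(value)
        self.since_trigger[key] += 1
        if self.since_trigger[key] == self.slide:
            self.since_trigger[key] = 0
            return list(window)   # a triggering tuple: computation runs here
        return None

# Small sizes for readability; the slides use length=1000 and slide=25.
op = KeyedSlidingWindow(length=4, slide=2)
fired = [op.insert("AAPL", v) for v in range(1, 7)]
```

Only one tuple in every `slide` tuples of a key triggers a computation, which is why the disturbances are measured per triggering tuple.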

EVALUATION

Two control strategies:

○ Lat-Node: resource cost depends on the number of used cores;

○ Lat-Power: resource cost depends on the power consumed.

The arrival rate is predicted with a Holt-Winters filter. We explicitly consider the case of =0. Control step = 1 second.
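Arrival-rate forecasting over the prediction horizon can be illustrated with Holt's linear-trend smoothing, the non-seasonal member of the Holt-Winters family. This is a simplified stand-in for the filter named above; the smoothing parameters `alpha` and `beta` and the sample rates are assumed values.

```python
def holt_forecast(series, alpha, beta, steps):
    """Forecast the next `steps` values with Holt's linear-trend smoothing.

    series: past observations, one per control step (at least two values)
    alpha:  smoothing factor for the level, in (0, 1]
    beta:   smoothing factor for the trend, in (0, 1]
    """
    level = series[1]
    trend = series[1] - series[0]
    for x in series[2:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (i + 1) * trend for i in range(steps)]

# Arrival rates (tuples/s) observed in the last four control steps.
rates = [100.0, 110.0, 120.0, 130.0]
forecast = holt_forecast(rates, alpha=0.5, beta=0.5, steps=2)
```

The number of forecast steps corresponds to the prediction horizon h: one estimated arrival rate is needed for each future control step considered by the optimizer.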

Strategies evaluated in terms of SASO properties:

○ Stability: no frequent reconfigurations;

○ Accuracy: minimize the QoS violations;

○ Settling time: find a stable configuration quickly;

○ Overshoot: do not overestimate the configuration.

Target architecture: dual-CPU Intel Sandy Bridge Xeon E5-2650, 16 physical cores with DVFS support.

STABILITY

Considering Lat-Node with the synthetic workload, we have:

The switching cost acts as a stabilizer.

STABILITY

Considering all the scenarios:

The switching cost reduces the number of reconfigurations. This effect is partially mitigated by increasing the horizon length.

ACCURACY

We detect a QoS violation each time the average latency is higher than a threshold δ (δ = 1.5 ms for the synthetic workload, δ = 7 ms for the real workload).

The switching cost allows the strategy to reach a better accuracy. This is partially offset by increasing the horizon length.

OVERSHOOT

We considered the resource consumption.

The use of the switching cost causes overshoot. This can be mitigated by using a longer horizon.

RESOURCE CONSUMPTION

We studied the power consumption (CPU cores) of the Lat-Node and Lat-Power strategies

Average power saving of 18.2% and 16.5%

SETTLING

In case of sudden workload changes, the strategy should be able to reach the right configuration rapidly.

The switching cost reduces the average reconfiguration amplitude. Better settling time can be achieved with longer prediction horizons.

OTHER APPROACHES

We compare our approach with a peak-load configuration and two reactive strategies:

○ one based on policy rules;

○ an algorithm developed for IBM SPL, not designed for latency.

Strategy       # Reconf.   QoS Violations   # Replicas
Rule-based     47.42       76               6.89
SPL-strategy   40.18       230              4.63
Lat-Node       11          30               9.97
Peak-load      -           15               12

Our approach performs fewer reconfigurations and incurs fewer QoS violations than both reactive strategies (the SPL strategy is throughput oriented).

CONCLUSIONS

In this work we have studied and implemented strategies for elastic DaSP operators:

○ predictive approach using MPC methods;

○ power consumption is taken into account, while providing latency guarantees;

○ our strategies exhibit good stability, accuracy and lower resource consumption.

Future work:

○ extend the work to distributed-memory architectures;

○ integrate the strategies in a complete graph context (not only a single operator).

ADDITIONAL REFERENCE AND ATTRIBUTIONS

References:

○ Artifact of the paper available at: https://github.com/tizianodem/elastic-hft

○ The application was developed in FastFlow, a C++ parallel programming framework for multicores: http://calvados.di.unipi.it/

○ For energy statistics and CPU frequency scaling we used the Mammut library available at: https://github.com/DanieleDeSensi/Mammut

Attribution

○ Icons used in slides 2 and 4 were designed by Freepik from www.flaticon.com

Thank you! Questions?