SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

32
Telefónica I+D SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D

Transcript of SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Page 1: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

SAMSON PlatformArchitecture Streaming Big Data

TELEFÓNICA I+D

Page 2: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Index

SAMSON Platform

File-System

Streaming MapReduce

Eco-system

Architecture

0102030405

Page 3: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

SAMSON Platform

01

Page 4: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

Overview

Samson is a distributed processing engine especially designed for efficient analytics of stream-processing.

Internal distributed file-system optimized for shared data processing

Provides an extension of the MapReduce framework• More efficient MapReduce

• Joins

Streaming MapReduce that allows for the incremental processing of data feeds

Uses existing BigData Storage solutions such as Apache HDFS or MongoDB for fetching and storing data

Built to be deployed on Ubuntu, Redhat Linux and in virtual machines

4

Page 5: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

Samson 0.6 DEB Samson 0.6 RPM User guide Available for VM

Page 6: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

SAMSON Platform Architecture

6

Page 7: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

Key Platform Components

File-system

Streaming MapReduce

Eco-system

Architecture

7

Page 8: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

File-system

02

Page 9: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

HD

Cores

HD HD HD

Distributed big-data platform for high-performance Processing over unbounded streams of data

SAMSON distributed file - system

MapReduce for streamed processing

Cores Cores Cores

Page 10: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

SAMSON distributed file - system

Apache Hadoop HDFS SAMSON

Data is uniformly distributed Binary data is distributed by key

Large heterogeneous clustersNon uniform datasets

Homogenous clusterUniform data-sets

Maximum resource usage Efficient accumulation operationsEfficient JOIN operations

Page 11: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

We periodically receive a set of documents.We want to compute the accumulated word-count each time we receive an update.

First input

Second input

Third input

Reduce

Reduce

Reduce

MapReduce

6

MapReduce

12

MapReduce

18

Redistribution

6

Redistribution

6

Redistribution

6

SAMSON distributed file - system

Page 12: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

Streaming MapReduce

03

Page 13: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

SAMSON

SAMSON

SAMSON

DELILAH

4 Gb

Upload Download

Run operations

Page 14: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

SAMSON

SAMSON

SAMSON

DELILAH

Upload new operations

& data types

Run operations

Open API for 3rd party developers

Page 15: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

SAMSON

SAMSON

SAMSON

Page 16: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

Operation

Operation

Operation

SAMSON

SAMSON

SAMSON

Map

Page 17: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

Operation

Operation

Operation

SAMSON

SAMSON

SAMSON

StateInput Output

Reduce

Page 18: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

Eco-system

04

Page 19: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D 19

Top level view…

samsonPush

samsonPush

samsonPush

delilah

samsonPop

samsonClient

delilah

samsonPop

Console-based client • Upload data• Download data• Run commands• Platform monitor

samsonClient

delilah samsonPush

samsonPop

samsonClient

Binaries to stream data into and out of SAMSON

C++ library to develop new plugins

Examples: • samsonPush• samsonPop

module module

modulemodule

module

module

3rd Party C++ shared library

• New data types• New operations

Tools provided for simplified development!!

Page 20: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D 20

Delilah clientdelilah

Page 21: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D 21

SAMSON Module example…

Module simple_mobility{ title "Simple mobility example" author "Andreu Urruela" version "0.1.1"}

data UserArea{ system.String name; system.UInt x; system.UInt y; system.UInt radius;}

data Position{ system.UInt x; system.UInt y;

system.TimeUnix time;}

parser parser_cdrs{ out system.UInt simple_mobility.Position

helpLine "Parse input CDRs to get user-position"}

class parser_cdrs : public samson::system::SimpleParser {

std::vector<char*> words; // Vector used to store words parsed at each line

void parseLine( char * line , samson::KVWriter *writer ) { // Split line in words split_in_words( line, words );

// Expected format USER_ID CDR X Y time

if( words.size() < 5 ) return; // No content for a valid instruction

if( strcmp( words[1] , "CDR" ) != 0 ) return; // Non valid format

// Set the key key.value = atoll( words[0] );

// Set the position value.set( atoll( words[2] ) , atoll( words[3] ) , atoll( words[4] ) );

// Emit the key-value writer->emit( 0 , &key, &value ); }

};

module

Page 22: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D 22

SAMSON ecosystem tools

Tool Description

samsonModuleBootstrap Create necessary files to start developing a new module

samsonModuleParser Create .h .cpp files from a description file

samsonCat Visualize binary content of SAMSON

samsonSetup Edit setup of a SAMSON system

samsonStarter Setup a new SAMSON cluster

samsonData Show a transactional log for debugging

samsonPush Push data from STDIN to a particular queue

samsonPop Pop data from a particular queue to STDOUT

samsonLocal Emmulate a local cluster for fast development of modules

Page 23: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

File system Stream MapReduce Ecosystem Architecture Demo

Page 24: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D

Architecture

05

Page 25: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D 25

Communication Protocols…

delilah

Platform messagesFlexibilityBack compatibility

Data serializationMaximum data compression ( no field separator )Best for fast-sequential processingEasy job distributionProprietary serialization format

MonitoringNo recompilation neededBest tools for querying ( XPATH )

Goal Solution Why ?

Page 26: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D26

Worker

Disk Manager

Memory Manager

Process Manager

--- cores --

Runtime engine and notification system

Engine library

Independent development

NetworkManager

Page 27: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D 27

Disk Manager // Network Manager

• Controller to access local disk and network connections• Asynchronus notifications using engine notification system• If required multiple threads are used

Memory Manager: ( our retain-release model ….. similar to Objective-C )

• Simple system to control memory usage• Used to optimize memory allocation when under heavy load

Process Manager ( similar to Apple’s Grand Central dispatch library )

• System to control independent “heavy” task to be executed• Automatic creation / destruction of threads• Optional “fork” mode with shared-memory system to get output

Runtime Engine & Notification system

• Inspired in message-passing system implemented in Objective-C• Single loop to run all state-update operations• Thread protection to interact with Disk/Network/Memory/Process

Managers

Engine library

Page 28: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D28

Worker

Engine library

Independent development

multi-core

Block ManagerBlock ManagerDisk – Memory

balancer

Disk Manager

Memory Manager

Process Manager

--- cores --

Runtime engine and notification system

NetworkManager

Page 29: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D 29

Maintains a reference of all blocks of data contained in a Worker ( in disk or memory )

It keeps a sorted list based on when they will be used ( future operations )• Low priority blocks are flushed to disk first• High priority blocks are loaded from disk first

Connected to the Disk Manager inside the engine using the EngineNotificationSystem

Block Manager

Block Manager

Schedule write operationsTo DiskManager

Schedule read operationsTo DiskManager

Important: Since the order of blocks changes continuously based on the scheduling of new processing operations, the Block Manager is made aware of the new order and is able to react accordingly.

Page 30: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D30

Worker

Engine library

Independent development

multi-core

Block ManagerBlock ManagerDisk – Memory

balancer

Queues Manager

txt_cdrs

cdrs

Stream Operations Manager

Operation A

Operation BInput data

usersOperation C

pri

ori

ty

Disk Manager

Memory Manager

Process Manager

--- cores --

Runtime engine and notification system

NetworkManager

Page 31: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.

Telefónica I+D 31

Contains reference to all the blocks contained in queues and stream operationsBoth systems are connected to Block Manager to inform about the priority of blocks

Stream Operation Manager is connected with ProcessManager to schedule 3rd party operations at Engine Subsystem

Queue & Stream Operations Manager

Queues Manager

txt_cdrs

cdrs

Stream Operations Manager

Operation A

Operation Busers

Operation C

pri

ori

ty

Page 32: SAMSON Platform Architecture Streaming Big Data TELEFÓNICA I+D.