Architecture of Big Data Solutions

BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH Architecture of Big Data Solutions Guido Schmutz Frankfurt, 13.12.2017 @ gschmutz guidoschmutz.wordpress.com

Transcript of Architecture of Big Data Solutions

Page 1: Architecture of Big Data Solutions

Architecture of Big Data Solutions

Guido Schmutz

Frankfurt, 13.12.2017

@gschmutz guidoschmutz.wordpress.com

Page 2: Architecture of Big Data Solutions

Guido Schmutz

Working at Trivadis for more than 20 years

Oracle ACE Director for Fusion Middleware and SOA

Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data

Head of Trivadis Architecture Board

Technology Manager @ Trivadis

More than 30 years of software development experience

Contact: [email protected]

http://guidoschmutz.wordpress.com

Slideshare: http://www.slideshare.net/gschmutz

Twitter: gschmutz

Architecture of Big Data Solutions

Page 3: Architecture of Big Data Solutions

Agenda

1. Introduction
2. Big Data & Fast Data Reference Architectures
3. Continuous Streaming Data Ingestion
4. Big Data & Cloud
5. Microservices Architecture
6. Big Data Ecosystem – many choices sorted!

Page 4: Architecture of Big Data Solutions

Introduction

Page 5: Architecture of Big Data Solutions

Big Data Definition (4 Vs)

Characteristics of Big Data: its Volume, Velocity and Variety in combination

+ Time to action? – Big Data + Real-Time = Stream Processing

Page 6: Architecture of Big Data Solutions

[Diagram: traditional data flow. Components shown: Billing & Ordering, CRM / Profile, Marketing Campaigns, Location, Social, Clickstream, ETL / Stored Procedures, Enterprise Data Warehouse, Data Marts / Aggregations, BI Tools, Segmentation & Churn Analysis, Marketing Offers]

Page 7: Architecture of Big Data Solutions

Traditional Flow Diagram - Challenges

[Diagram: the traditional data flow (Billing & Ordering, CRM / Profile, Marketing Campaigns, Location, Social, Clickstream, ETL / Stored Procedures, Enterprise Data Warehouse, Data Marts / Aggregations, BI Tools, Segmentation & Churn Analysis, Marketing Offers), annotated with the following challenges]

• Limited processing power (noted twice in the diagram)
• Does not model easily to a traditional database schema
• Storage scaling is very expensive
• Analysis is based on sample / limited data
• Loss in fidelity
• Other / new data sources
• High volume and velocity

Page 8: Architecture of Big Data Solutions

Big Data to the rescue? Why is structure / architecture important?

Page 9: Architecture of Big Data Solutions

Why talk about Big Data Architectures?

Choosing the right architecture is key for any (big data) project

Big Data is still a rather young field and therefore a “moving target”

No standard architectures are available that have been in use for years

In recent years, some architectures and best practices have evolved

Know your use cases before choosing your architecture / technologies

Having a reference architecture in place helps in choosing the right / matching technologies

Page 10: Architecture of Big Data Solutions

Big Data & Fast Data Reference Architectures

Page 11: Architecture of Big Data Solutions

Big Data Architecture

[Diagram: Billing & Ordering, CRM / Profile and Marketing Campaigns are loaded via File Import / SQL Import into a Big Data Cluster. The cluster provides Storage (Raw, Refined, Results) and Parallel Processing (Machine Learning, Graph Algorithms, Natural Language Processing). SQL and Search interfaces serve BI Tools, the Enterprise Data Warehouse, Search / Explore, and Online & Mobile Apps.]

Page 12: Architecture of Big Data Solutions

Big Data Architecture - Hadoop

[Diagram: the same Big Data Architecture (sources, File Import / SQL Import, Storage with Raw, Refined and Results, Parallel Processing, SQL, Search, BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps), shown for a Hadoop cluster.]

Page 13: Architecture of Big Data Solutions

Big Data Architecture - Spark

[Diagram: the same Big Data Architecture (sources, File Import / SQL Import, Storage with Raw, Refined and Results, Parallel Processing, SQL, Search, BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps), shown for Spark.]

Page 14: Architecture of Big Data Solutions

Event Hub for handling streaming data

[Diagram: sources (Location, Social, Clickstream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Mobile Apps, Weather Data) publish to an Event Hub. A Data Flow moves events into the Big Data Cluster, which provides Storage (Raw, Refined, Results) and Parallel Processing (Machine Learning, Graph Algorithms, Natural Language Processing). SQL and Search interfaces serve BI Tools, the Enterprise Data Warehouse, Search / Explore, and Online & Mobile Apps.]

Page 15: Architecture of Big Data Solutions

Event Hub for handling streaming data

[Diagram: same components as on the preceding Event Hub slide: sources, Event Hub, Data Flow, Big Data Cluster (Storage, Parallel Processing), SQL, Search, and the consuming tools.]

Page 16: Architecture of Big Data Solutions

Event Hub for handling streaming data

[Diagram: same components as on the preceding Event Hub slides: sources, Event Hub, Data Flow, Big Data Cluster (Storage, Parallel Processing), SQL, Search, and the consuming tools. Annotation: high latency.]

Page 17: Architecture of Big Data Solutions

“Data at Rest” vs. “Data in Motion”

[Two panels: Data at Rest | Data in Motion]
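The contrast between the two panels can be illustrated with a small sketch. It is not taken from the slides: the function names and the sample readings are hypothetical, and a running average stands in for whatever analysis is actually required. The first function queries data that is already stored; the second updates its answer as each event arrives.

```python
from typing import Iterable, Iterator


def batch_average(stored_readings: list[float]) -> float:
    """Data at rest: the full data set is already stored and is queried in one pass."""
    return sum(stored_readings) / len(stored_readings)


def streaming_average(events: Iterable[float]) -> Iterator[float]:
    """Data in motion: the result is updated incrementally as each event arrives."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count


# Hypothetical sensor readings, used for both styles of processing.
readings = [20.0, 22.0, 21.0, 25.0]

at_rest_result = batch_average(readings)              # one answer, after all data is stored
in_motion_results = list(streaming_average(readings))  # a fresh answer after every event

print(at_rest_result)
print(in_motion_results)
```

The batch function cannot answer until the data has landed in storage, whereas the generator produces an up-to-date value with every event and never needs the full history in memory.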

Page 18: Architecture of Big Data Solutions

Streaming Analytics Architecture

[Diagram: sources (Location, Social, Clickstream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Mobile Apps, Weather Data) publish to an Event Hub. Data Flows feed a Stream Processing Cluster containing Stream Analytics (Low Latency Processing, Alerting, "Real-Time" Dashboard), Reference / Models, and Results. Outputs go to a Dashboard and, via Search, to BI Tools, the Enterprise Data Warehouse, Search / Explore, and Online & Mobile Apps.]

Page 19: Architecture of Big Data Solutions

Streaming Analytics Architecture – Open Source

[Diagram: same components as the Streaming Analytics Architecture: sources, Event Hub, Data Flows, Stream Processing Cluster (Stream Analytics with Low Latency Processing, Alerting and "Real-Time" Dashboard; Reference / Models; Results), Dashboard, Search, and the consuming tools.]

Page 20: Architecture of Big Data Solutions

Streaming Analytics Architecture

[Diagram: same components as the Streaming Analytics Architecture: sources, Event Hub, Data Flows, Stream Processing Cluster (Stream Analytics, Reference / Models, Results), Dashboard, Search, and the consuming tools. Annotation: low latency without keeping raw data / events.]

Page 21: Architecture of Big Data Solutions

Keep raw event data

[Diagram: sources (Location, Social, Clickstream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Mobile Apps, Weather Data), an Event Hub, and File Import / SQL Import. An Event Processing Cluster (Stream Analytics, Reference / Models, Results, Dashboard) is shown together with a Big Data Cluster (Storage: Raw, Refined, Results; Parallel Processing). Consumers: BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps, via Search.]

Page 22: Architecture of Big Data Solutions

“Lambda Architecture” for Big Data

[Diagram: sources (Location, Social, Clickstream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Mobile Apps, Weather Data), an Event Hub, and File Import / SQL Import. Two clusters are shown: an Event Processing Cluster (Stream Analytics, Reference / Models, Results, Dashboard) and a Big Data Cluster (Storage: Raw, Refined, Results; Parallel Processing). SQL and Search interfaces serve BI Tools, the Enterprise Data Warehouse, Search / Explore, and Online & Mobile Apps.]
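The Lambda Architecture is commonly described as a batch layer that recomputes views from all raw data, a speed layer that covers events the last batch run has not yet seen, and a serving layer that merges both. The following is a minimal in-memory sketch of that idea, not the implementation behind the slide; the page-view events, `batch_view`, `SpeedLayer`, and `query` are hypothetical names chosen for illustration.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute page counts from all raw events stored so far."""
    return Counter(event["page"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally count events not yet covered by a batch run."""

    def __init__(self):
        self.realtime_view = Counter()

    def on_event(self, event):
        self.realtime_view[event["page"]] += 1


def query(page, batch, speed):
    """Serving layer: merge the batch view with the real-time view."""
    return batch[page] + speed.realtime_view[page]


# Raw events already stored in the big data cluster (hypothetical data).
master_dataset = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]
batch = batch_view(master_dataset)

# Events that arrived after the last batch run.
speed = SpeedLayer()
for event in [{"page": "/home"}, {"page": "/checkout"}]:
    speed.on_event(event)

home_views = query("/home", batch, speed)
checkout_views = query("/checkout", batch, speed)
print(home_views, checkout_views)
```

In a real system the speed layer's view would be discarded once a later batch run has absorbed the same events; the sketch leaves that step out.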

Page 23: Architecture of Big Data Solutions

“Kappa Architecture” for Big Data

[Diagram: sources (Location, Social, Clickstream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Mobile Apps, Weather Data), an Event Hub, and File Import / SQL Import. An Event Processing Cluster (Stream Analytics, Reference / Models, Results, Dashboard) and a Big Data Cluster (Storage: Raw, Refined, Results; Parallel Processing) are shown. SQL and Search interfaces serve BI Tools, the Enterprise Data Warehouse, Search / Explore, and Online & Mobile Apps.]
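The Kappa Architecture is usually characterised by a single stream-processing path: the event hub retains the event log, and a changed job is rolled out by replaying that log through the new code rather than by maintaining a separate batch layer. The sketch below illustrates this under stated assumptions; `EventLog` is a hypothetical in-memory stand-in for a retained topic, and the payment events are invented.

```python
class EventLog:
    """Hypothetical stand-in for an event hub topic that retains events for replay."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def replay(self, from_offset=0):
        """Return the retained events starting at the given offset."""
        return iter(self._events[from_offset:])


def totals_v1(events):
    """First version of the streaming job: total amount per customer."""
    totals = {}
    for e in events:
        totals[e["customer"]] = totals.get(e["customer"], 0) + e["amount"]
    return totals


def totals_v2(events, min_amount):
    """Changed job logic: ignore amounts below a threshold."""
    return totals_v1(e for e in events if e["amount"] >= min_amount)


log = EventLog()
for customer, amount in [("alice", 10), ("bob", 5), ("alice", 7)]:
    log.append({"customer": customer, "amount": amount})

view_v1 = totals_v1(log.replay())
# Reprocessing: replay the same log from offset 0 through the new job version.
view_v2 = totals_v2(log.replay(from_offset=0), min_amount=6)
print(view_v1, view_v2)
```

Once the replayed view has caught up, consumers would be switched to it and the old view dropped; that cut-over is outside the scope of this sketch.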

Page 24: Architecture of Big Data Solutions

“Unified Architecture” for Big Data

[Diagram: sources (Location, Social, Clickstream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Mobile Apps, Weather Data), an Event Hub, and File Import / SQL Import feed a single Big Data Cluster. The cluster combines Batch Analytics and Streaming Analytics (Stream Analytics, Reference / Models) with Storage (Raw, Refined, Results), Parallel Processing, and NoSQL. SQL, Search and a Dashboard serve BI Tools, the Enterprise Data Warehouse, Search / Explore, and Online & Mobile Apps.]

Page 25: Architecture of Big Data Solutions

Continuous Streaming Data Ingestion

Page 26: Architecture of Big Data Solutions

Continuous Data Ingestion

[Diagram: the Unified Architecture components: sources (Location, Social, Clickstream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Mobile Apps, Weather Data), Event Hub, File Import / SQL Import, Big Data Cluster (Batch Analytics, Streaming Analytics, Stream Analytics, Reference / Models, NoSQL, Storage with Raw, Refined and Results, Parallel Processing), SQL, Search, Dashboard, BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps.]

Page 27: Architecture of Big Data Solutions

Continuous Streaming Data Ingestion

[Diagram: ingestion paths into Event Hub topics. Labels shown: DB Source, File Source, Log, IoT Sensor, Social, CDC, Log CDC, CDC GW, Connect, Native, IoT GW, Dataflow GW, Dataflow, Message GW, Queue, REST, Event Hub, Topic, Big Data, Stream Processing]

Page 28: Architecture of Big Data Solutions

Continuous Streaming Data Ingestion

SQL Polling

Change Data Capture (CDC)

File Polling

File Stream (File Tailing)

File Stream (Appender)

Sensor Stream
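Of the patterns listed above, SQL Polling is the simplest to sketch: the ingester repeatedly queries the source for rows beyond the last one it has seen. The example below is an illustration only, using Python's built-in `sqlite3` module with an in-memory database; the `orders` table, its columns, and the `poll_new_rows` helper are hypothetical. In practice the returned rows would be published to an event hub topic.

```python
import sqlite3


def poll_new_rows(conn, last_seen_id):
    """Fetch rows added since the previous poll, using the id as a high-water mark."""
    rows = conn.execute(
        "SELECT id, payload FROM orders WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_last_seen = rows[-1][0] if rows else last_seen_id
    return rows, new_last_seen


# Hypothetical source table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO orders (payload) VALUES (?)", [("a",), ("b",)])

first_batch, last_id = poll_new_rows(conn, 0)         # picks up both existing rows

conn.execute("INSERT INTO orders (payload) VALUES (?)", ("c",))
second_batch, last_id = poll_new_rows(conn, last_id)  # sees only the new row

third_batch, last_id = poll_new_rows(conn, last_id)   # nothing new since last poll
print(first_batch, second_batch, third_batch)
```

A polling query of this kind only notices inserted rows with a growing key; updates and deletes go unseen, which is one reason log-based Change Data Capture appears as a separate pattern in the list.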

Page 29: Architecture of Big Data Solutions

Continuous Streaming Data Ingestion

[Diagram: ingestion paths into Event Hub topics. Labels shown: DB Source, File Source, Log, IoT Sensor, Social, CDC, Log CDC, CDC GW, Connect, Native, IoT GW, Dataflow GW, Dataflow, Message GW, Queue, REST, Event Hub, Topic, Big Data, Stream Processing]

Page 30: Architecture of Big Data Solutions

Big Data & Cloud

Page 31: Architecture of Big Data Solutions

Data Locality vs. Compute/Storage Separation

[Two diagrams. Data Local Compute: a Master Node and Worker #1 to Worker #3, each worker with its own Disk and Processing, connected by a Network. Separate Compute and Storage: a Master Node and Compute #1 to Compute #3 (Processing only), connected over the Network to shared Storage (Disk, Disk, Disk).]

Separation of compute and storage: the fundamental difference

• store data in Object Storage instead of DFS
• bring up Compute nodes only for data processing
• multiple workloads on separate clusters can access same data
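The three points above can be sketched in a few lines. This is a toy model rather than a real cloud API: `ObjectStore` and `EphemeralCluster` are hypothetical classes, and the object key and values are invented. The store outlives the clusters, each cluster exists only while its job runs, and two separate workloads read the same object.

```python
class ObjectStore:
    """Hypothetical stand-in for shared object storage that outlives any cluster."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class EphemeralCluster:
    """Compute that is brought up only for the duration of a job."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.running = False

    def __enter__(self):
        self.running = True   # cluster is started for the job
        return self

    def __exit__(self, exc_type, exc, tb):
        self.running = False  # cluster is torn down; the data stays in the store
        return False

    def run(self, key, func):
        """Read the object from shared storage and apply the job function to it."""
        return func(self.store.get(key))


store = ObjectStore()
store.put("sales/2017", [120, 80, 200])  # hypothetical object key and values

# Two separate workloads, each on its own short-lived cluster, read the same data.
with EphemeralCluster("reporting", store) as reporting:
    total = reporting.run("sales/2017", sum)

with EphemeralCluster("ml", store) as ml:
    peak = ml.run("sales/2017", max)

print(total, peak, reporting.running, ml.running)
```

With data-local compute the disks belong to the workers, so shutting down the cluster would also take the data offline; here the clusters can disappear without affecting the stored objects.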

Page 32: Architecture of Big Data Solutions

A new way to Manage Big Data

Big Data: Traditional Assumptions

• Bare-metal
• Data Locality
• HDFS on local disks

Big Data: A New Approach

• Containers and VMs
• Compute and storage separation
• Shared storage

Benefits and Value

• Big-Data-as-a-Service
• Agility and cost savings
• Faster time-to-insights

Page 33: Architecture of Big Data Solutions

Big Data & Cloud - Amazon Web Services (AWS)

[Diagram: the Unified Architecture components: sources (Location, Social, Clickstream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Mobile Apps, Weather Data), Event Hub, File Import / SQL Import, Big Data Cluster (Batch Analytics, Streaming Analytics, Stream Analytics, Reference / Models, NoSQL, Storage with Raw, Refined and Results, Parallel Processing), SQL, Search, Dashboard, BI Tools, Enterprise Data Warehouse, Search / Explore, Online & Mobile Apps, shown for Amazon Web Services.]

Page 34: Architecture of Big Data Solutions

Microservices Architecture

Page 35: Architecture of Big Data Solutions

Asynchronous Microservice Architecture

[Diagram: sources (Location, Social, Clickstream, Sensor Data, Billing & Ordering, CRM / Profile, Marketing Campaigns, Call Center, Mobile Apps, Weather Data), File Import / SQL Import, and an Event Hub carrying Event Streams. A Microservice Cluster (Microservice with State and an API) and a Stream Analytics Cluster (Stream Processor with State and an API) consume the Event Streams, alongside a Service and a Big Data Cluster (Storage: Raw, Refined, Results; Parallel Processing). SQL and Search interfaces serve BI Tools, the Enterprise Data Warehouse, Search / Explore, and Online & Mobile Apps.]
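One way to read the diagram is that services do not call each other directly: a producer publishes an event to the hub and returns, while a consumer builds its own state from the event stream and exposes that state through its API. The sketch below illustrates this reading with an in-memory hub; `EventHub`, `OrderService`, `BillingService`, and the `orders` topic are hypothetical names, not part of the original deck.

```python
from collections import defaultdict, deque


class EventHub:
    """Hypothetical in-memory event hub: publishing is decoupled from delivery."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._pending = deque()

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The producer returns immediately; no consumer is invoked here.
        self._pending.append((topic, event))

    def deliver_all(self):
        """Deliver queued events to all subscribers of their topic."""
        while self._pending:
            topic, event = self._pending.popleft()
            for handler in self._subscribers[topic]:
                handler(event)


class OrderService:
    """Producer: emits an event instead of calling other services."""

    def __init__(self, hub):
        self.hub = hub

    def place_order(self, order_id, amount):
        self.hub.publish("orders", {"order_id": order_id, "amount": amount})


class BillingService:
    """Consumer: keeps its own state, built only from consumed events."""

    def __init__(self, hub):
        self.invoices = {}
        hub.subscribe("orders", self.on_order)

    def on_order(self, event):
        self.invoices[event["order_id"]] = event["amount"]

    def get_invoice(self, order_id):
        """The service's API: answers from local state, without calling other services."""
        return self.invoices.get(order_id)


hub = EventHub()
orders = OrderService(hub)
billing = BillingService(hub)

orders.place_order("o-1", 42.0)
before_delivery = billing.get_invoice("o-1")  # event published but not yet delivered
hub.deliver_all()
after_delivery = billing.get_invoice("o-1")
print(before_delivery, after_delivery)
```

The gap between `before_delivery` and `after_delivery` is the asynchronous part: the billing state becomes consistent only after the event has been delivered.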

Page 36: Architecture of Big Data Solutions

Big Data Ecosystem – many choices sorted!

Page 37: Architecture of Big Data Solutions

Big Data Ecosystem – many choices sorted!

Page 38: Architecture of Big Data Solutions

Big Data Ecosystem – many choices sorted!

Page 39: Architecture of Big Data Solutions

Guido Schmutz

Technology Manager

[email protected]
