Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012...

61
Bartek Płotka Prometheus at Scale Edinburgh, 22th October 2018 github.com/improbable-eng/thanos @bwplotka

Transcript of Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012...

Page 1: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

Bartek Płotka

Prometheus at Scale

Edinburgh, 22th October 2018

github.com/improbable-eng/thanos

@bwplotka

Page 2: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

Bartek PłotkaSoftware Engineer

[email protected] @bwplotka

Page 3: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

310

Founded:

Games in Development:

+19

2012

Employees:

"Improbable’s platform, SpatialOS, is designed to let anyone build massive simulations, running in the cloud: imagine Minecraft with thousands of players in the same space or researchers creating simulated cities to model the behaviour of millions. Its ultimate goal: to create totally immersive, persistent virtual worlds."

- WIRED, May 2017

Page 4: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Agenda

Prometheus

Page 5: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Agenda

Prometheus

Prometheus Prometheus

Prometheus Prometheus

Prometheus

Page 6: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Agenda

Prometheus

Prometheus Prometheus

Prometheus Prometheus

Prometheus

Thanos

Page 7: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Agenda

Prometheus

Prometheus Prometheus

Prometheus Prometheus

Prometheus

Thanos+

++

++

Page 8: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Agenda

Prometheus

Prometheus Prometheus

Prometheus Prometheus

Prometheus

Thanos+

++

++

Thanos + Open Source

Page 9: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

Prometheushttps://github.com/prometheus/prometheus

Page 10: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Prometheus

Query Engine

Scrape EngineCompactor

Rule & Alert Engine

Prometheus

Service XService X

Services

/metrics every 15s

HTTP Query API

Grafana Alertmanager

Local storage

Page 11: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Cluster

Single Prometheus

Prometheus

Grafana

Alertmanager

Workloads you want

to monitor

Page 12: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Cluster

Single Prometheus

Prometheus

Grafana

Alertmanager

Workloads you want

to monitor

200 000 samples / sec = 1 CPU, 13 GB RAM = 3 mln series with 15s scrape interval.

Page 13: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Cluster

High Availability

Prometheus

Grafana

Alertmanager

Workloads you want

to monitor

Prometheus Prometheus

Page 14: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Cluster 1

Globally distributed clusters

Cluster 2

Prometheus

Cluster n

Cluster n+1

Prometheus...

Page 15: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Problem: Global View

Prometheus Prometheus

sum(rate(go_memstats_alloc_bytes_total[1m])) ?

source = “a” source = “b”

Page 16: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

go_memstats_alloc_bytes_total{..., replica=A)

Problem: Global View + HA

Prometheus Prometheus

go_memstats_alloc_bytes_total{..., replica=B)

go_memstats_alloc_bytes_total{...} [deduplicated]

Page 17: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Problem: Metric retention

Page 18: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Problem: Metric retention

SSD

Prometheus

PrometheusRemote write

Page 19: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Thanos

Goals

● Have a global view● Have a HA in place● Increase retention

Page 20: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

Global View

See everything from a single place!

Page 21: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

SSD

Prometheus

PrometheusTargets

Page 22: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

SSD

Sidecar

Prometheus SidecarTargets

gRPC (Store API)

Page 23: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Store API

service Store {

rpc Series(SeriesRequest) returns (stream SeriesResponse);

rpc LabelNames(LabelNamesRequest) returns (LabelNamesResponse);

rpc LabelValues(LabelValuesRequest) returns (LabelValuesResponse);

}

message SeriesRequest {

int64 min_time = 1;

int64 max_time = 2;

repeated LabelMatcher matchers = 3;

}

Sidecar

Prometheus

remote read

Store API

Page 24: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

SSD

Querier

Prometheus Sidecar

Querier

Store API

Targets

HTTP Query API

Page 25: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

SSD

Global View

Prometheus Sidecar

Querier

Targets

SSD

SidecarTargets

Prometheus

Merge

Store API

Page 26: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

SSD

Global View + Availability

Prometheus SidecarTargets

SSD

Sidecar

Targets

Prometheus

SSD

Sidecar Prometheus

“replica”:”A”

“replica”:”B”

QuerierMerge

Deduplicate

Store API

Page 27: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Thanos

Goals

● Have a global view ✓● Have a HA in place ✓

Prometheus Sidecar

SSD

Sidecar PrometheusSidecar Prometheus

Querier

Page 28: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Global View + Availability

Page 29: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Global View + Availability

Page 30: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

Historical Metrics

What exactly happened X months ago?

Page 31: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Prometheus

Query Engine

Scrape EngineCompactor

Rule & Alert Engine

Prometheus

TSDB

Page 32: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

TSDB Layout

Block 4Block 3Block 1

chunks chunks

chunks chunks

index

T-10hT-16h T-4h T-2h T

Page 33: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

SSD

Data saving

Prometheus SidecarTargets

Object Storage

Blocks Blocks

Block

--gcs.bucket=...

Page 34: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

SSD

Data saving

Prometheus SidecarTargets

Object Storage

Blocks Blocks

Block

--storage.tsdb.max-block-duration=2h --storage.tsdb.retention=12h

Page 35: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Store Gateway

Object Storage

BlocksCache

Store Gateway

Querier

Store API

Page 36: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Object Storage

Blocks

Ruler

Ruler

Querier

Store APIQuery

SSD Blocks

Page 37: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

Compaction

Density matters

Page 38: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Compaction

Object Storage

BlocksDisk

Compactor

Block

BlocksBlock

Page 39: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

Downsampling

Page 40: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Downsampling

Raw: 16 bytes/sample

Compressed: 1.07 bytes/sample

Page 41: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Downsampling

Decompressing one sample takes 10-40 nanoseconds.

Page 42: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Downsampling

Decompressing one sample takes 10-40 nanoseconds.

Decompressing 1000 series @ 30s scrape interval for 1 year data takes 10-40 seconds alone.

Plus your actual computation over all those samples, e.g. rate()

Page 43: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Downsampling

BlockRAW

Block@ 5m

Block@ 1h

10x 12x

Page 44: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Thanos

Goals

● Have a global view ✓● Have a HA in place ✓● Increase retention ✓

Page 45: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Prometheus

Query Engine

Scrape EngineCompactor

Rule & Alert Engine

Prometheus

Page 46: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Thanos

Scrape EngineCompactor

Rule & Alert Engine

Thanos QuerierThanos Querier

Thanos Querier

Page 47: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Thanos

Compactor

Rule & Alert Engine

Thanos QuerierThanos Querier

Thanos Querier

SSD

Prometheus Sidecar

SSD

Prometheus Sidecar

SSD

Prometheus Sidecar

Page 48: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Thanos

Compactor

Thanos QuerierThanos Querier

Thanos Querier

SSD

Prometheus Sidecar

SSD

Prometheus Sidecar

SSD

Prometheus Sidecar

Thanos RulerThanos Ruler

Page 49: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Thanos

Thanos RulerThanos Ruler

Thanos QuerierThanos Querier

Thanos Querier

SSD

Prometheus Sidecar

SSD

Prometheus Sidecar

SSD

Prometheus SidecarGlobal Compactor

Page 50: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Thanos

Store GatewayStore

Gateway

Object StorageSSD

Prometheus Sidecar

SSD

Prometheus Sidecar

SSD

Prometheus Sidecar

Thanos RulerThanos Ruler

Global Compactor

Thanos QuerierThanos Querier

Thanos Querier

Page 51: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Example Deployment

Cluster 1

Cluster 2

+

Cluster n

Cluster n+1

+...

+

Monitoring Cluster

Grafana

Alertmanager

Bucket

Compactor

Querier Querier

Querier

Ruler Store

Statically configured / global discovery

+

++

Page 52: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

Open Source

&

Thanos

Page 53: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

Open Source:

It is worth it!

Page 54: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

Open source from the very beginning

Page 55: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

Be flexible

while staying focused

Page 56: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

Stay focused: Injection API

Source: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#<file_sd_config>

Page 57: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

Less magic == better

Page 58: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Avoid magic: Gossip example

● Easy to misconfigure

● Hard to debug

● Difficult cross cluster setup

● Over-complicated for Thanos needs

Page 59: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Thanos

Goals

● Have a global view ✓● Have a HA in place ✓● Increase retention ✓● Join effort with community ✓

Page 60: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

@bwplotka

Summary

● Start small with just single Prometheus.

● Extend your setup gradually.○ Just global querier.○ Federated global querier.○ Object storage.

● Model your Thanos deployment.

Generally: Share early, keep project focused and simple.

Page 61: Prometheus at Scale - Linux Foundation Events · 310 Founded: Games in Development: +19 2012 Employees: "Improbable’s platform, SpatialOS, is designed to let anyone build massive

github.com/improbable-eng/thanos

Bartek Płotka

@bwplotka

improbable.io

...psst, join our slack workspace for more info!

Credits:● Percona blog post about Prometheus 2 perf ● Emojis designed by Freepik from Flaticon