Time Series in Prometheus - files-cdn.cnblogs.com · 1 million time series 10 second sample...

Time Series in Prometheus Fabian Reinartz Engineer, SoundCloud Ltd.



prometheus.io


Prometheus Metrics

...
http_requests_total{status="200",method="GET"} @1434317560938 94355
http_requests_total{status="200",method="GET"} @1434317561287 94934
http_requests_total{status="200",method="GET"} @1434317562344 96483
http_requests_total{status="404",method="GET"} @1434317560938 38473
http_requests_total{status="404",method="GET"} @1434317561249 38544
http_requests_total{status="404",method="GET"} @1434317562588 38663
http_requests_total{status="200",method="POST"} @1434317560885 4748
http_requests_total{status="200",method="POST"} @1434317561483 4795
http_requests_total{status="200",method="POST"} @1434317562589 4833
http_requests_total{status="404",method="POST"} @1434317560939 122
...

Legend: Metric name | Labels | Timestamp | Sample Value


Requirements

● 1 million time series
● 10-second sample resolution
● 64-bit timestamp + 64-bit value

100,000 samples/sec
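The 100,000 samples/sec figure follows directly from the first two bullets. The short sketch below redoes that arithmetic and also estimates the raw, uncompressed data rate implied by the third bullet. The constant names are illustrative, and the per-day figure is an extrapolation that the slide itself does not state.

```python
# Back-of-the-envelope check of the requirements slide.
SERIES = 1_000_000          # from the slide: 1 million time series
RESOLUTION_S = 10           # from the slide: one sample every 10 seconds
BYTES_PER_SAMPLE = 8 + 8    # from the slide: 64-bit timestamp + 64-bit value

samples_per_sec = SERIES // RESOLUTION_S
raw_bytes_per_sec = samples_per_sec * BYTES_PER_SAMPLE
raw_gb_per_day = raw_bytes_per_sec * 86_400 / 1e9  # decimal GB, uncompressed

print(samples_per_sec)      # 100000
print(raw_bytes_per_sec)    # 1600000
print(raw_gb_per_day)       # 138.24
```

At roughly 138 GB of raw samples per day, storing data for weeks without compression would be costly, which is the motivation for the chunk encodings later in the deck.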


The Fundamental Problem

Orthogonal write and read patterns.

[Figure: axes are time (~weeks) and series (~millions); regions labeled Writes and Reads]


Prometheus Metrics

Key-Value store (with BigTable semantics) seems suitable.

...
http_requests_total{status="200",method="GET"} @1434317560938 94355
http_requests_total{status="200",method="GET"} @1434317561287 94934
http_requests_total{status="200",method="GET"} @1434317562344 96483
http_requests_total{status="404",method="GET"} @1434317560938 38473
http_requests_total{status="404",method="GET"} @1434317561249 38544
http_requests_total{status="404",method="GET"} @1434317562588 38663
http_requests_total{status="200",method="POST"} @1434317560885 4748
http_requests_total{status="200",method="POST"} @1434317561483 4795
http_requests_total{status="200",method="POST"} @1434317562589 4833
http_requests_total{status="404",method="POST"} @1434317560939 122
...

KEY: Metric name | Labels | Timestamp
VALUE: Sample Value
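To make the key-value idea concrete, here is a minimal sketch in which the series identity plus the timestamp form the key and the sample value is the stored value. The byte layout, the NUL separator, and the function names `encode_key`, `encode_value`, and `decode_value` are all assumptions made for illustration; the slides do not describe the actual encoding.

```python
import struct

def encode_key(name, labels, ts_ms):
    # Hypothetical layout: "__name__" first, then the remaining labels sorted,
    # joined by NUL bytes, followed by a big-endian 64-bit timestamp so that
    # keys of one series sort by time in an ordered key-value store.
    pairs = [("__name__", name)] + sorted(labels.items())
    series = b"\x00".join(f"{k}={v}".encode() for k, v in pairs)
    return series + b"\x00" + struct.pack(">q", ts_ms)

def encode_value(v):
    return struct.pack(">d", float(v))  # 64-bit float sample value

def decode_value(raw):
    return struct.unpack(">d", raw)[0]

# A plain dict stands in for the ordered store; three samples from the slide,
# inserted out of order on purpose.
store = {}
labels = {"status": "200", "method": "GET"}
for ts, val in [(1434317562344, 96483), (1434317560938, 94355), (1434317561287, 94934)]:
    store[encode_key("http_requests_total", labels, ts)] = encode_value(val)

for key in sorted(store):  # sorted keys yield the series in time order
    print(struct.unpack(">q", key[-8:])[0], decode_value(store[key]))
```

Because every key of a series shares the same prefix, a range scan over that prefix returns one series in time order, which is the BigTable-style access pattern the slide alludes to.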


[Architecture diagram labels: Ingestion; append(series, time, value); Storage (in-memory data); series iterators; PromQL; HDD / SSD; LevelDB; Encode / Decode; Compress / Decompress]


Prometheus Metrics

http_requests_total{status="200",method="GET"}
(Metric name | Labels)

{__name__="http_requests_total",status="200",method="GET"}
(Labels)
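The slide's point is that the metric name is just another label called `__name__`. The sketch below rewrites the first notation into the second as a plain dictionary. The helper `to_labels` and its deliberately simple regular expressions are illustrative assumptions, not Prometheus's parser, and they ignore escaping inside label values.

```python
import re

def to_labels(selector):
    # Split 'name{k="v",...}' into the name and the label body.
    m = re.fullmatch(r'(\w+)\{(.*)\}', selector)
    if m is None:
        raise ValueError(f"unsupported selector: {selector!r}")
    labels = {"__name__": m.group(1)}  # the metric name becomes a label
    for key, value in re.findall(r'(\w+)="([^"]*)"', m.group(2)):
        labels[key] = value
    return labels

print(to_labels('http_requests_total{status="200",method="GET"}'))
# {'__name__': 'http_requests_total', 'status': '200', 'method': 'GET'}
```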


Prometheus Metrics

Learning the hard way

__name__ = http_requests_total
status = 200
method = GET

fnv(sort( [the three labels above] ))

fnv( __name__ = http_requests_total ) ⊕ fnv( status = 200 ) ⊕ fnv( method = GET )
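The two expressions on this slide can be sketched as two ways of turning a label set into a 64-bit series hash: hash the sorted labels once, or hash each label separately and XOR the results. The code uses 64-bit FNV-1a; the `k=v` string form, the `\xff` separator, and the function names are assumptions for illustration. The slide does not spell out which lesson was learned, so the sketch only demonstrates that both variants are independent of label order.

```python
FNV_OFFSET = 0xcbf29ce484222325   # 64-bit FNV offset basis
FNV_PRIME = 0x100000001b3         # 64-bit FNV prime
MASK64 = (1 << 64) - 1

def fnv1a(data: bytes) -> int:
    h = FNV_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & MASK64
    return h

def hash_sorted(labels):
    # fnv(sort(labels)): one hash over the sorted, concatenated label pairs.
    pairs = sorted(labels.items())
    return fnv1a(b"\xff".join(f"{k}={v}".encode() for k, v in pairs))

def hash_xor(labels):
    # fnv(label) XOR fnv(label) XOR ...: needs no sorting because XOR commutes.
    h = 0
    for k, v in labels.items():
        h ^= fnv1a(f"{k}={v}".encode())
    return h

a = {"__name__": "http_requests_total", "status": "200", "method": "GET"}
b = {"method": "GET", "status": "200", "__name__": "http_requests_total"}
print(hash_sorted(a) == hash_sorted(b), hash_xor(a) == hash_xor(b))  # True True
```

The XOR variant avoids the sort on every lookup, at the cost of a weaker combination step, since XOR discards any information about how the per-label hashes were paired up.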


[Storage diagram labels]

● Sample ingestion: append(series, time, value)
● 1KB chunks
● memory: head chunk [incomplete]; chunk in memory [complete and immutable]; evictable chunks (LRU)
● disk: chunk on disk [complete and immutable]; one file per time series
● PromQL: series iterator
● series hash:
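The chunk lifecycle in this diagram can be sketched as follows: each series appends into an incomplete head chunk, and once that chunk is full it is frozen and a new head chunk starts. This is a simplified model under stated assumptions: samples are counted at 16 uncompressed bytes each to fill a 1 KB chunk, persistence to disk and LRU eviction are left out, and the names `Series`, `storage`, and `SAMPLES_PER_CHUNK` are hypothetical.

```python
CHUNK_BYTES = 1024                                 # from the slide: 1KB chunks
SAMPLE_BYTES = 16                                  # assumption: uncompressed samples
SAMPLES_PER_CHUNK = CHUNK_BYTES // SAMPLE_BYTES    # 64 under that assumption

class Series:
    def __init__(self):
        self.complete = []   # complete and immutable chunks (tuples)
        self.head = []       # incomplete head chunk, still being appended to

    def append(self, t, v):
        self.head.append((t, v))
        if len(self.head) == SAMPLES_PER_CHUNK:
            self.complete.append(tuple(self.head))  # freeze the full chunk
            self.head = []

    def iterate(self):
        # Series iterator: complete chunks first, then the head chunk.
        for chunk in self.complete:
            yield from chunk
        yield from self.head

storage = {}  # series hash -> Series

def append(series_hash, t, v):
    storage.setdefault(series_hash, Series()).append(t, v)

for i in range(150):
    append(0xABCDEF, 1434317560000 + i * 10_000, float(i))

s = storage[0xABCDEF]
print(len(s.complete), len(s.head))  # 2 22
```

Freezing full chunks is what makes them safe to write to disk or evict from memory without coordinating with ongoing appends, which only ever touch the head chunk.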


Series maintenance

[Diagram labels: memory; disk; older than retention time]


Chunk preloading

[Diagram labels: memory; disk; PromQL; series iterator]


Anatomy of a chunk [v0]

[Diagram labels: 5 bytes header; base time; base value; repeated time and value fields ... (one per timestamp), ... (one per value)]

Raw samples (value, timestamp):
1000000 1441558420098
1001050 1441558432221
1002040 1441558444311


Anatomy of a chunk [v1]

[Diagram labels: 5 bytes header; base time; base value; repeated Δ time and Δ value fields ... (one per timestamp), ... (one per value)]

Raw samples (value, timestamp):
1000000 1441558420098
1001050 1441558432221
1002040 1441558444311

Arrow annotations between the raw samples: +12123, +1050, +12090, +1000

Chunk contents (value, time), base sample followed by Δ entries:
1000000 1441558420098
1050 12123
2040 24213
3100 36313
4250 48500
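The numbers on the v1 slide are consistent with storing each sample as an offset from the chunk's base sample rather than from its predecessor. A minimal sketch of that reading follows. The function names are illustrative, and the real chunk format (header, field widths) is not modeled here.

```python
# Raw samples from the slide as (timestamp_ms, value).
samples = [
    (1441558420098, 1000000),
    (1441558432221, 1001050),
    (1441558444311, 1002040),
]

def delta_encode(samples):
    # Keep the first sample as the base; store the rest as offsets from it.
    base_t, base_v = samples[0]
    deltas = [(t - base_t, v - base_v) for t, v in samples[1:]]
    return (base_t, base_v), deltas

def delta_decode(base, deltas):
    base_t, base_v = base
    return [base] + [(base_t + dt, base_v + dv) for dt, dv in deltas]

base, deltas = delta_encode(samples)
print(deltas)  # [(12123, 1050), (24213, 2040)]
```

The offsets are far smaller than the raw 64-bit values, so they can be stored in fewer bytes. Note that the slide lists the value first and the time second, while the tuples here are ordered (time, value).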


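Anatomy of a chunk [v2]

[Diagram labels: 5 bytes header; base time; base value; Δ time; Δ value; repeated ΔΔ time and ΔΔ value fields ... (one per timestamp), ... (one per value)]

Raw samples (value, timestamp):
1000000 1441558420098
1001050 1441558432221
1002040 1441558444311

Arrow annotations between the raw samples: +12123, +1050, +12090, +1000

Chunk contents (value, time), base sample, first Δ, then ΔΔ entries:
1000000 1441558420098
1050 12123
-60 -33
-50 -56
+50 -8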
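One reading that is consistent with the v2 numbers is that each stored double-delta is the deviation of a sample from a straight line extrapolated from the base sample and the first delta. The sketch below implements that reading; it is an assumption, since the slide does not define the encoding, and the function names are illustrative. The five input samples are reconstructed from the base-relative offsets on the v1 slide.

```python
# (timestamp_ms, value): the base sample plus the four offsets on the v1 slide.
BASE_T, BASE_V = 1441558420098, 1000000
offsets = [(12123, 1050), (24213, 2040), (36313, 3100), (48500, 4250)]
samples = [(BASE_T, BASE_V)] + [(BASE_T + dt, BASE_V + dv) for dt, dv in offsets]

def dd_encode(samples):
    (t0, v0), (t1, v1) = samples[0], samples[1]
    dt, dv = t1 - t0, v1 - v0  # first delta, stored once
    # Residual of sample i against the extrapolation base + i * first delta.
    residuals = [
        (t - (t0 + i * dt), v - (v0 + i * dv))
        for i, (t, v) in enumerate(samples[2:], start=2)
    ]
    return (t0, v0), (dt, dv), residuals

def dd_decode(base, first_delta, residuals):
    (t0, v0), (dt, dv) = base, first_delta
    out = [base, (t0 + dt, v0 + dv)]
    for i, (rt, rv) in enumerate(residuals, start=2):
        out.append((t0 + i * dt + rt, v0 + i * dv + rv))
    return out

base, first_delta, residuals = dd_encode(samples)
print(first_delta)  # (12123, 1050)
print(residuals)    # [(-33, -60), (-56, -50), (8, 50)]
```

Under this reading the residuals match the slide's entries (listed there as value, then time) except for the sign of the final time residual, which comes out as +8 here. For regularly scraped, steadily growing counters the residuals stay close to zero, so they need very few bytes each.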


Anatomy of a chunk [v2]

[Diagram labels: 5 bytes header; base time; base value; Δ time; Δ value; repeated ΔΔ time and ΔΔ value fields ... (one per timestamp), ... (one per value)]

13:14 < nostrovsk> Hey guys, Looking for a sanity check here
13:15 < nostrovsk> 500 machines per server, each running node and jmx exporters, for 1 week is only 30gb of data?
13:36 <@ bbrazil> what's your scrape rate and how heavy are those jmx exporters?
13:37 <@ bbrazil> doesn't sound implausible to me
13:42 <@ bbrazil> we're 25GB/two weeks with ~5k samples/s
13:45 <@ beorn7> Compression, it works... ;)
13:53 < fish_> beorn7: nothing says better 'good job' than people coming to this channel because they can't believe that things are soo good :)
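The figures quoted in the chat log allow a rough estimate of the achieved bytes per sample. The sketch below does that arithmetic, assuming decimal gigabytes, exactly 5,000 samples/s sustained over 14 days, and no other disk overhead; all three are simplifying assumptions, so the result is only indicative.

```python
GB = 10**9                       # assumption: decimal gigabytes
disk_bytes = 25 * GB             # "25GB/two weeks"
samples = 5_000 * 14 * 86_400    # "~5k samples/s" sustained for two weeks

bytes_per_sample = disk_bytes / samples
raw_bytes_per_sample = 16        # 64-bit timestamp + 64-bit value
ratio = raw_bytes_per_sample / bytes_per_sample

print(round(bytes_per_sample, 2))  # 4.13
print(round(ratio, 1))             # 3.9
```

Roughly 4 bytes per sample against 16 bytes raw is the kind of saving that prompted the sanity check in the channel.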


rate(prometheus_local_storage_ingested_samples_total[1m])


Checkpointing

On shutdown and regularly to limit data loss in case of a crash.

[Diagram labels: memory; disk; checkpoint file]
