Time Series in Prometheus
Fabian Reinartz – Engineer, SoundCloud Ltd.
prometheus.io
Prometheus Metrics
...
http_requests_total{status="200",method="GET"} @1434317560938 94355
http_requests_total{status="200",method="GET"} @1434317561287 94934
http_requests_total{status="200",method="GET"} @1434317562344 96483
http_requests_total{status="404",method="GET"} @1434317560938 38473
http_requests_total{status="404",method="GET"} @1434317561249 38544
http_requests_total{status="404",method="GET"} @1434317562588 38663
http_requests_total{status="200",method="POST"} @1434317560885 4748
http_requests_total{status="200",method="POST"} @1434317561483 4795
http_requests_total{status="200",method="POST"} @1434317562589 4833
http_requests_total{status="404",method="POST"} @1434317560939 122
...
Each line: metric name, labels, timestamp, sample value
Requirements
● 1 million time series
● 10 second sample resolution
● 64bit timestamp + 64bit value
100,000 samples/sec
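The 100,000 samples/sec figure follows from the first two requirements. A short sketch of that arithmetic, extended to the raw (uncompressed) byte rate implied by 64bit + 64bit samples; the per-day figure is derived here and is not on the slide:

```python
# Requirements taken from the slide.
SERIES = 1_000_000            # time series
SAMPLE_INTERVAL_S = 10        # one sample per series every 10 seconds
BYTES_PER_SAMPLE = 8 + 8      # 64bit timestamp + 64bit value

samples_per_sec = SERIES // SAMPLE_INTERVAL_S
raw_bytes_per_sec = samples_per_sec * BYTES_PER_SAMPLE
raw_bytes_per_day = raw_bytes_per_sec * 86_400   # seconds per day

print(f"{samples_per_sec:,} samples/sec")
print(f"{raw_bytes_per_sec / 1e6:.1f} MB/sec uncompressed")
print(f"{raw_bytes_per_day / 1e9:.2f} GB/day uncompressed")
```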
The Fundamental Problem
Orthogonal write and read patterns.
[Figure: series (~millions) on one axis, time (~weeks) on the other, with the directions of Writes and Reads marked.]
Prometheus Metrics
Key-Value store (with BigTable semantics) seems suitable.
KEY: metric name, labels, timestamp
VALUE: sample value
Storage
[Figure: Ingestion writes via append(series, time, value); PromQL reads via series iterators. In between: in-memory data, Encode / Decode, Compress / Decompress, LevelDB, HDD / SSD.]
Prometheus Metrics
Metric name and labels:
http_requests_total{status="200",method="GET"}
Labels only:
{__name__="http_requests_total",status="200",method="GET"}
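The two notations above describe the same series: the metric name is just another label called `__name__`. A minimal sketch of that rewrite; the helper names `to_label_set` and `format_label_set` are hypothetical and not part of any Prometheus API:

```python
def to_label_set(metric_name, labels):
    """Fold the metric name into the label set under the key __name__."""
    if "__name__" in labels:
        raise ValueError("labels must not already contain __name__")
    return {"__name__": metric_name, **labels}


def format_label_set(label_set):
    """Render a label set in the {key="value",...} notation used on the slide."""
    pairs = ",".join(f'{key}="{value}"' for key, value in label_set.items())
    return "{" + pairs + "}"


label_set = to_label_set("http_requests_total", {"status": "200", "method": "GET"})
print(format_label_set(label_set))
```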
Prometheus Metrics
Learning the hard way
fnv(sort(__name__ = http_requests_total, status = 200, method = GET))
fnv(__name__ = http_requests_total) ⊕ fnv(status = 200) ⊕ fnv(method = GET)
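The slide shows two ways to turn a label set into a series hash: hashing the sorted labels in one pass, or XOR-ing one hash per label. A rough sketch of both, assuming 64-bit FNV-1a (the slide only says "fnv") and a hypothetical 0xff separator byte between names and values. It makes no claim about which variant Prometheus uses or what went wrong in practice.

```python
FNV_OFFSET_64 = 0xCBF29CE484222325
FNV_PRIME_64 = 0x100000001B3
MASK_64 = 0xFFFFFFFFFFFFFFFF
SEP = b"\xff"  # hypothetical separator, chosen for this sketch only


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the state, then multiply by the prime."""
    h = FNV_OFFSET_64
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_64) & MASK_64
    return h


def fingerprint_sorted(labels: dict) -> int:
    """fnv(sort(...)): hash one buffer built from the label pairs in sorted order."""
    buf = b"".join(k.encode() + SEP + v.encode() + SEP for k, v in sorted(labels.items()))
    return fnv1a_64(buf)


def fingerprint_xor(labels: dict) -> int:
    """fnv(...) ⊕ fnv(...) ⊕ ...: hash each pair on its own and XOR the results."""
    h = 0
    for k, v in labels.items():
        h ^= fnv1a_64(k.encode() + SEP + v.encode())
    return h


labels = {"__name__": "http_requests_total", "status": "200", "method": "GET"}
reordered = {"method": "GET", "status": "200", "__name__": "http_requests_total"}
print(hex(fingerprint_sorted(labels)), hex(fingerprint_xor(labels)))
```

Both variants are independent of label order. One structural property of the XOR form worth keeping in mind is that identical terms cancel, since `x ^ x == 0`.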
1KB chunks
memory: head chunk [incomplete], chunk in memory [complete and immutable], evictable chunks (LRU)
disk: chunk on disk [complete and immutable], one file per time series
Sample ingestion: append(series, time, value)
PromQL: series iterator
series hash:
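The chunk lifecycle above can be sketched as a toy in-memory series: samples go into an incomplete head chunk, which is frozen once full, and an iterator walks the complete chunks followed by the head. This assumes uncompressed 16-byte samples, so a 1KB chunk holds 64 of them. The class and its names are hypothetical, and persistence and LRU eviction are left out.

```python
CHUNK_SIZE_BYTES = 1024
SAMPLE_SIZE_BYTES = 16  # assumption: uncompressed 64bit timestamp + 64bit value
SAMPLES_PER_CHUNK = CHUNK_SIZE_BYTES // SAMPLE_SIZE_BYTES


class Series:
    """Toy model of one time series split into fixed-size chunks."""

    def __init__(self):
        self.complete_chunks = []  # tuples: complete and immutable
        self.head_chunk = []       # list: incomplete, still being appended to

    def append(self, time, value):
        self.head_chunk.append((time, value))
        if len(self.head_chunk) == SAMPLES_PER_CHUNK:
            # Freeze the full head chunk and start a new, empty one.
            self.complete_chunks.append(tuple(self.head_chunk))
            self.head_chunk = []

    def iterator(self):
        """Yield all samples in append order, complete chunks first."""
        for chunk in self.complete_chunks:
            yield from chunk
        yield from self.head_chunk


series = Series()
for i in range(130):
    series.append(1441558420098 + 10_000 * i, 1000000 + i)
print(len(series.complete_chunks), "complete chunks,", len(series.head_chunk), "samples in head")
```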
Series maintenance
[Figure: chunks in memory and on disk, with a marker for data older than retention time.]
Chunk preloading
[Figure: memory and disk, with a PromQL series iterator reading from memory.]
Anatomy of a chunk [v0]
5 bytes header | base time | base value | time | value | time | value | ... (one per timestamp, one per value)

value     time
1000000   1441558420098
1001050   1441558432221
1002040   1441558444311
Anatomy of a chunk [v1]
5 bytes header | base time | base value | Δ time | Δ value | Δ time | Δ value | ... (one per timestamp, one per value)

Raw samples, with the step from each sample to the next:
value     time
1000000   1441558420098
1001050   1441558432221   (+1050, +12123)
1002040   1441558444311   (+990, +12090)

Stored, as deltas against the base:
value     time
1000000   1441558420098   (base)
1050      12123
2040      24213
3100      36313
4250      48500
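A small sketch of the delta idea, using the three raw samples from the slide: keep the first sample as the base and store every later sample as its difference from that base. The `bytes_needed` helper and its 1/2/4/8-byte widths are assumptions made for illustration, as is the restriction to integer values. The point is only that deltas fit into far fewer bytes than full 64-bit numbers.

```python
def delta_encode(samples):
    """Return (base, deltas), where each delta is relative to the base sample."""
    base_value, base_time = samples[0]
    deltas = [(value - base_value, time - base_time) for value, time in samples[1:]]
    return (base_value, base_time), deltas


def delta_decode(base, deltas):
    """Rebuild the original samples from the base and the stored deltas."""
    base_value, base_time = base
    return [base] + [(base_value + dv, base_time + dt) for dv, dt in deltas]


def bytes_needed(n):
    """Smallest of 1, 2, 4 or 8 bytes that holds n as a signed integer."""
    for width in (1, 2, 4, 8):
        limit = 1 << (8 * width - 1)
        if -limit <= n < limit:
            return width
    raise OverflowError("value does not fit in 64 bits")


# (value, time) pairs from the slide.
samples = [
    (1000000, 1441558420098),
    (1001050, 1441558432221),
    (1002040, 1441558444311),
]
base, deltas = delta_encode(samples)
print(deltas)
print(bytes_needed(samples[1][1]), "bytes raw vs", bytes_needed(deltas[1][1]), "bytes as delta")
```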
Anatomy of a chunk [v2]
5 bytes header | base time | base value | Δ time | Δ value | ΔΔ time | ΔΔ value | ΔΔ time | ΔΔ value | ... (one per timestamp, one per value)

Raw samples, with the step from each sample to the next:
value     time
1000000   1441558420098
1001050   1441558432221   (+1050, +12123)
1002040   1441558444311   (+990, +12090)

Stored:
value     time
1000000   1441558420098   (base)
1050      12123           (Δ)
-60       -33             (ΔΔ)
-50       -56             (ΔΔ)
+50       -8              (ΔΔ)
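One reading of the ΔΔ columns is "difference between successive deltas": store the base, the first delta, and then only how much each step differs from the previous step. The sketch below follows that reading with the three raw samples from the slide and reproduces the Δ row (1050, 12123) and the first ΔΔ row (-60, -33). It is an illustration under that assumption, not necessarily how the v2 chunk format is defined in detail.

```python
def double_delta_encode(samples):
    """Return (base, first_delta, double_deltas) for at least two samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    base = samples[0]
    first_delta = (samples[1][0] - base[0], samples[1][1] - base[1])
    double_deltas = []
    prev_delta = first_delta
    for prev, cur in zip(samples[1:], samples[2:]):
        delta = (cur[0] - prev[0], cur[1] - prev[1])
        # Store only the change of the step compared to the previous step.
        double_deltas.append((delta[0] - prev_delta[0], delta[1] - prev_delta[1]))
        prev_delta = delta
    return base, first_delta, double_deltas


def double_delta_decode(base, first_delta, double_deltas):
    """Rebuild the samples by accumulating double deltas into deltas into values."""
    samples = [base, (base[0] + first_delta[0], base[1] + first_delta[1])]
    delta = first_delta
    for dd in double_deltas:
        delta = (delta[0] + dd[0], delta[1] + dd[1])
        last = samples[-1]
        samples.append((last[0] + delta[0], last[1] + delta[1]))
    return samples


# (value, time) pairs from the slide.
samples = [
    (1000000, 1441558420098),
    (1001050, 1441558432221),
    (1002040, 1441558444311),
]
base, first_delta, double_deltas = double_delta_encode(samples)
print(base, first_delta, double_deltas)
```

For a steadily increasing counter scraped at a regular interval, the double deltas stay close to zero, which is what makes them cheap to store.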
13:14 < nostrovsk> Hey guys, Looking for a sanity check here
13:15 < nostrovsk> 500 machines per server, each running node and jmx exporters, for 1 week is only 30gb of data?
13:36 <@ bbrazil> what's your scrape rate and how heavy are those jmx exporters?
13:37 <@ bbrazil> doesn't sound implausible to me
13:42 <@ bbrazil> we're 25GB/two weeks with ~5k samples/s
13:45 <@ beorn7> Compression, it works... ;)
13:53 < fish_> beorn7: nothing says better 'good job' than people coming to this channel because they can't believe that things are soo good :)
rate(prometheus_local_storage_ingested_samples_total[1m])
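The numbers quoted in the chat ("25GB/two weeks with ~5k samples/s") allow a rough bytes-per-sample estimate, which can be compared with the 16 raw bytes of a 64bit timestamp plus 64bit value. A back-of-the-envelope sketch, assuming GB means 10^9 bytes and a constant ingestion rate over the two weeks:

```python
# Figures quoted in the chat log; both are approximate.
DISK_BYTES = 25 * 10**9        # "25GB", assuming GB = 10^9 bytes
SAMPLES_PER_SEC = 5_000        # "~5k samples/s"
SECONDS = 14 * 86_400          # two weeks

RAW_BYTES_PER_SAMPLE = 8 + 8   # 64bit timestamp + 64bit value

total_samples = SAMPLES_PER_SEC * SECONDS
bytes_per_sample = DISK_BYTES / total_samples
compression_ratio = RAW_BYTES_PER_SAMPLE / bytes_per_sample

print(f"{total_samples:,} samples")
print(f"{bytes_per_sample:.2f} bytes/sample on disk")
print(f"about {compression_ratio:.1f}x smaller than raw")
```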
Checkpointing
On shutdown and regularly to limit data loss in case of a crash.
[Figure: memory and disk, with a checkpoint file on disk.]
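A minimal sketch of the checkpoint idea: dump the in-memory state to a temporary file, force it to disk, then atomically rename it over the previous checkpoint so a crash never leaves a half-written file. What exactly goes into the checkpoint is not spelled out on the slide. The JSON format, the file name, and the assumption that the state is a mapping of series to not-yet-persisted samples are all hypothetical.

```python
import json
import os
import tempfile


def write_checkpoint(path, state):
    """Write state to path via a temporary file and an atomic rename."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes reach the disk before renaming
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the last checkpointed state, or an empty state if none exists."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


# Usage: checkpoint some in-memory samples and read them back, as after a restart.
with tempfile.TemporaryDirectory() as directory:
    checkpoint_path = os.path.join(directory, "heads.checkpoint")
    missing = load_checkpoint(checkpoint_path)
    state = {"series-a": [[1441558420098, 1000000], [1441558432221, 1001050]]}
    write_checkpoint(checkpoint_path, state)
    restored = load_checkpoint(checkpoint_path)

print(restored)
```

Anything appended after the last checkpoint would still be lost in a crash, which is why the slide pairs "on shutdown" with "regularly".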