Data Pipeline Monitoring - Michiel Kalkman
![Page 1: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/1.jpg)
Data Pipeline Monitoring
Michiel Kalkman
![Page 2: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/2.jpg)
Mental model of a pipeline
Figure 1: Actually a duct (Source: Wikimedia)
![Page 3: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/3.jpg)
Map of a real pipeline
Figure 2: Typical pipeline (Source: Wikimedia)
![Page 4: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/4.jpg)
Notes
Pipelines:
▶ are systems
▶ cross multiple political zones
▶ cross multiple technical zones
▶ have multiple inputs (providers, sources)
▶ have multiple outputs (consumers, sinks)
▶ carry payloads in multiple stages (refinements)
![Page 5: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/5.jpg)
Break it down
By administrative zones
Defines supportability, frames arguments over responsibility
![Page 6: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/6.jpg)
Observability
![Page 7: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/7.jpg)
Pillars of Observability
Columns: Logs, Metrics, Tracing (the column position of each X was lost in extraction)
Accounting X X
Reporting X X
Alerting X X
Testing X X X
Diagnostics X X X
Verification X X
Auditing X
![Page 8: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/8.jpg)
What gets measured gets managed
Diagram labels: Products; Data; Pipeline; Reporting, Monitoring, Diagnostics, Auditing, Alerting, Accounting; Metrics, Logs, Tracing; Pipeline component.
Figure 3: Component observability
![Page 9: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/9.jpg)
What to monitor
▶ Data flowing across platform boundaries
▶ Cycles in the pipeline
▶ Data flow pressure points
▶ Baseline operation separate from service operation
▶ Infrastructure separate from service operation
▶ Quality control gateways for change
![Page 10: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/10.jpg)
Diagram labels: Feed; Products; Observability Data; Asset; Reporting, Monitoring, Diagnostics, Alerting; Metrics; Pipeline component.
Figure 4: Observability - Metrics
![Page 11: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/11.jpg)
Metrics focus
There is a wide variety of metrics out there, and it is easy to get lost. Define high-level metrics that can be compared consistently across the entire landscape. Focus on two distinct areas, which are different sides of the same coin:
Utilization, Saturation, Errors (USE)
These are resource-focused and provide technical information.
▶ “Which servers are overloaded?”
Rate, Errors, Duration (RED)
These are service-focused and provide business information.
▶ “Am I meeting my SLA targets?”
![Page 12: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/12.jpg)
Four Golden Signals (Google SRE)
1. Latency
2. Traffic
3. Errors
4. Saturation
![Page 13: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/13.jpg)
Throughput - components
Diagram labels: Upstream, Input, Forwarder, Output, Downstream. Two measurement points, each labelled "Count bytes, count events".
Figure 5: Component metrics
![Page 14: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/14.jpg)
Throughput
| Counter | t1 | t2 |
|---|---|---|
| input bytes | 100 | 200 |
| output bytes | 150 | 270 |
| input events | 20 | 30 |
| output events | 30 | 55 |
▶ Throughput rate is the counter value at $t_2$ minus the counter value at $t_1$, $(t_2 - t_1)$
▶ Average event size
  ▶ $In = \frac{Rate(BytesIn)}{Rate(EventsIn)}$
  ▶ $Out = \frac{Rate(BytesOut)}{Rate(EventsOut)}$
▶ Internal buffer pressure
  ▶ $Rate(EventsIn) - Rate(EventsOut)$
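As a rough illustration of the throughput formulas, the sketch below derives the rates, the average event sizes and the buffer pressure from the two counter samples in the throughput table. The `Sample` type and the function names are hypothetical, and the interval between t1 and t2 is assumed to be one time unit.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of a component's four counters (hypothetical structure)."""
    input_bytes: int
    output_bytes: int
    input_events: int
    output_events: int


def rate(later: int, earlier: int, interval: float = 1.0) -> float:
    """Counter difference per time unit between two samples."""
    return (later - earlier) / interval


def throughput_summary(t1: Sample, t2: Sample, interval: float = 1.0) -> dict:
    """Average event size on each side and the internal buffer pressure."""
    bytes_in = rate(t2.input_bytes, t1.input_bytes, interval)
    bytes_out = rate(t2.output_bytes, t1.output_bytes, interval)
    events_in = rate(t2.input_events, t1.input_events, interval)
    events_out = rate(t2.output_events, t1.output_events, interval)
    return {
        # Average event size = Rate(Bytes) / Rate(Events); None if no events.
        "avg_event_size_in": bytes_in / events_in if events_in else None,
        "avg_event_size_out": bytes_out / events_out if events_out else None,
        # Buffer pressure = Rate(EventsIn) - Rate(EventsOut).
        "buffer_pressure": events_in - events_out,
    }


# Counter values taken from the throughput table.
t1 = Sample(input_bytes=100, output_bytes=150, input_events=20, output_events=30)
t2 = Sample(input_bytes=200, output_bytes=270, input_events=30, output_events=55)
summary = throughput_summary(t1, t2)
print(summary)
```

With these numbers the buffer pressure comes out negative, meaning the component emitted more events than it received during the interval.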
![Page 15: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/15.jpg)
Tracing
![Page 16: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/16.jpg)
Diagram labels: Feed; Products; Data; Pipeline; Reporting, Monitoring, Diagnostics, Alerting, Accounting; Tracing; Pipeline component.
Figure 6: Observability - Tracing
![Page 17: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/17.jpg)
Three Ts
| | Inputs | Outputs | Transformation |
|---|---|---|---|
| Transaction | 1 | 1 | New data |
| Transportation | 1 | 1+ | Enrichment |
| Transformation | 1+ | 1+ | New data, enrichment |
![Page 18: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/18.jpg)
Transportation tracing
Downstream
Forwarder
Upstream
Figure 7: Transportation
![Page 19: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/19.jpg)
Transaction tracing
User
Component A Component B
Figure 8: Distributed transaction
![Page 20: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/20.jpg)
Transformation tracing
Source A Source B
Transformer
Upstream Target
Figure 9: Transformation
![Page 21: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/21.jpg)
Monitoring
![Page 22: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/22.jpg)
Plan for failure
Figure 10: Hopefully not this bad (Source: Wikimedia)
![Page 23: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/23.jpg)
Key monitoring points
▶ Integrity
  ▶ Packet/event/record drops
  ▶ Timeouts, queue expiries
  ▶ Data loss scenarios
▶ Capacity
  ▶ Backpressure signaling
  ▶ Backlog processing
  ▶ Peak hour spikes
![Page 24: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/24.jpg)
Heartbeats
▶ Add a dummy input channel to each input
▶ Continuously generate fixed data at fixed rate
▶ Monitor dummy channel on each boundary
▶ Alert on dummy channel rate at each boundary
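The alerting step could look roughly like the sketch below: because the dummy channel is generated at a fixed, known rate, each boundary can compare the heartbeat rate it observes against that rate. The `BoundaryReading` type, the boundary names, the 10% tolerance and the sample readings are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BoundaryReading:
    boundary: str          # name of the pipeline boundary being observed
    heartbeat_rate: float  # heartbeat events per second seen at that boundary


def heartbeat_alerts(readings, expected_rate, tolerance=0.1):
    """Return the boundaries whose observed heartbeat rate deviates from the
    fixed generation rate by more than the given fraction."""
    alerts = []
    for reading in readings:
        deviation = abs(reading.heartbeat_rate - expected_rate) / expected_rate
        if deviation > tolerance:
            alerts.append(reading.boundary)
    return alerts


# Illustrative readings: the heartbeat is generated at 1 event per second.
readings = [
    BoundaryReading("ingest", 1.0),
    BoundaryReading("transform", 0.95),
    BoundaryReading("delivery", 0.0),  # heartbeat no longer arrives here
]
alerts = heartbeat_alerts(readings, expected_rate=1.0)
print(alerts)
```

The first boundary that raises an alert narrows down where in the pipeline the flow stopped or slowed.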
![Page 25: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/25.jpg)
Buffers, Backlogs and Backpressure
![Page 26: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/26.jpg)
MQ pipeline with push - dataflow
Diagram labels: Producer, MQ, Handler 1, Handler 2, Handler 3, Consumer; Topic A, Topic B, Topic C, Topic D; three Transform steps.
Figure 11: MQ pipeline with push - dataflow
![Page 27: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/27.jpg)
MQ pipeline with push - sequence
Sequence diagram participants: Producer, MQ, Handler 1, Handler 2, Handler 3, Consumer.
Sequence: PUSH Topic A, PUSH Topic A, pressure point, Process, PUSH Topic B, PUSH Topic B, pressure point, Process, PUSH Topic C, PUSH Topic C, pressure point, Process, PUSH Topic D, PUSH Topic D, pressure point.
Figure 12: MQ pipeline with push - sequence
![Page 28: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/28.jpg)
MQ pipeline notes
▶ This design is active: it sends data as it comes in
▶ Server-push model for moving data
  ▶ Yes, you can also poll a queue
▶ Complex programming model
  ▶ MQ-specific protocol
  ▶ Requires registration of callback
  ▶ Handler process might be unavailable
![Page 29: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/29.jpg)
Model
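    def plot(records_in, buffer_size, records_out):
        # Stand-in for the charting call behind the figures (an assumption).
        print(records_in, buffer_size, records_out)

    def next(records_in, buffer_size, output_capacity):
        # Queue the incoming records.
        buffer_size = buffer_size + records_in
        if (buffer_size - output_capacity) >= 0:
            # Enough is buffered to emit at full output capacity.
            records_out = output_capacity
            buffer_size = buffer_size - output_capacity
        else:
            # Less than one step of capacity is buffered: drain everything.
            records_out = buffer_size
            buffer_size = 0
        plot(records_in, buffer_size, records_out)
        return buffer_size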
![Page 30: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/30.jpg)
Input rate <= output capacity
Figure 13: Output capacity = 15 eps
![Page 31: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/31.jpg)
Backlog processing
Figure 14: Output capacity = 5 eps
![Page 32: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/32.jpg)
Backlog processing with finite buffer
Figure 15: Limit reached with no backpressure means data loss
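A minimal sketch of the finite-buffer case, assuming the buffer model is extended with a `buffer_limit` parameter and that records which do not fit are simply dropped (no backpressure). The function names, the choice to emit output before applying the limit, and the input series are illustrative assumptions, not the data behind the figure.

```python
def step(records_in, buffer_size, output_capacity, buffer_limit):
    """Advance the buffer model by one time step with a finite buffer.

    Returns (new_buffer_size, records_out, records_dropped).
    """
    buffer_size += records_in
    # Emit as much as the output can take in this step.
    records_out = min(buffer_size, output_capacity)
    buffer_size -= records_out
    # Without backpressure, anything above the limit is lost.
    records_dropped = max(0, buffer_size - buffer_limit)
    buffer_size -= records_dropped
    return buffer_size, records_out, records_dropped


def simulate(inputs, output_capacity, buffer_limit):
    """Run the model over a series of per-step input counts."""
    buffer_size = 0
    total_dropped = 0
    history = []
    for records_in in inputs:
        buffer_size, records_out, dropped = step(
            records_in, buffer_size, output_capacity, buffer_limit
        )
        total_dropped += dropped
        history.append((records_in, buffer_size, records_out, dropped))
    return history, total_dropped


# A burst of 10 events per step against an output capacity of 5 per step.
inputs = [10, 10, 10, 10, 0, 0]
history, total_dropped = simulate(inputs, output_capacity=5, buffer_limit=12)
for row in history:
    print(row)
print("dropped:", total_dropped)
```

In this run the buffer hits its limit on the third step and records are lost until the burst ends. A dropped-records counter like this is one of the integrity signals worth alerting on.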
![Page 33: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/33.jpg)
Observing buffer change rate
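| | t1 | t2 | t3 | t4 |
|---|---|---|---|---|
| $Counter(In)$ | 5 | 12 | 19 | 26 |
| $Counter(Out)$ | 5 | 10 | 15 | 20 |
| $RateIn(t)$ | N/A | 7 | 7 | 7 |
| $RateOut(t)$ | N/A | 5 | 5 | 5 |
| $RateIn(t) - RateOut(t)$ | N/A | 2 | 2 | 2 |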
$$Rate(n) = Counter(n) - Counter(n - 1)$$
$$Buffer(n) = RateIn(n) - RateOut(n)$$
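The two definitions can be checked against the counter values with a few lines of code. This is a sketch; the function names are hypothetical, and the samples are assumed to be evenly spaced.

```python
def rates(counter):
    """Rate(n) = Counter(n) - Counter(n - 1); the first sample has no rate."""
    return [later - earlier for earlier, later in zip(counter, counter[1:])]


def buffer_change(counter_in, counter_out):
    """Buffer(n) = RateIn(n) - RateOut(n) for every interval."""
    return [
        rate_in - rate_out
        for rate_in, rate_out in zip(rates(counter_in), rates(counter_out))
    ]


# Counter values for t1..t4.
counter_in = [5, 12, 19, 26]
counter_out = [5, 10, 15, 20]

change = buffer_change(counter_in, counter_out)
print(rates(counter_in))
print(rates(counter_out))
print(change)
# Net buffer growth over the whole window is the sum of the per-interval changes.
print(sum(change))
```

A buffer change rate that stays positive, as it does here, means the backlog keeps growing. Over a long enough window the values should average out near zero for a pipeline that keeps up with its input.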
![Page 34: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/34.jpg)
Buffer change rate
Figure 16: Long term average of the red line should approach zero
![Page 35: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/35.jpg)
Kafka pipeline - Dataflow - by asset
Diagram labels: Producer, Kafka, Spark 1, Spark 2, Spark 3, Consumer; Schema A with Topic A, Schema B with Topic B, Schema C with Topic C, Schema D with Topic D; three Transform steps.
Figure 17: Kafka pipeline Dataflow
![Page 36: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/36.jpg)
Kafka pipeline - connection initiation - by asset
Sequence diagram participants: Producer, Kafka, Spark 1, Spark 2, Spark 3, Consumer.
Sequence: PUSH Topic A, PULL Topic A, Process, PUSH Topic B, PULL Topic B, Process, PUSH Topic C, PULL Topic C, Process, PUSH Topic D, PULL Topic D.
Figure 18: Kafka pipeline sequence
![Page 37: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/37.jpg)
Kafka pipeline - Dataflow - by service
Diagram labels: Producer, Kafka Topic A, Spark 1, Kafka Topic B, Spark 2, Kafka Topic C, Spark 3, Kafka Topic D, Consumer; Topic A, Transform, Topic B, Transform, Topic C, Transform, Topic D.
Figure 19: Kafka pipeline Dataflow
![Page 38: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/38.jpg)
Kafka pipeline - connection initiation - by service
Sequence diagram participants: Producer, Topic A (Kafka), Spark 1, Topic B (Kafka), Spark 2, Topic C (Kafka), Spark 3, Topic D (Kafka), Consumer.
Sequence: PUSH, PULL, Process, PUSH, PULL, Process, PUSH, PULL, Process, PUSH, PULL.
Figure 20: Kafka pipeline sequence
![Page 39: Data Pipeline Monitoring - Michiel Kalkman · Data Pipeline Reporng Monitoring Diagnoscs Alerng Accounng Tracing Pipeline component Figure 6: Observability - Tracing. Three Ts Inputs](https://reader035.fdocuments.in/reader035/viewer/2022062606/5fde0d156d7e6565370e2a2a/html5/thumbnails/39.jpg)
Kafka pipeline notes
▶ This design is passive, does not send data unless asked
▶ Client-pull model for moving data
▶ All persistence is done on Kafka
▶ Very simple programming model
▶ Well understood wire-protocol (HTTP)
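The client-pull idea can be sketched with a toy in-memory log. This is not Kafka's client API: the `Log` and `PullConsumer` classes and their methods are hypothetical stand-ins that only show the division of labour, where the log persists everything and each consumer pulls at its own pace while tracking its own offset.

```python
class Log:
    """A toy append-only topic; it only stores records and serves reads."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)

    def read(self, offset, max_records):
        """Return up to max_records starting at offset; never pushes anything."""
        return self.records[offset:offset + max_records]


class PullConsumer:
    """Pulls batches when it is ready and remembers how far it has read."""

    def __init__(self, log, batch_size):
        self.log = log
        self.batch_size = batch_size
        self.offset = 0

    def poll(self):
        batch = self.log.read(self.offset, self.batch_size)
        self.offset += len(batch)
        return batch

    def lag(self):
        """Records persisted in the log but not yet pulled by this consumer."""
        return len(self.log.records) - self.offset


log = Log()
consumer = PullConsumer(log, batch_size=5)

# The producer writes a burst of 12 records; nothing is sent to the consumer.
for i in range(12):
    log.append(i)

first = consumer.poll()
lag_after_first = consumer.lag()
print(first, lag_after_first)

# The backlog waits in the log until the consumer asks for it.
while consumer.lag() > 0:
    consumer.poll()
print(consumer.offset)
```

In this model the backlog shows up as consumer lag rather than as drops in a handler's buffer, which makes lag the natural quantity to monitor at each pull boundary.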