The Mystery Machine: End-to-end Performance Analysis of Large ...
![Page 1: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/1.jpg)
Trace Analysis
Chunxu Tang
![Page 2: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/2.jpg)
The Mystery Machine: End-to-end performance analysis of large-scale Internet services
![Page 3: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/3.jpg)
Introduction
• Complexity comes from
  • Scale
  • Heterogeneity
![Page 4: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/4.jpg)
Introduction (Cont.)
• End-to-end:
  • From the moment a user initiates a page load in a client Web browser,
  • Through server-side processing, network transmission, and JavaScript execution,
  • To the point at which the client Web browser finishes rendering the page.
![Page 5: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/5.jpg)
Introduction (Cont.)
• UberTrace
  • End-to-end request tracing
• Mystery Machine
  • Analysis framework
![Page 6: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/6.jpg)
UberTrace
• Unify the individual logging systems at Facebook into a single end-to-end performance tracing tool, dubbed UberTrace.
![Page 7: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/7.jpg)
UberTrace (Cont.)
• Log messages contain at least:
  • 1. A unique request identifier.
  • 2. The executing computer.
  • 3. A timestamp that uses the local clock of the executing computer.
  • 4. An event name.
  • 5. A task name, where a task is defined to be a distributed thread of control.
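
The five fields above can be pictured as a small record type. This is only an illustrative sketch: the names `LogMessage` and `group_by_request` are hypothetical and are not UberTrace's actual schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogMessage:
    """Hypothetical record holding the five minimum fields from the slide."""
    request_id: str   # 1. unique request identifier
    host: str         # 2. the executing computer
    timestamp: float  # 3. timestamp from the local clock of `host`
    event: str        # 4. event name
    task: str         # 5. task name (a distributed thread of control)


def group_by_request(messages):
    """Collect the messages of each request, keeping their input order.

    Timestamps come from different local clocks, so this sketch does not
    sort messages across hosts.
    """
    groups = defaultdict(list)
    for msg in messages:
        groups[msg.request_id].append(msg)
    return dict(groups)
```

Grouping by the request identifier is what lets messages from separate logging systems be stitched into one end-to-end trace.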
![Page 8: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/8.jpg)
The Mystery Machine
• Procedure:
  • Create a causal model
  • Find the critical path
  • Quantify slack for segments not on the critical path
  • Identify segments that are correlated with performance anomalies.
![Page 9: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/9.jpg)
Causal Relationships Model
• Happens-before (->)
• Mutual exclusion (∨)
• Pipeline (>>)
![Page 10: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/10.jpg)
Algorithms
• 1. Generate all possible hypotheses for causal relationships among segments.
  • A segment is the execution interval between two consecutive logged events for the same task.
• 2. Iterate through the traces and reject a hypothesis if a counterexample is found in any trace.
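
A minimal sketch of the generate-then-reject idea, restricted to the happens-before relationship. The trace representation (a dict from segment name to a `(start, end)` pair) and the function name `infer_happens_before` are assumptions made for illustration; mutual exclusion and pipeline hypotheses are left out.

```python
from itertools import permutations


def infer_happens_before(traces):
    """Return the (a, b) pairs for which 'a happens before b' survives all traces.

    traces: list of dicts mapping segment name -> (start, end).
    """
    segments = set().union(*(trace.keys() for trace in traces))
    # Step 1: hypothesize a happens-before relation for every ordered pair.
    hypotheses = set(permutations(sorted(segments), 2))
    # Step 2: reject a hypothesis as soon as one trace contradicts it.
    for trace in traces:
        for a, b in list(hypotheses):
            if a in trace and b in trace:
                a_end = trace[a][1]
                b_start = trace[b][0]
                if a_end > b_start:  # counterexample: b started before a ended
                    hypotheses.discard((a, b))
    return hypotheses


traces = [
    {"server": (0, 10), "network": (10, 15), "render": (15, 30)},
    {"server": (0, 8), "network": (8, 20), "render": (12, 25)},
]
surviving = infer_happens_before(traces)
```

In this example the second trace shows `render` starting before `network` ends, so the hypothesis that `network` happens before `render` is rejected, while the two hypotheses rooted at `server` survive.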
![Page 11: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/11.jpg)
Algorithms (Cont.)
![Page 12: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/12.jpg)
Analysis
• Critical path analysis
  • The critical path is defined to be the set of segments for which a differential increase in segment execution time would result in the same differential increase in end-to-end latency.
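
One common way to find such a set of segments is to take the longest path through the dependency graph of a single request. The sketch below assumes segment durations and happens-before edges are already known; the function name `critical_path` and the input format are hypothetical and not taken from the paper.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path(durations, edges):
    """Return (path, latency) for the longest dependency chain.

    durations: dict mapping segment name -> duration
    edges: iterable of (a, b) pairs meaning 'a happens before b'
    """
    preds = {seg: set() for seg in durations}
    for a, b in edges:
        preds[b].add(a)

    finish = {}     # earliest finish time of each segment
    best_pred = {}  # predecessor that determines each segment's start
    for seg in TopologicalSorter(preds).static_order():
        p = max(preds[seg], key=finish.get, default=None)
        start = finish[p] if p is not None else 0.0
        best_pred[seg] = p
        finish[seg] = start + durations[seg]

    # Walk back from the segment that finishes last.
    seg = max(finish, key=finish.get)
    latency = finish[seg]
    path = []
    while seg is not None:
        path.append(seg)
        seg = best_pred[seg]
    return list(reversed(path)), latency


durations = {"server": 10, "network": 5, "js": 3, "render": 12}
edges = [("server", "network"), ("server", "js"),
         ("network", "render"), ("js", "render")]
path, latency = critical_path(durations, edges)
```

Here the path is `server`, `network`, `render` with a latency of 27; lengthening any of those three segments lengthens the request by the same amount, while `js` has room to grow.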
![Page 13: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/13.jpg)
Analysis (Cont.)
![Page 14: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/14.jpg)
Analysis (Cont.)
• Slack Analysis
  • Slack is the amount by which the duration of a segment may increase without increasing the end-to-end latency of the request, assuming that the duration of all other segments remains constant.
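
Under the same assumptions (known durations and happens-before edges for one request), slack can be sketched as the gap between the latest and earliest finish time of each segment. The function name `slack` and its inputs are illustrative, not the paper's implementation.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def slack(durations, edges):
    """Return a dict mapping each segment to its slack.

    durations: dict mapping segment name -> duration
    edges: iterable of (a, b) pairs meaning 'a happens before b'
    """
    preds = {seg: set() for seg in durations}
    succs = {seg: set() for seg in durations}
    for a, b in edges:
        preds[b].add(a)
        succs[a].add(b)
    order = list(TopologicalSorter(preds).static_order())

    # Forward pass: earliest time each segment can finish.
    earliest = {}
    for seg in order:
        start = max((earliest[p] for p in preds[seg]), default=0.0)
        earliest[seg] = start + durations[seg]
    total = max(earliest.values())  # end-to-end latency

    # Backward pass: latest finish that does not delay any successor.
    latest = {}
    for seg in reversed(order):
        latest[seg] = min((latest[s] - durations[s] for s in succs[seg]),
                          default=total)
    return {seg: latest[seg] - earliest[seg] for seg in order}


durations = {"server": 10, "network": 5, "js": 3, "render": 12}
edges = [("server", "network"), ("server", "js"),
         ("network", "render"), ("js", "render")]
slack_by_segment = slack(durations, edges)
```

In this example only `js` has non-zero slack (2 time units); the other three segments lie on the critical path and have zero slack.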
![Page 15: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/15.jpg)
Implementation
![Page 16: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/16.jpg)
Results
![Page 17: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/17.jpg)
Results (Cont.)
![Page 18: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/18.jpg)
Results (Cont.)
![Page 19: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/19.jpg)
Towards General-Purpose Resource Management in Shared Cloud Services
![Page 20: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/20.jpg)
Introduction
• Challenges of resource management
  • Bottlenecks can be in hardware or software
  • Ambiguous which user is responsible for system load
  • Tenants interfere with internal system tasks
  • Resource requirements vary
  • Unpredictable which machine will execute a request and for how long
• Goals
  • Effective
  • Efficient
![Page 21: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/21.jpg)
Resource Management Design Principles
• Observation: Multiple request types can contend on unexpected resources.
• Principle: Consider all request types and all resources in the system.
![Page 22: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/22.jpg)
Resource Management Design Principles (Cont.)
• Observation: Contention may be caused by only a subset of tenants.
• Principle: Distinguish between tenants.
![Page 23: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/23.jpg)
Resource Management Design Principles (Cont.)
• Observation: Foreground requests are only part of the story.
• Principle: Treat foreground and background tasks uniformly.
![Page 24: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/24.jpg)
Resource Management Design Principles (Cont.)
• Observation: Resource demands are very hard to predict.
• Principle: Estimate resource usage at runtime.
![Page 25: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/25.jpg)
Resource Management Design Principles (Cont.)
• Observation: Requests can be long or lose importance.
• Principle: Schedule early, schedule often.
![Page 26: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/26.jpg)
Retro Instrumentation Platform
• Tenant abstraction
• End-to-End ID Propagation
• Automatic Resource Instrumentation using AspectJ
• Aggregation and Reporting
• Entry and Throttling Points
![Page 27: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/27.jpg)
Evaluation on HDFS
![Page 28: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/28.jpg)
IntroPerf: Transparent Context-Sensitive Multi-Layer Performance Inference using System Stack Traces
![Page 29: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/29.jpg)
Introduction
• Functionality:
  • With system stack traces as input, IntroPerf transparently infers context-sensitive performance data of the software by measuring the continuity of calling context: the continuous period of a function in a stack with the same calling context.
![Page 30: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/30.jpg)
Introduction (Cont.)
![Page 31: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/31.jpg)
Introduction (Cont.)
• Contributions:
  • Transparent inference of function latency in multiple layers based on stack traces.
  • Automated localization of internal and external performance bottlenecks via context-sensitive performance analysis across multiple system layers.
![Page 32: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/32.jpg)
Design of IntroPerf
• RQ1:
  • Collection of traces using a widely deployed common tracing framework.
• RQ2:
  • Application performance analysis at the fine-grained function level with calling context information.
• RQ3:
  • Reasonable coverage of program execution captured by system stack traces for performance debugging.
![Page 33: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/33.jpg)
Architecture
![Page 34: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/34.jpg)
Inference of Function Latencies
• Conservative estimation:
  • Estimates the end of a function with the last event of the context.
• Aggressive estimation:
  • Estimates the end with the start event of a distinct context.
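
The two estimation modes can be sketched as follows, assuming each stack-trace event is a `(timestamp, stack)` pair with the stack written outermost function first. The function name `infer_latencies` and the event format are assumptions for illustration, not IntroPerf's actual interface.

```python
def infer_latencies(events, aggressive=False):
    """Infer a latency for every continuous period of a calling context.

    events: list of (timestamp, stack) sorted by time, where stack is a
            tuple of function names, outermost first.
    Returns a list of (context, latency) pairs.
    """
    results = []
    active = {}  # context -> (first_seen, last_seen)
    for ts, stack in events:
        # Every prefix of the stack is the calling context of one live function.
        contexts = {stack[:depth] for depth in range(1, len(stack) + 1)}
        for ctx in list(active):
            if ctx not in contexts:  # continuity broken: the function returned
                first_seen, last_seen = active.pop(ctx)
                # Conservative: end at the last event that had this context.
                # Aggressive: end at the first event with a distinct context.
                end = ts if aggressive else last_seen
                results.append((ctx, end - first_seen))
        for ctx in contexts:
            first_seen, _ = active.get(ctx, (ts, ts))
            active[ctx] = (first_seen, ts)
    # Contexts still live at the end of the trace end at their last event.
    for ctx, (first_seen, last_seen) in active.items():
        results.append((ctx, last_seen - first_seen))
    return results


events = [
    (0, ("main", "read")),
    (2, ("main", "read")),
    (5, ("main", "parse")),
    (9, ("main",)),
]
conservative = dict(infer_latencies(events))
aggressive = dict(infer_latencies(events, aggressive=True))
```

For `read` under `main`, the conservative estimate is 2 (events at 0 and 2) and the aggressive estimate is 5 (it ends when the `parse` context first appears); the true latency presumably lies somewhere between the two.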
![Page 35: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/35.jpg)
Inference of Function Latencies (Cont.)
![Page 36: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/36.jpg)
Context-sensitive analysis of inferred performance
• Top-down latency normalization
• Performance-annotated calling context ranking
![Page 37: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/37.jpg)
Evaluation
![Page 38: Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.](https://reader036.fdocuments.in/reader036/viewer/2022081504/56649cac5503460f9496dded/html5/thumbnails/38.jpg)
Summary of the papers
• http://joshuatang.github.io/timeline/papers.html