Adaptive Overload Control for Busy Internet Servers
Matt Welsh and David Culler
USENIX Symposium on Internet Technologies and Systems (USITS) 2003
Alex Cheung
Nov 13, 2006
ECE1747
Outline
• Motivation
• Goal
• Methodology
  • Detection
  • Overload control
• Experiments
• Comments
Motivation
1. Internet services are becoming important to our daily lives:
  • Email
  • News
  • Trading
2. Services are becoming more complex:
  • Large dynamic content
  • Requires high computation and I/O
  • Hard to predict load requirements of requests
3. Withstand peak load that is 1000x the norm without over-provisioning
  • Solve CNN’s problem on 9/11
Goal
• Adaptive overload control scheme at node level by maintaining:
  • Response time
  • Throughput
  • QoS & Availability
Methodology - Detection
1. Look at the 90th percentile response time
2. Compare with threshold and decide what to do
Weaker alternatives:
• 100th percentile: does not capture “shape” of response time curve
• Throughput: does not capture user-perceived performance of the system
I ask:
• What makes 90th percentile so great?
• Why not 95th? 80th? 70th?
• No supporting micro-experiment
[Figure: requests served, numbered 1 to 10; examine the 90th percentile response time]
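The detection step on this slide can be sketched roughly as follows. This is only an illustration, not the paper's implementation: the function names `percentile` and `is_overloaded`, the nearest-rank percentile definition, and the sample values are all assumptions. The example also hints at why the 100th percentile is the weaker signal: a single slow request decides it.

```python
def percentile(samples, pct):
    """Nearest-rank percentile (an assumed definition; others exist)."""
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n, which avoids floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def is_overloaded(response_times, threshold, pct=90):
    """Flag overload when the chosen percentile exceeds the threshold."""
    return percentile(response_times, pct) > threshold


# Hypothetical window: nine fast requests and one very slow outlier (seconds).
window = [0.1] * 9 + [30.0]
p90 = percentile(window, 90)    # not affected by the single outlier
p100 = percentile(window, 100)  # equal to the outlier
```

With this window the 90th percentile stays at the typical response time, while the 100th percentile reports only the outlier.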
Methodology – Overload Control
• If response time is higher than threshold:
  1. Limit service rate by rejecting selected requests
    • Extension: Differentiate requests with classes/priority levels and reject lower class/priority requests first
  2. Quality/service degradation
  3. Back pressure
    1. Queue explosion at 1st stage (they say)
      • Solved by rejecting requests at 1st stage
    2. Breaks the loose-coupling modular design of SEDA with out-of-band notification scheme (I say)
Methodology – Overload Control
4. Forward rejected request to another “more available” server.
  • “more available” – server with the most of a particular resource:
    • CPU, network, I/O, hard disk
  • Make decision using centralized or distributed algorithm
  • Reliable state migration, possibly transactional
My take:
• More complex, interesting, and actually solves CNN’s problem with a cluster of servers!
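A minimal sketch of the “more available” choice, assuming a centralized view in which one component knows every server's free capacity. The function name `pick_server`, the server names, and the capacity figures are hypothetical, and state migration is not covered.

```python
def pick_server(free_capacity, resource):
    """Return the server with the most free capacity of one resource."""
    return max(free_capacity, key=lambda name: free_capacity[name][resource])


# Hypothetical cluster snapshot: fraction of each resource still free.
servers = {
    "node-a": {"cpu": 0.10, "network": 0.60},
    "node-b": {"cpu": 0.55, "network": 0.20},
}
```

A CPU-bound request would be forwarded to `node-b` here and a bandwidth-bound one to `node-a`, which is why the target depends on the particular resource.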
Rate Limit
[Figure: rate-limit controller diagram, annotated “SMOOTHED”, “Multiplicative decrease”, and “Additive increase”]
Just like TCP!
10 fine-tuned parameters per stage.
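One way to read these annotations (smoothing, multiplicative decrease, additive increase) is as a controller like the sketch below. The class name, the exponentially weighted smoothing of the observed 90th percentile response time, and every parameter value are assumptions for illustration; the tuned per-stage parameters mentioned on the slide are not reproduced here.

```python
class AimdRateController:
    """Additive-increase / multiplicative-decrease admission rate (sketch)."""

    def __init__(self, target, rate=100.0, alpha=0.7,
                 add_step=2.0, mult_factor=0.5,
                 rate_min=1.0, rate_max=5000.0):
        self.target = target            # response-time target (seconds)
        self.rate = rate                # admitted requests per second
        self.alpha = alpha              # weight kept on the old smoothed value
        self.add_step = add_step        # additive increase per update
        self.mult_factor = mult_factor  # multiplicative decrease factor
        self.rate_min = rate_min
        self.rate_max = rate_max
        self.smoothed = None

    def update(self, observed_p90):
        """Feed one 90th percentile sample and return the new rate."""
        if self.smoothed is None:
            self.smoothed = observed_p90
        else:
            self.smoothed = (self.alpha * self.smoothed
                             + (1 - self.alpha) * observed_p90)
        if self.smoothed > self.target:
            # Over target: back off quickly, as TCP does on congestion.
            self.rate = max(self.rate_min, self.rate * self.mult_factor)
        else:
            # Under target: probe for more capacity slowly.
            self.rate = min(self.rate_max, self.rate + self.add_step)
        return self.rate
```

Even this reduced version exposes seven knobs, which gives a sense of the tuning burden the slide points at.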
Rate Limit With Class/Priority
Class/priority assignment based on:
• IP address, header information, HTTP cookies
I ask:
• Where is the priority assignment module implemented?
• Should priority assignment be a stage of its own?
• Is it not shown because it complicates the diagram and makes the stage design not “clean”?
• How to classify which requests are potentially “bottleneck” requests? Application dependent?
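For concreteness, class assignment from the attributes listed above, combined with rejecting lower classes first, might look like the sketch below. The `Request` type, the classification rules (a `tier` cookie, an `X-Priority` header, a `10.` address prefix), and the `shed_load` helper are all hypothetical; the slides do not say where or how the real assignment is done.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    ip: str
    headers: dict = field(default_factory=dict)
    cookies: dict = field(default_factory=dict)


def assign_class(req):
    """Return 1 for high priority, 0 for low (rules are purely illustrative)."""
    if req.cookies.get("tier") == "premium":
        return 1
    if req.headers.get("X-Priority") == "high":
        return 1
    if req.ip.startswith("10."):
        return 1
    return 0


def shed_load(requests, capacity):
    """Admit up to `capacity` requests, rejecting lower classes first."""
    # sorted() is stable, so arrival order is kept within each class.
    ranked = sorted(requests, key=assign_class, reverse=True)
    return ranked[:capacity], ranked[capacity:]
```

Written this way the classifier is a separate function, which is roughly what making priority assignment “a stage of its own” would amount to.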
Quality/Service Degradation
• Notify application via signal to DO service degradation.
• Application does service degradation, not SEDA
Questions:
• How is the signaling implemented?
  • Out of band?
• Is it possible to signal previous stages in the pipeline? Will this break SEDA’s loose-coupling design?
[Figure: pipeline stages “Attach image” and “Send response”, with a “signal” annotation]
Experiments
Experiments - Setup
• Arashi email server (realistic experiment)
  • Real access workload
  • Real email content
  • Admission control
• Web server benchmark
  • Service degradation + 1-class admission control
Experiments – Admission Rate
Controller response time is not as fast.
Additive increase
Multiplicative decrease
Experiments – Response Time
Why?
Why?
Experiments – Massive Load Spike
Not fair! SEDA’s parameters were fine-tuned. Apache can be tuned to stay flat too.
Experiments – Service Degradation
Service degradation and admission control kick in at roughly the same time
Experiments – Service Differentiation
• Average reject rates without service differentiation:
  • Low-priority: 55.5%
  • High-priority: 57.6%
• With service differentiation:
  • Low-priority: 87.9% (+32.4 percentage points)
  • High-priority: 48.8% (-8.8 percentage points)
Question:
• Why is the drop rate for high-priority requests reduced so little with service differentiation? Workload dependent?
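The changes quoted above are differences in percentage points. The short calculation below reproduces them from the reject rates on the slide and also gives the change relative to each baseline, which is one way to quantify “so little”. The variable names are only for this example.

```python
# Reject rates (percent) as reported on the slide.
without = {"low": 55.5, "high": 57.6}
with_diff = {"low": 87.9, "high": 48.8}

# Absolute change in percentage points.
points = {k: round(with_diff[k] - without[k], 1) for k in without}

# Change relative to the rate without service differentiation, in percent.
relative = {
    k: round(100 * (with_diff[k] - without[k]) / without[k], 1)
    for k in without
}
```

In relative terms the high-priority reject rate falls by about 15%, while the low-priority reject rate rises by about 58%.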
Comments
Comments
• No idea what the controller’s overhead is
• Overload control at node level is not good:
  • Node level is inefficient
    • Late rejection
  • Node level is not user friendly:
    • All session state is gone if you get a reject out of the blue ← comes without warning
• Need global level overload control scheme
• Idea/concept is explained in 2.5 pages
Comments
• Rejected requests:
  • Instead of TCP timeout, send static page.
    • (Paper says) this is better
    • (I say) This is worse because it leads to an out-of-memory crash down the road:
      • Saturated output bandwidth
      • Boundless queue at reject handler
• Parameters:
  • How to tune them? How difficult to tune?
  • May be tedious tuning each stage manually.
  • Given a 1M stage application, need to configure all 1M stage thresholds manually?
  • Automated tuning with control theory?
• Methodology of adding extensions is not shown in any figures.
Comments
• Experiment is not entirely realistic:
  • Inter-request think time is 20ms
    • realistic?
  • Rejected users have to re-login after 5 min:
    • All state information is gone
    • Frustrated users
• Two drawbacks of using response time for load detection…
Comments
1. No idea which resource is the bottleneck: CPU? I/O? Network? Memory?
  • SEDA can only either:
    • Do admission control
      • Reduces throughput
    • Tell application to degrade overall service
Comments
[Figure: resource utilization (CPU, I/O, Network, Memory) against a threshold]
Default admission control:
[Figure: pipeline stages “Attach image” and “Send response”, annotated “OVERLOADED!!!”]
Reject requests … and piss off some users.
Comments
[Figure: resource utilization (CPU, I/O, Network, Memory) against a threshold]
Service degradation WITH bottleneck intelligence:
Network is the bottleneck, so expend some CPU and memory to reduce fidelity and size of images to reduce bandwidth consumption WITHOUT reducing admission rate.
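The contrast between default admission control and bottleneck-aware degradation can be sketched as a small decision function. The resource names, the 0.9 threshold, and the action strings are assumptions for illustration; nothing like this function appears in SEDA as presented.

```python
def choose_action(utilization, threshold=0.9):
    """Pick an overload response from per-resource utilization (0.0 to 1.0)."""
    saturated = {name for name, used in utilization.items() if used >= threshold}
    if not saturated:
        return "serve normally"
    if saturated == {"network"}:
        # CPU and memory have headroom, so spend them on shrinking images
        # rather than lowering the admission rate.
        return "reduce image fidelity"
    # Without a cheaper trade-off, fall back to default admission control.
    return "reject requests"
```

A controller that only sees response time cannot make the middle choice, because it does not know which resource is saturated.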
Comments
2. The response time index is lagging by at least the magnitude of the response time
  • 50 requests come in all at once
  • nreq = 100
  • timeout = 10s
  • target = 20s
  • Processing time per request = 1s
  • Detects overload after 30s
Solution:
• Compare enqueue vs. dequeue rate
• Overload occurs when enqueue rate > dequeue rate
• Detects overload after 10s
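The 30s and 10s figures can be reproduced with a small simulation of the scenario on this slide. It rests on several assumptions not stated in the slides: requests are served one at a time, the controller wakes only on the timeout (nreq = 100 is never reached with 50 requests), it looks only at requests completed since its last wake-up, and the 90th percentile uses the nearest-rank definition. The function names are hypothetical.

```python
def percentile(samples, pct):
    """Nearest-rank percentile using integer arithmetic."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def detection_times(burst=50, service_time=1, timeout=10, target=20):
    """Return (t_response_time, t_queue_rates): when each detector first fires."""
    # All requests arrive at t=0 and are served serially, so request i
    # completes at i * service_time, which is also its response time.
    completions = [i * service_time for i in range(1, burst + 1)]
    by_response_time = None
    by_queue_rates = None
    t = timeout
    while t <= completions[-1]:
        window = [c for c in completions if t - timeout < c <= t]
        enqueued = burst if t == timeout else 0  # whole burst lands in window 1
        dequeued = len(window)
        if by_queue_rates is None and enqueued > dequeued:
            by_queue_rates = t
        if by_response_time is None and window and percentile(window, 90) > target:
            by_response_time = t
        t += timeout
    return by_response_time, by_queue_rates
```

Under these assumptions the 90th percentile per window is 9s, 19s, then 29s, so the response-time detector first exceeds the 20s target at the third wake-up, while the queue-rate comparison sees 50 enqueued against 10 dequeued at the first.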
Questions?