Otimizando servidores web
Transcript of Otimizando servidores web
SÃO PAULO
Otimizando Servidores Web e seus componentes
Davi Menezes & Robert Fuente
Cloud Technical Account Manager | AWS Support
Different strategies for better performance
• Leverage newer hardware and software.
• Apply more resources through auto scaling.
• Offload the heavy lifting to someone else.
• Optimize the web server stack.
Defining “better” performance
• Throughput -- transactions per second (tps).
• Latency reduction.
• Cost reduction.
Optimizations are, by definition, app-specific
• Test and validate together with the application itself.
• There is no substitute for production data.
• Make tuning an integral part of the application itself.
– E.g. Elastic Beanstalk .ebextensions
Identifying Bottlenecks
First understand your workload
• What are we serving?
– Number of transactions
– Transaction size
– Back-end resource consumption
• How much can we do today?
– Theoretical benchmark
– Actual production load (observability / data-driven)
• What is the bottleneck resource?
– “Choose instance type for the bounding resource”
– Workload Analysis vs. Resource Analysis
https://youtu.be/7Cyd22kOqWc
Avoid tuning at random
Logs: the ultimate source of truth
119.246.177.166 - - [02/Nov/2014:05:02:00 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
117.21.173.27 - - [02/Nov/2014:06:28:39 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
117.21.225.165 - - [02/Nov/2014:16:36:58 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
50.62.6.117 - - [02/Nov/2014:20:50:39 +0000] "GET //wp-login.php HTTP/1.1" 404 289 "-"
50.62.6.117 - - [02/Nov/2014:20:50:39 +0000] "GET /blog//wp-login.php HTTP/1.1" 404 295 "-"
50.62.6.117 - - [02/Nov/2014:20:50:40 +0000] "GET /wordpress//wp-login.php HTTP/1.1" 404 300 "-"
50.62.6.117 - - [02/Nov/2014:20:50:40 +0000] "GET /wp//wp-login.php HTTP/1.1" 404 293 "-"
24.199.131.50 - - [03/Nov/2014:08:00:30 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
76.10.82.137 - - [03/Nov/2014:08:55:49 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
123.249.19.23 - - [03/Nov/2014:09:15:29 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
117.21.173.27 - - [03/Nov/2014:15:55:25 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
62.210.136.228 - - [03/Nov/2014:22:31:22 +0000] "GET / HTTP/1.1" 403 3839 "-"
24.27.104.175 - - [04/Nov/2014:00:18:18 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
198.20.69.74 - - [04/Nov/2014:02:07:05 +0000] "GET / HTTP/1.1" 403 3839 "-"
198.20.69.74 - - [04/Nov/2014:02:07:13 +0000] "GET /robots.txt HTTP/1.1" 404 287 "-"
181.188.47.118 - - [04/Nov/2014:03:02:56 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
117.21.173.27 - - [04/Nov/2014:09:27:19 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
193.174.89.19 - - [04/Nov/2014:13:34:23 +0000] "GET / HTTP/1.1" 403 3839 "-"
CloudWatch Metric Anatomy
• Statistical aggregation
– Min
– Max
– Sum
– Average
– Count
• One data point per minute.
• Can trigger actions via alarms.
Micro metrics vs. Macro metrics
• Agent-based monitoring
• Available in Amazon Linux
• Provides highly granular, server-specific insights
Source: http://demo.munin-monitoring.org/
Coming from a variety of sources
Customer generated
• Kernel and Operating System
• Web Server
• Application Server/Middleware
• Application code
• Instance networking
AWS generated
• Amazon CloudFront
• Amazon Elastic Load Balancing
• Amazon CloudWatch
• Amazon Simple Storage Service
[Chart: latency at each percentile (1st to 96th) vs. average latency]
[Chart: latency histogram, frequency per latency bucket]
More than meets the eye
Noteworthy AWS CloudWatch metrics
• EC2 Instances
– New T2 CPU Credits
– CPU utilization
– Bandwidth (In/Out)
• EBS
– PIOPS utilization
– GP2 utilization
– Remember: an 8 GB gp2 volume will provision 24 IOPS!
• Elastic Load Balancing
– RequestCount
– Latency
– Queue length and spillover
– Back-end connection errors
• CloudFront
– Requests
– BytesDownloaded
Diving Deep on the Last Mile (you & us)
Elastic Load Balancer
ELB Connection Behavior
• No true limits on influx of connections
– But may require preemptive scaling (aka Pre-warming)
• Leverages HTTP Keep-Alives
• Configurable Idle Connection Timeout
• HTTP Session Stickiness & Health-checking
– Fast Registration
• SSL Off-loading and Back-end authentication
ELB access logs
HTTP log entries
• Only one side of the picture.
• Can’t log custom headers or format logs.
• Logs are delayed.
• Can drill down to instance-level responsiveness, but can’t dive into latency outliers.
[Chart: Processing Time per request, split into request_processing_time, backend_processing_time, and response_processing_time, alongside bytes]
ELB Key Metrics
• Latency and Request Count
• Surge Queue and Spillover
• ELB 5xx and 4xx
• Back-end Connection Errors
• Healthy and Unhealthy Host Counts
The life of an HTTP connection
int cfd,fd=socket(PF_INET,SOCK_STREAM,IPPROTO_TCP);
struct sockaddr_in si;
si.sin_family=PF_INET;
inet_aton("127.0.0.1",&si.sin_addr);
si.sin_port=htons(80);
bind(fd,(struct sockaddr*)si,sizeof si);
listen(fd,512);
while ((cfd=accept(fd,(struct sockaddr*)si,sizeof si)) != -1) {
read_request(cfd); /* read(cfd,...) until "\r\n\r\n" */
write(cfd,"200 OK HTTP/1.0\r\n\r\n"
”Bem-vindo ao AWS Summit SP 2015.",19+27);
close(cfd);
}
http:80fd=socket(PF_INET,SOCK_STREAM,IPPROTO_TCP)
bind(fd,(struct sockaddr*)si,sizeof si)
listen(fd,512)
accept(fd,(struct sockaddr*)si,sizeof si)
# of open
file descriptors
The last TCP mile
• Accept Pending Queue
– man listen(2): “(…) backlog argument defines the maximum length to which the queue of pending connections for sockfd may grow.”
– Recv-Q & Send-Q – TCP is stream oriented
• man accept(2): Blocking vs. Non-blocking sockets
Tweaking the TCP stack (aka sysctl)
Queuing at the TCP layer first
• ECONNREFUSED
• man listen(2): “if the underlying protocol supports retransmission, the request may be ignored so that a later reattempt at connection succeeds” – aka: TCP Retransmit
Scaling in the Linux Networking Stack
• Connection States
– man netstat(8)
• Backlog Maximum Length
– Waiting to be accepted: /proc/sys/net/core/somaxconn
– Half-open connections: /proc/sys/net/ipv4/tcp_max_syn_backlog
– CPU's input packet queue: /proc/sys/net/core/netdev_max_backlog
TCP is a Window based protocol
• TCP Receive Window
– “considered one of the most important TCP tweaks” (ugh!)
– BDP = available bandwidth (KB/s) × RTT (ms)
• Choose an EC2 instance with enough bandwidth
TCP Initial Congestion Window
• RFC 3390 – Higher Initial Window
– ip route (…) initcwnd 10 (for kernels < 2.6.39)
• Disable slow start after idle (net.ipv4.tcp_slow_start_after_idle = 0)
• Google Research
– “propose to increase (…) to at least ten segments (about 15KB)”
– Pub: “An Argument for Increasing TCP's Initial Congestion Window”
+/* TCP initial congestion window */
+#define TCP_INIT_CWND 10
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=(…)
committed to the kernel in 2.6.39 (May 2011)
TCP Buffers & Memory Utilization
• Buffering
– Use case: sending/receiving large amounts of data
– Auto-tunable by the kernel
– However, it has bounds: min, default, and max.
– Tune: net.ipv4.tcp_rmem / net.ipv4.tcp_wmem (in bytes)
• Sockets demand on page allocation
– Tune: net.ipv4.tcp_mem (in pages)
inet_timewait_death_row
About the TIME-WAIT state
• TIME-WAIT Assassination RFC
• Increase your port range
– net.ipv4.ip_local_port_range
– A ballpark of your rate of connections per second: (ip_local_port_range / tcp_fin_timeout)
– With the defaults, that leads to about 500 connections per second!
“The TIME_WAIT state is our friend and is there to help us (i.e., to let old duplicate segments expire in the network). Instead of trying to avoid the state, we should understand it.” – Vincent Bernat (vincent.bernat.im)
Check your sources
XKCD: Duty Calls – https://xkcd.com/386/
TL;DR: Do *not* enable net.ipv4.tcp_tw_recycle
• Clients behind NAT or a stateful firewall will get dropped.
• 99.99999999% of the time* it should never be enabled.
* Probably 100%, but there may be a valid case out there.
Linux’s tcp(7) man page does not recommend net.ipv4.tcp_tw_reuse either, though it makes a safer attempt at freeing sockets in the TIME_WAIT state.
Customer Story
Easy Taxi: your style of hailing a taxi!
[Map: Easy Taxi presence: available in 30+ countries across Latin America, Africa, the Middle East, and Asia; more markets coming soon]
• One of the largest taxi apps in the world
• Launched in Rio de Janeiro; present in more than 30 countries
• The same app in every country
• IT based in São Paulo, Brazil
• Millions of customers and hundreds of thousands of taxi drivers
Architecture
• More than 400k requests per minute
• 100+ EC2 instances in production, distributed across different Availability Zones inside Virtual Private Clouds, behind several Elastic Load Balancing load balancers
• RDS clusters, SQS, ElastiCache (Redis), CloudSearch, CloudWatch...
• Managed services let our sysadmins be more productive
[Diagram: two Availability Zones, each running multiple API instances and MongoDB nodes]
400 errors at the ELB
• We identified an increase in 400 errors at the ELB.
• Together with AWS Enterprise Support, we did a deep dive into the ELB access logs using Elasticsearch.
• We found that the events correlated with mobile users on carriers that used NAT on their 3G connections.
• Packet traces with tcpdump revealed that connections were being silently dropped.
Analysis results
• After the analysis, we discovered our servers were running with the settings below:
– net.ipv4.tcp_tw_recycle & net.ipv4.tcp_tw_reuse enabled
• When recycle is enabled, the kernel tries to make decisions based on the timestamps used by remote hosts. It tracks the last timestamp used by each remote host that has a connection in TIME_WAIT, and allows the socket to be reused if the timestamp has incremented correctly; if the timestamp used by the host has not increased correctly, the packet is dropped by the kernel.
• Many of our customers connect through carriers that use NAT. With a high rate of connections arriving from the same IP, the kernel started refusing those connections because of timestamp inconsistencies, resulting in a Bad Request (400) at the ELB.
Conclusion
• The help from Enterprise Support was extremely important in finding the solution for our case.
• If we had not had all the logs and the data we gathered for the analysis, it would have been extremely hard, and we probably would not have been able to figure out what was happening.
Thanks Vinicius!
Tweaking the Webserver stack
• Tune resource consumption
– Context Switches / CPU
– Memory Utilization
• Allow your web server to process enough requests concurrently
– “Child Processes” / “Max Clients” tunables
Webservers Tuning 101
• Keep an eye on the somaxconn limits
• Understand resource utilization by the web server
– Process Isolation vs. Blast Radius
– Avoid Resource Saturation & Starvation
The backlog is back, again!
• man tcp(7) – TCP_DEFER_ACCEPT: the web server only wakes up when there is data available!
• Reduces the burden on the web server’s processes
• The TCP socket is already established (i.e., no SYN flood)
Telling the webserver when to start
Nginx
• listen [deferred]
Apache
• AcceptFilter http data
• AcceptFilter https data
• man sendfile(2): “copying is done within the kernel”
• I.e., no use of user space
Using the Zero-copy pattern
Nginx
• sendfile on
Apache
• EnableSendFile on
HTTP Keep-Alive
Nginx
• keepalive_timeout 75s
• keepalive_requests 100
Apache
• KeepAlive On
• KeepAliveTimeout 5
• MaxKeepAliveRequests 100
Ensure it matches your ELB idle timeout setting; otherwise, look into your ELB’s 5XX metric.
“The small-packet problem”
Flush() (tcp_cork)
• flush() analogy
• The application needs to “uncork”
the stream
• sendfile() is a must
Auto in Apache (+sendfile option)
Set tcp_nopush to false in NGINX
Nagle’s algo (tcp_nodelay)
• The initial problem:
“congestion collapse”
• write() vs. writev()
• Onto the wire asap
Always On in Apache
Set tcp_nodelay flag in NGINX
/* TCP_NODELAY is weaker than TCP_CORK, so that
* this option on corked socket is remembered, but
* it is not activated until cork is cleared.
*
* However, when TCP_NODELAY is set we make
* an explicit push, which overrides even TCP_CORK
* for currently queued segments.
*/
Thanks Chartbeat!
Further details: http://engineering.chartbeat.com/author/justinlintz/
Start w/ Small Wins and keep iterating!
Quick review
• Keep the connection for as long as possible.
• Minimize the latency.
• Increase throughput.
• Most importantly, research what settings make the most sense for your environment.
Offload opportunities
• Leverage ELB’s
– Large-volume Connection Handling
– SSL Off-loading
• CloudFront + S3 for static file delivery
– Tune HTTP responses’ cache headers
• Go Multi-region w/ Route 53 LBR
Last thoughts
• Monitor everything.
• Tune your server to your workload.
• Improvement must be quantifiable.
• Experiment and continuously re-validate!
And most importantly,
REMEMBER:
Otimizando Servidores Web
Davi Menezes & Robert Fuente
Cloud Technical Account Manager | AWS Support
OBRIGADO!
SÃO PAULO