Otimizando servidores web
Transcript of Otimizando servidores web
SÃO PAULO
Otimizando Servidores Web e seus componentes
Davi Menezes & Robert Fuente
Cloud Technical Account Manager | AWS Support
Different strategies for better performance
• Leverage newer hardware and software.
• Apply more resources through auto scaling.
• Offload the heavy lifting to someone else.
• Optimize the web server stack.
Defining “better” performance
• Throughput -- transactions per second (tps).
• Latency reduction.
• Cost reduction.
Optimizations are, by definition, app-specific
• Test and validate together with the application itself.
• There is no substitute for production data.
• Make tuning an integral part of the application itself.
– E.g. Elastic Beanstalk .ebextensions
Identifying Bottlenecks
First understand your workload
• What are we serving?
– Number of transactions
– Transaction size
– Back-end resource consumption
• How much can we do today?
– Theoretical benchmark
– Actual production load (observability / data-driven)
• What is the bottleneck resource?
– “Choose instance type for the bounding resource”
– Workload Analysis vs. Resource Analysis
https://youtu.be/7Cyd22kOqWc
Avoid tuning at random
Logs: the ultimate source of truth
119.246.177.166 - - [02/Nov/2014:05:02:00 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
117.21.173.27 - - [02/Nov/2014:06:28:39 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
117.21.225.165 - - [02/Nov/2014:16:36:58 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
50.62.6.117 - - [02/Nov/2014:20:50:39 +0000] "GET //wp-login.php HTTP/1.1" 404 289 "-"
50.62.6.117 - - [02/Nov/2014:20:50:39 +0000] "GET /blog//wp-login.php HTTP/1.1" 404 295 "-"
50.62.6.117 - - [02/Nov/2014:20:50:40 +0000] "GET /wordpress//wp-login.php HTTP/1.1" 404 300 "-"
50.62.6.117 - - [02/Nov/2014:20:50:40 +0000] "GET /wp//wp-login.php HTTP/1.1" 404 293 "-"
24.199.131.50 - - [03/Nov/2014:08:00:30 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
76.10.82.137 - - [03/Nov/2014:08:55:49 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
123.249.19.23 - - [03/Nov/2014:09:15:29 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
117.21.173.27 - - [03/Nov/2014:15:55:25 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
62.210.136.228 - - [03/Nov/2014:22:31:22 +0000] "GET / HTTP/1.1" 403 3839 "-"
24.27.104.175 - - [04/Nov/2014:00:18:18 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
198.20.69.74 - - [04/Nov/2014:02:07:05 +0000] "GET / HTTP/1.1" 403 3839 "-"
198.20.69.74 - - [04/Nov/2014:02:07:13 +0000] "GET /robots.txt HTTP/1.1" 404 287 "-"
181.188.47.118 - - [04/Nov/2014:03:02:56 +0000] "GET /tmUnblock.cgi HTTP/1.1" 400 301 "-"
117.21.173.27 - - [04/Nov/2014:09:27:19 +0000] "GET /manager/html HTTP/1.1" 404 289 "-"
193.174.89.19 - - [04/Nov/2014:13:34:23 +0000] "GET / HTTP/1.1" 403 3839 "-"
CloudWatch Metric Anatomy
• Statistical aggregation
– Min
– Max
– Sum
– Average
– Count
• One data point per minute.
• Can trigger actions via alarms.
Micro metrics vs. Macro metrics
• Agent-based monitoring
• Available in Amazon Linux
• Provides highly granular, server-specific insights
Source: http://demo.munin-monitoring.org/
Coming from a variety of sources
Customer generated
• Kernel and Operating System
• Web Server
• Application Server/Middleware
• Application code
• Instance networking
AWS generated
• Amazon CloudFront
• Amazon Elastic Load Balancing
• Amazon CloudWatch
• Amazon Simple Storage Service
[Chart: latency at each percentile (1st to 96th) vs. average latency]
[Chart: latency histogram, frequency per latency bucket]
More than meets the eye
Noteworthy AWS CloudWatch metrics
• EC2 Instances
– New T2 CPU Credits
– CPU utilization
– Bandwidth (In/Out)
• EBS
– PIOPS utilization
– GP2 utilization
– Remember: an 8 GB gp2 volume will provision 24 IOPS!
• Elastic Load Balancing
– RequestCount
– Latency
– Queue length and spillover
– Back-end connection errors
• CloudFront
– Requests
– BytesDownloaded
Diving Deep on the Last Mile (you & us)
Elastic Load Balancer
ELB Connection Behavior
• No true limits on influx of connections
– But may require preemptive scaling (aka Pre-warming)
• Leverages HTTP Keep-Alives
• Configurable Idle Connection Timeout
• HTTP Session Stickiness & Health-checking
– Fast Registration
• SSL Off-loading and Back-end authentication
ELB access logs
HTTP log entries
• Only one side of the picture.
• Can’t log custom headers or format logs.
• Logs are delayed.
• Can drill down to instance-level responsiveness, but can’t dive into latency outliers.
[Chart: Processing Time per request, split into request_processing_time, backend_processing_time, and response_processing_time, alongside bytes]
ELB Key Metrics
• Latency and Request Count
• Surge Queue and Spillover
• ELB 5xx and 4xx
• Back-end Connection Errors
• Healthy and Unhealthy Host Counts
The life of an HTTP connection
int cfd,fd=socket(PF_INET,SOCK_STREAM,IPPROTO_TCP);
struct sockaddr_in si;
si.sin_family=PF_INET;
inet_aton("127.0.0.1",&si.sin_addr);
si.sin_port=htons(80);
bind(fd,(struct sockaddr*)si,sizeof si);
listen(fd,512);
while ((cfd=accept(fd,(struct sockaddr*)si,sizeof si)) != -1) {
read_request(cfd); /* read(cfd,...) until "\r\n\r\n" */
write(cfd,"200 OK HTTP/1.0\r\n\r\n"
”Bem-vindo ao AWS Summit SP 2015.",19+27);
close(cfd);
}
http:80fd=socket(PF_INET,SOCK_STREAM,IPPROTO_TCP)
bind(fd,(struct sockaddr*)si,sizeof si)
listen(fd,512)
accept(fd,(struct sockaddr*)si,sizeof si)
# of open
file descriptors
The last TCP mile
• Accept Pending Queue
– man listen(2): “(…) backlog argument defines the maximum length to which the queue of pending connections for sockfd may grow.”
– Recv-Q & Send-Q – TCP is stream oriented
• man accept(2): Blocking vs. Non-blocking sockets
Tweaking the TCP stack (aka sysctl)
Queuing at the TCP layer first
• ECONNREFUSED
• man listen(2): “if the underlying protocol supports retransmission, the request may be ignored so that a later reattempt at connection succeeds” – aka: TCP Retransmit
Scaling in the Linux Networking Stack
• Connection States
– man netstat(8)
• Backlog Maximum Length
– Waiting to be accepted: /proc/sys/net/core/somaxconn
– Half-open connections: /proc/sys/net/ipv4/tcp_max_syn_backlog
– CPU's input packet queue: /proc/sys/net/core/netdev_max_backlog
TCP is a Window based protocol
• TCP Receive Window
– “considered one of the most important TCP tweaks” (ugh!)
– BDP = available bandwidth (KB/s) × RTT (ms)
• Choose an EC2 instance with enough bandwidth
TCP Initial Congestion Window
• RFC 3390 – Higher Initial Window
– ip route (…) initcwnd 10 (for kernels < 2.6.39)
• Disable slow start after idle (net.ipv4.tcp_slow_start_after_idle = 0)
• Google Research
– “propose to increase (…) to at least ten segments (about 15KB)”
– Pub: “An Argument for Increasing TCP's Initial Congestion Window”
+/* TCP initial congestion window */
+#define TCP_INIT_CWND 10
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=(…)
committed to the kernel in 2.6.39 (May 2011)
TCP Buffers & Memory Utilization
• Buffering
– Use case: sending/receiving large amounts of data
– Auto-tunable by the kernel
– However, it has bounds: min, default, and max.
– Tune: net.ipv4.tcp_rmem / net.ipv4.tcp_wmem (in bytes)
• Sockets demand on page allocation
– Tune: net.ipv4.tcp_mem (in pages)
inet_timewait_death_row
About the TIME-WAIT state
• TIME-WAIT Assassination RFC
• Increase your port range
– net.ipv4.ip_local_port_range
– A ballpark of your rate of connections per second: (ip_local_port_range / tcp_fin_timeout)
– With the defaults, that leads to about 500 connections per second!
“The TIME_WAIT state is our friend and is there to help us (i.e., to let old duplicate segments expire in the network). Instead of trying to avoid the state, we should understand it.” – Vincent Bernat (vincent.bernat.im)
Check your sources
XKCD: Duty Calls – https://xkcd.com/386/
TL;DR: Do *not* enable net.ipv4.tcp_tw_recycle
• Clients behind NAT or a stateful firewall will get dropped.
• 99.99999999% of the time* it should never be enabled.
* Probably 100%, but there may be a valid case out there.
Linux’s tcp(7) man page does not recommend net.ipv4.tcp_tw_reuse either, though it makes a safer attempt at freeing sockets in the TIME_WAIT state.
Customer Story
Easy Taxi: your style of hailing a taxi!
[Map: Easy Taxi presence: available in 30+ countries across Latin America, Africa, the Middle East, and Asia; more markets coming soon]
• One of the largest taxi apps in the world
• Launched in Rio de Janeiro; present in more than 30 countries
• The same app in every country
• IT based in São Paulo, Brazil
• Millions of customers and hundreds of thousands of taxi drivers
Architecture
• More than 400k requests per minute
• 100+ EC2 instances in production, distributed across different Availability Zones inside Virtual Private Clouds, behind several Elastic Load Balancing load balancers
• RDS clusters, SQS, ElastiCache (Redis), CloudSearch, CloudWatch...
• Managed services let our sysadmins be more productive
[Diagram: two Availability Zones, each running multiple API instances and MongoDB nodes]
400 errors at the ELB
• We identified an increase in 400 errors at the ELB.
• Together with AWS Enterprise Support, we did a deep dive into the ELB access logs using Elasticsearch.
• We found that the events correlated with mobile users on carriers that used NAT on their 3G connections.
• Packet traces with tcpdump revealed that connections were being silently dropped.
Analysis results
• After the analysis, we discovered our servers were running with the settings below:
– net.ipv4.tcp_tw_recycle & net.ipv4.tcp_tw_reuse enabled
• When recycle is enabled, the kernel tries to make decisions based on the timestamps used by remote hosts. It tracks the last timestamp used by each remote host that has a connection in TIME_WAIT, and allows the socket to be reused if the timestamp has incremented correctly; if the timestamp used by the host has not increased correctly, the packet is dropped by the kernel.
• Many of our customers connect through carriers that use NAT. With a high rate of connections arriving from the same IP, the kernel started refusing those connections because of timestamp inconsistencies, resulting in a Bad Request (400) at the ELB.
Conclusion
• The help from Enterprise Support was extremely important in finding the solution for our case.
• If we had not had all the logs and the data we gathered for the analysis, it would have been extremely hard, and we probably would not have been able to figure out what was happening.
Thanks Vinicius!
Tweaking the Webserver stack
• Tune resource consumption
– Context Switches / CPU
– Memory Utilization
• Allow your web server to process enough requests concurrently
– “Child Processes” / “Max Clients” tunables
Webservers Tuning 101
• Keep an eye on the somaxconn limits
• Understand resource utilization by the web server
– Process Isolation vs. Blast Radius
– Avoid Resource Saturation & Starvation
The backlog is back, again!
• man tcp(7) – TCP_DEFER_ACCEPT: the web server only wakes up when there is data available!
• Reduces the burden on the web server’s processes
• The TCP socket is already established (i.e., no SYN flood)
Telling the webserver when to start
Nginx
• listen [deferred]
Apache
• AcceptFilter http data
• AcceptFilter https data
• man sendfile(2): “copying is done within the kernel”
• I.e., no use of user space
Using the Zero-copy pattern
Nginx
• sendfile on
Apache
• EnableSendFile on
HTTP Keep-Alive
Nginx
• keepalive_timeout 75s
• keepalive_requests 100
Apache
• KeepAlive On
• KeepAliveTimeout 5
• MaxKeepAliveRequests 100
Ensure it matches your ELB idle timeout setting; otherwise, look into your ELB’s 5XX metric.
“The small-packet problem”
Flush() (tcp_cork)
• flush() analogy
• The application needs to “uncork”
the stream
• sendfile() is a must
Auto in Apache (+sendfile option)
Set tcp_nopush to false in NGINX
Nagle’s algo (tcp_nodelay)
• The initial problem:
“congestion collapse”
• write() vs. writev()
• Onto the wire asap
Always On in Apache
Set tcp_nodelay flag in NGINX
/* TCP_NODELAY is weaker than TCP_CORK, so that
* this option on corked socket is remembered, but
* it is not activated until cork is cleared.
*
* However, when TCP_NODELAY is set we make
* an explicit push, which overrides even TCP_CORK
* for currently queued segments.
*/
Thanks Chartbeat!
Further details: http://engineering.chartbeat.com/author/justinlintz/
Start w/ Small Wins and keep iterating!
Quick review
• Keep the connection for as long as possible.
• Minimize the latency.
• Increase throughput.
• Most importantly, research what settings make the most sense for your environment.
Offload opportunities
• Leverage ELB’s
– Large-volume Connection Handling
– SSL Off-loading
• CloudFront + S3 for static file delivery
– Tune HTTP responses’ cache headers
• Go Multi-region w/ Route 53 LBR
Last thoughts
• Monitor everything.
• Tune your server to your workload.
• Improvement must be quantifiable.
• Experiment and continuously re-validate!
And most importantly,
REMEMBER:
Otimizando Servidores Web
Davi Menezes & Robert Fuente
Cloud Technical Account Manager | AWS Support
OBRIGADO!
SÃO PAULO