Stack Exchange Infrastructure - LISA 14
inet.perf.profile
• SRE Generalist @ Stack Exchange
• @GABeech
• http://brokenhaze.com
• http://stackexchange.com
A brief Overview
• 560 Million Page Views a Month
• 34TB of Data transferred a Month
• 1665 rps (2250 peak) Across web Farm
• WISC(HER)
Windows IIS SQL Server C# HAProxy Elastic Search Redis
Our First Priority is Performance
Nobody likes a slow site, least of all us. When your site is slow people leave.
Make your site fast, and the people will stay.
Good write up on moz.com: http://moz.com/blog/site-speed-are-you-fast-does-it-matter
Why do I bring up performance in an infra talk? Simple: it drives our design decisions.
The Performance toolkit
• Mini Profiler
• OpServer (https://github.com/opserver/Opserver)
• Client Timings (http://teststackoverflow.com/)
You can’t be fast if you are not up
• Highly Redundant network
• Datacenter, ISP, Edge, Core, Server, Port
The actual design starts now.
4 Different providers
Selected for different characteristics
Router Redundancy
Hot/Standby HSRP/BGP on “T2”
Full BGP tables and HSRP on T1
Load Balancers
• HAProxy
• 2 Servers (Hot/Standby)
• Multiple Tiers (HAProxy Processes)
4B requests/month
3000 req/sec peak
10% CPU, 18% peak
Between 600 and 700 concurrent connections (EST, TIME_WAIT, etc.)
Multiple processes allow for granular restarts and segregation of faults
SSL termination done on the LB
Websockets: the weird connection
Long-lived TCP, not HTTP
SSL Termination
• Terminated at LB
• Feature added to HAProxy 1.5
• See: http://brokenhaze.com/blog/2014/03/25/how-stack-exchange-gets-the-most-out-of-haproxy/
Source port exhaustion: use 127.0.0.0/8 to resolve
Server only running at ~12% CPU
We don’t run full SSL everywhere yet
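The source port exhaustion note can be made concrete with a little arithmetic. A TCP connection is identified by its (source IP, source port, destination IP, destination port) tuple, so a local hop between one pair of addresses to one port is capped by the number of ephemeral ports. Each additional address pair drawn from 127.0.0.0/8 adds another full port range. The sketch below is only an illustration: the port range is the common Linux default, and the function name and address-pair counts are assumptions, not figures from the talk.

```python
import ipaddress

# Assumption: the common Linux default for net.ipv4.ip_local_port_range
# (32768-60999). The real range on the load balancers is not stated in the talk.
EPHEMERAL_PORTS = 60999 - 32768 + 1

# The whole loopback block is usable on Linux, not just 127.0.0.1.
LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")


def max_connections(address_pairs: int, ports_per_pair: int = EPHEMERAL_PORTS) -> int:
    """Upper bound on concurrent connections to one destination port.

    For a fixed destination port, every connection needs a unique
    (source IP, source port, destination IP) combination, so each distinct
    source/destination address pair contributes one full range of ports.
    """
    return address_pairs * ports_per_pair


if __name__ == "__main__":
    print("addresses in 127.0.0.0/8:", LOOPBACK_NET.num_addresses)
    for pairs in (1, 2, 10):  # hypothetical counts, for illustration only
        print(f"{pairs:>2} address pair(s): up to {max_connections(pairs):,} connections")
```

With a single address pair the ceiling is a few tens of thousands of connections, which a busy proxy tier can reach once sockets lingering in TIME_WAIT are counted.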
Web Servers
• IIS
• 9 Production (2 Test/Dev)
• Dell R610’s
• 32GB Memory
• 2xE5-5640
185 req/s, 250 peak
15% CPU usage, 20% peak
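The per-server figures line up with the farm-wide numbers quoted in the overview (1665 rps, 2250 peak). A quick check, assuming the load is spread evenly across the 9 production servers (the even split is an assumption; the talk does not say how traffic is balanced):

```python
# Figures taken from the slides.
PRODUCTION_WEB_SERVERS = 9
AVG_REQ_PER_SERVER = 185    # req/s
PEAK_REQ_PER_SERVER = 250   # req/s


def farm_rate(per_server: int, servers: int = PRODUCTION_WEB_SERVERS) -> int:
    """Total request rate for the farm, assuming an even split across servers."""
    return per_server * servers


if __name__ == "__main__":
    print("average across farm:", farm_rate(AVG_REQ_PER_SERVER), "req/s")
    print("peak across farm:   ", farm_rate(PEAK_REQ_PER_SERVER), "req/s")
```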
Data Tier
• MS SQL Server
• 4 Servers
• 2 Always-On Clusters
• Each Cluster 1 RW, 1 RO
(SO) 343M queries per day
(SO) Peak of 7500 queries/second
(SE) 216M queries per day
(SE) Peak of 3200 queries/second
CPU use: SO 8%, peak 15%; SE 10%, peak 20%
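To compare the daily totals with the per-second peaks, it helps to convert queries per day into an average rate. The helper names below are made up for illustration, and the calculation assumes the daily total is spread over a full 24 hours:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_per_second(per_day: float) -> float:
    """Average rate implied by a daily total."""
    return per_day / SECONDS_PER_DAY


def peak_to_average(peak_per_second: float, per_day: float) -> float:
    """How many times higher the peak rate is than the daily average."""
    return peak_per_second / average_per_second(per_day)


if __name__ == "__main__":
    # Daily totals and peaks as quoted on the slide.
    clusters = {"SO": (343_000_000, 7500), "SE": (216_000_000, 3200)}
    for name, (per_day, peak) in clusters.items():
        avg = average_per_second(per_day)
        ratio = peak_to_average(peak, per_day)
        print(f"{name}: avg {avg:,.0f} q/s, peak {peak} q/s, peak/avg {ratio:.2f}")
```

On these numbers the SO cluster peaks at a bit under twice its daily average, while the SE cluster stays closer to its average.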
Caching Tier
• Redis
• 2 Servers
• Hot / Standby configuration
3.65B operations a day
Peak 60,000/s
3% CPU usage
Tag Engine
• Our Special index of SO
• Tagging is hard
• Written by Marc Gravell
• http://blog.marcgravell.com/2014/04/technical-debt-case-study-tags.html
3 servers, 32 GB RAM
3644 req/s
3% CPU, 10% peak
Replaced full-text search in SQL Server
Spins up a full copy of SO/SE
Cool thing: can be upgraded with 0 downtime
Elastic Search
• 203GB Index
• 3 Machines
• 42M searches/day
2 others, not prod
Machine learning
Logstash (300TB)
Deployment
• Git
• TeamCity
• Custom PowerShell Scripts
TeamCity monitors our development Git repository
Dev auto-builds (deploy to Meta)
When the build is verified, Dev triggers Prod build
Copy artifacts from Dev build
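The promotion flow described in these notes can be sketched as a tiny model: a dev build is produced and deployed to Meta, and only once it is verified does a prod build reuse its artifacts instead of rebuilding. This is not the actual TeamCity configuration or the PowerShell scripts; every name here (`Build`, `dev_build`, `prod_build`, the artifact file name) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Build:
    commit: str
    artifacts: tuple
    verified: bool = False
    deployed_to: list = field(default_factory=list)


def dev_build(commit: str) -> Build:
    """Stand-in for the automatic dev build triggered by a Git commit."""
    build = Build(commit=commit, artifacts=(f"site-{commit}.zip",))
    build.deployed_to.append("meta")  # dev builds deploy to Meta
    return build


def prod_build(dev: Build) -> Build:
    """Promote a verified dev build to prod by copying its artifacts."""
    if not dev.verified:
        raise RuntimeError("dev build must be verified before the prod build")
    # Prod reuses the dev artifacts rather than compiling again.
    prod = Build(commit=dev.commit, artifacts=dev.artifacts, verified=True)
    prod.deployed_to.append("prod")
    return prod


if __name__ == "__main__":
    dev = dev_build("abc123")
    dev.verified = True  # someone checks the build on Meta
    prod = prod_build(dev)
    print(prod.deployed_to, prod.artifacts)
```

Copying artifacts means the bits that reach production are the same bits that were verified on Meta.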
Always See our Performance
• http://stackexchange.com/performance
Thank You!
Contact:
@GABeech [email protected]
Office Hours: Wednesday, November 12th
(today…) 2:00pm - 3:30pm
LISA Lab