Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS...

50
Shopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify

Transcript of Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS...

Page 1: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Shopify’s Architecture to Handle 80K RPS Celebrity Sales

Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify

Page 2: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Shopify is handling some of the largest sales in the world from Kylie Jenner, Kanye, Superbowl, and others

Page 3: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

— Tobi Lütke, CEO in internal essay on why we optimize for flash sales

“We learned to absorb these shocks and become stronger as a result. [..] The school of hard knocks has taught us well.”

Page 4: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

500K $5.8BMerchants powered Processed Q2, 2017

80K 40+Peak RPS Daily deploys

Rails 2000+Ruby on Rails since 2006 Employees

Page 5: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Traffic

Application

Data

Application

Data

Region A Region B

Page 6: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Traffic

Application

Data

Application

Data

Region A Region B

Page 7: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

• Global Routing

• Openresty

• Bots

• Cache hits

• Checkout Throttling

Traffic

Page 8: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

ISP

ISP

ISP

ISP

ISP

ISP

ISP

ISP

ISP

ISP

Region A

BGP ANNOUNCE 23.227.38.0/24

BGP ANNOUNCE 23.227.38.0/24

Region B

walrusser.myshopify.com 23.227.38.64

Page 9: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

OpenResty allows Lua scripting of your load balancers, it’s been one of the most impactful additions to our stack in recent memory

https://github.com/openresty/openresty

Nginx with OpenResty

Rule Banner

Kafka Logging

Edgecache

Checkout Throttle

Page 10: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

worker_processes 1; error_log logs/error.log; events { worker_connections 1024; } http { server { listen 8080; location / { default_type text/html; content_by_lua ' ngx.say("<p>hello, world</p>") '; } } }

Page 11: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Bot squasher analyzes the Kafka stream of incoming requests to ban bots with a rule banner module

Nginx with OpenResty

Rule Banner

KafkaBot Squasher

Kafka Logger

POST /checkoutBAN

23.227.38.178

Page 12: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Nginx with OpenResty

Edgecache

Memcached

GET /collections/walruses

HIT

Edgecache can serve full page cache hits out of the load-balancers in microseconds

Web Process

MISS

FILL

Page 13: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Nginx with OpenResty

Checkout Throttle

GET /checkout

Queue

/wait_area /checkout

Throttle

Checkout Throttle throttles the number of customers in the processing heavy checkout path

Page 14: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Traffic

Application

Data

Application

Data

Region A Region B

Page 15: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Pod is an isolated unit of one or more shops

Page 16: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

shop1

shop4

shop9

shop17

shop72

Data in Region A

shop3

shop72

shop92

shop18

shop64

shop22

shop88

shop0

sho52

shop23

Pod 14

Pod 2

Pod 7

Page 17: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Pod 14

Each Pod in Region A

Pod 2

Pod 7

MySQLRedis Memcache

MySQLRedis Memcache

MySQLRedis Memcache

Cron

Cron

Cron

Page 18: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Pod 14

Pod 2

Pod 7

MySQLRedis Memcache

MySQLRedis Memcache

MySQLRedis Memcache

Cron

Cron

Cron

Shared Workers

Page 19: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Pod 14

Pod 2

Pod 7

MySQLRedis Memcache

MySQLRedis Memcache

MySQLRedis Memcache

Cron

Cron

Cron

Shared Load Balancing

Page 20: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Genghis is our load-testing tool to test scale

Page 21: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Pod Balancer balances shops between pods with minimal downtime to keep load and size even

Page 22: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

shop1

shop4

shop9

shop17

shop72Pod Balancer

shop3

shop72

shop92

shop18

shop64

shop22

shop88

shop0

shop52

shop23

Pod 14

Pod 2

Pod 7

Page 23: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

shop1

shop4

shop9

shop17

shop72

Pod Balancer

shop3

shop72

shop92

shop18

shop64

shop22

shop88

shop0

shop52

shop23

Pod 14

Pod 2

Pod 7

Page 24: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

shop1

shop4

shop9shop17

shop72

Pod Balancer

shop3

shop72

shop92

shop18

shop64

shop22

shop88

shop0

shop52

shop23

Pod 14

Pod 2

Pod 7

shop98

Page 25: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

shop1

shop4

shop9

shop17

shop72

Pod Balancer

shop3

shop72

shop92

shop18

shop64

shop22

shop88

shop0

shop52

shop23

Pod 14

Pod 2

Pod 7

shop98

shop99shop100

Page 26: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

shop1

shop4

shop9

shop17

shop72Pod Balancer

shop3

shop72

shop92

shop18

shop64

shop22

shop88

shop0

shop52

shop23

Pod 14

Pod 2

Pod 7

shop98

shop99shop100

Pod 74

Page 27: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

shop1shop4

shop9shop17

shop72Pod Balancer

shop3

shop72

shop92

shop18

shop64

shop22

shop88

shop0

shop52

shop23

Pod 14

Pod 2

Pod 7

shop98shop99

shop100

Pod 74

Page 28: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

MySQLRedis MySQLRedis

COPY SHOP SELECT * FROM products WHERE shop_id = 38493 SELECT * from orders WHERE shop_id = 38493

Source Pod 9

Target Pod 23

Page 29: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

MySQLRedis MySQLRedis

COPY SHOP SELECT * FROM products WHERE shop_id = 38493 SELECT * from orders WHERE shop_id = 38493

NEW CHECKOUT INSERT INTO CHECKOUTS …

Source Pod 9

Target Pod 23

Page 30: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

MySQLRedisSource

Pod 9MySQLRedis

Target Pod 23

COPY SHOP_ID 238 SELECT * FROM products WHERE shop_id = 238 SELECT * from orders WHERE shop_id = 238

Bin LogREPLICATE SHOP_ID 238 CHECKOUT id: 383293

Page 31: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

MySQLRedisSource

Pod 9MySQLRedis

Target Pod 23

LOCK SHOP_ID 238

Routing

UPDATE SHOP_ID 238 pod_id=23

Page 32: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Traffic

Application

Data

Application

Data

Region A Region B

Page 33: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Sorting Hat routes requests for a shop to the region the pod is active in

Page 34: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Traffic

Region A Region B

ActivePod 7

InactivePod 2

ActivePod 14 Pod 14 Inactive

Inactive

ActivePod 2

Pod 7

Pod 14

Sorting Hat

GET /products Host: sneakershop.com

Routing

ROUTE sneakershop.com

shop238 pod2:B

Page 35: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Traffic

Application

Data

Application

Data

Region A Region B

Page 36: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Pod Mover moves pods between regions with minimal downtime

Page 37: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Traffic

Region A Region B

ActivePod 7

Pod 2 ActivePod 14 Pod 14 Inactive

Inactive

ActivePod 2

Pod 7

Pod 14

Sorting Hat

InactivePod 2

Page 38: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Traffic

Region A Region B

ActivePod 7

Pod 2 ActivePod 14 Pod 14 Inactive

Inactive

ActivePod 2

Pod 7

Pod 14

Sorting Hat

InactivePod 2

Page 39: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Update Routing for pod to target region pod2:b -> pod2:a

Sorting Hat routes requests to target region

Disable cron in both regions

Fail over MySQL to target region

Enable cron in both regions

Transfer jobs to target region

Page 40: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

What about errors while the database fails over?

Page 41: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Nginx with OpenResty

Pauser

POST /checkout (during failover)

Pauser will pause requests in the middle of failovers to avoid serving errors

QueueThrottle

HTTP 200 (seconds later)

Page 42: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Update Routing for pod to target region pod2:b -> pod2:a

Sorting Hat routes requests to target region and pause requests

Disable cron in both regions

Fail over MySQL to target region

Enable cron in both regions

Resume requests

Transfer jobs to target region

Page 43: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.
Page 44: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Cloud Migration with the Pods Architecture

Page 45: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

shop1

shop4

shop9

shop17

shop72

Region A

shop3

shop72

shop92

shop18

shop64

shop22

shop88

shop0

sho52

shop23

Cloud Region C

Page 46: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.
Page 47: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.

Thanks! @Sirupsen

Page 48: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.
Page 49: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.
Page 50: Shopify’s Architecture to Handle 80K RPS Celebrity SalesShopify’s Architecture to Handle 80K RPS Celebrity Sales Simon Eskildsen – @Sirupsen Production Engineering Lead, Shopify.