How Mapbox Scales over 9 AWS Regions
Uploaded by johan. Category: Technology
Transcript of How Mapbox Scales over 9 AWS Regions
• What is Mapbox?
• 9 Data Centers?
• Tracing a map request
9 Data Centers?
• Why run this in many regions?
• One region = cheaper, less complex, easier to build and maintain
9 Data Centers?
• Global high availability
• Global low latency
9 Data Centers? Global high availability
• Mapbox is critical infrastructure for our customers
• Mapbox SLA: 99.9%
• Threats to high availability:
• AWS problems
• Mapbox software or configuration problems
• Critical deploys
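To make the 99.9% SLA concrete, the short calculation below turns an availability target into a downtime budget. It is a sketch under one stated assumption: the slide does not say over what period the SLA is measured, so a 30-day month and a 365-day year are used purely for illustration.

```python
def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Maximum downtime, in minutes, that an availability target allows over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability)


if __name__ == "__main__":
    # Assumption: SLA measured per 30-day month / per 365-day year (not stated on the slide).
    print(round(downtime_budget_minutes(0.999, 30), 1))        # 43.2 (minutes per month)
    print(round(downtime_budget_minutes(0.999, 365) / 60, 2))  # 8.76 (hours per year)
```

At 99.9%, roughly 43 minutes of downtime per month is the whole budget, which is why a single region failing, or a bad deploy, matters so much.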
9 Data Centers? Global low latency
• Can't beat the speed of light
• Latency is critical for using a map
• Bring our data closer to our users
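A back-of-the-envelope calculation shows why "can't beat the speed of light" forces data closer to users. This sketch assumes light in optical fibre travels at about c / 1.5 (roughly 200,000 km/s); the distances used in the demo are hypothetical, not Mapbox figures.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
# Assumption: light in optical fibre travels at roughly c / 1.5 (about 200,000 km/s).
FIBER_SPEED_KM_S = SPEED_OF_LIGHT_KM_S / 1.5


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fibre for a given one-way distance.

    Real routes are longer than the straight-line distance and every router adds
    delay, so measured latency is always higher than this bound.
    """
    return 2 * distance_km / FIBER_SPEED_KM_S * 1000


if __name__ == "__main__":
    # Hypothetical distances: a nearby region vs. a region on another continent.
    for km in (300, 6000):
        print(km, "km ->", round(min_round_trip_ms(km), 1), "ms")
```

The floor is about 3 ms for the nearby region and about 60 ms for the distant one, per round trip. A new HTTPS connection needs several round trips before the first byte arrives, and a map view loads many tiles, so the gap multiplies.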
Let's trace a request
19
20
21
What is a map?
• Grid over the world
• Every cell of the grid is a tile
• Different zoom levels
• Zoom level 0 is the whole world
• Zoom level 13 is a city
• Every tile is identified by map ID, x/y coordinates and zoom level
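The grid can be made concrete with the tile arithmetic most web maps use. This is a sketch assuming the standard Web Mercator "XYZ" scheme, in which zoom level z has 2^z x 2^z tiles (one tile at zoom 0, 8192 x 8192 at zoom 13). The slides do not spell out Mapbox's exact projection math, so treat the formulas as the common convention rather than Mapbox's own code.

```python
import math
from typing import Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) index of the tile containing a point at the given zoom level."""
    n = 2 ** zoom  # the grid at zoom z has 2^z x 2^z tiles
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or latitudes beyond about +/-85.05 still map to a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_center(x: int, y: int, zoom: int) -> Tuple[float, float]:
    """Return the (lon, lat) of the centre of tile (x, y) at the given zoom level."""
    n = 2 ** zoom
    lon = (x + 0.5) / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * (y + 0.5) / n))))
    return lon, lat
```

For example, the centre of tile x = 70428, y = 42997 at zoom 17 is at roughly 13.44°E, 52.50°N, and converting that point back with `lonlat_to_tile` returns the same tile.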
Client
• Browser loads JavaScript
• Mapbox.js lets you customize a map with very few lines of code
• The JavaScript code:
• Determines the viewport
• Requests each individual tile
Client
https://tiles.mapbox.com/v4/map.id/17/70428/42997.png?access_token=pk.xxx
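The client's two steps, determining the viewport and requesting each tile, can be sketched as below. The URL template is read off the example request above (`v4/{map id}/{zoom}/{x}/{y}.png?access_token=...`); `map.id` and `pk.xxx` remain placeholders, and the Web Mercator tile formula is the common convention, assumed rather than taken from Mapbox.js. This sketch ignores viewports that cross the antimeridian.

```python
import math
from typing import List, Tuple

# Template derived from the example request; map ID and access token are placeholders.
TILE_URL = "https://tiles.mapbox.com/v4/{map_id}/{z}/{x}/{y}.png?access_token={token}"


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """(x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def viewport_tile_urls(west: float, south: float, east: float, north: float,
                       zoom: int, map_id: str, token: str) -> List[str]:
    """One URL per tile that intersects the viewport, row by row from the top-left."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # north-west corner
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # south-east corner
    return [
        TILE_URL.format(map_id=map_id, z=zoom, x=x, y=y, token=token)
        for y in range(y_min, y_max + 1)
        for x in range(x_min, x_max + 1)
    ]
```

Each URL then becomes a separate HTTPS request, which is why every layer below (CDN, DNS, load balancer, application servers) is tuned for many small, independent requests.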
CDN
• Content Distribution Network
• Physical cache close to users
• AWS: CloudFront
• Others: Akamai, Fastly, Cloudflare
CDN
• When a request comes in:
• Find the nearest edge location
• Terminate TLS
• Match the request to a cache behaviour
• Look in the cache (keyed on URL & query string)
• If the object is there: return it
• Otherwise: fetch it from the origin
CDN
• Your CDN works best if it can serve everything from cache
• How do you remove stale data?
• Trade-off: high cache hit rate vs. update delay
• Time-To-Live (TTL): how long a cached object lives before it expires
• We use 5 minutes
• 35% cache hit rate
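The TTL trade-off can be sketched as a toy edge cache keyed on the full URL (path plus query string) with a 300-second TTL, matching the 5 minutes on the slide. This is an illustration only, not how CloudFront is implemented; the `EdgeCache` class and its injectable clock are hypothetical names chosen for the example.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy CDN edge cache: entries are keyed on the full URL and expire after a fixed TTL."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self.entries: Dict[str, Tuple[float, bytes]] = {}  # url -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url: str, fetch_from_origin: Callable[[str], bytes]) -> bytes:
        now = self.clock()
        entry = self.entries.get(url)
        if entry is not None and entry[0] > now:
            self.hits += 1  # fresh copy in cache: serve it
            return entry[1]
        self.misses += 1  # missing or expired: go back to the origin
        body = fetch_from_origin(url)
        self.entries[url] = (now + self.ttl, body)
        return body

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A longer TTL raises the hit rate but delays map updates; a shorter one does the opposite. Note that with this keying, the same tile requested with two different query strings is stored twice.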
DNS
• Originally: resolve domain names to IP addresses
• Also: route each request to the nearest data center
• Pick the best region for a request based on historic latency
• AWS: Route 53
• Others: Dyn, easyDNS, Akamai
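Latency-based routing boils down to "answer with the region that has historically been fastest for this client". The sketch below shows only that decision; the latency table and region list are hypothetical, and real Route 53 latency records use AWS's own measurements rather than a table you maintain.

```python
from typing import Dict, Iterable

# Hypothetical historic latency (ms) from one client network to each region.
HISTORIC_LATENCY_MS: Dict[str, float] = {
    "eu-central-1": 18.0,
    "eu-west-1": 31.0,
    "us-east-1": 95.0,
    "ap-southeast-1": 180.0,
}


def pick_region(latency_ms: Dict[str, float], available: Iterable[str]) -> str:
    """Return the available region with the lowest historic latency for this client."""
    candidates = {r: latency_ms[r] for r in available if r in latency_ms}
    if not candidates:
        raise ValueError("no latency data for any available region")
    return min(candidates, key=lambda r: candidates[r])


if __name__ == "__main__":
    print(pick_region(HISTORIC_LATENCY_MS, HISTORIC_LATENCY_MS))
    # If eu-central-1 is taken out of rotation, traffic shifts to the next-closest region.
    print(pick_region(HISTORIC_LATENCY_MS, ["eu-west-1", "us-east-1"]))
```

The `available` argument is where high availability meets low latency: removing a failed region from the candidate set sends its users to the next-best region instead of to an error.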
Load Balancer
• Routes requests to application servers
• Entry point to a region
• AWS: Elastic Load Balancer (ELB)
• Others: HAProxy, nginx, F5
Load Balancer
• Terminate TLS
• Determine which application server to route to:
• Must be a healthy server
• ELB: the server with the fewest outstanding requests
• Wait for the response and return it
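The "least outstanding requests" policy can be sketched as follows: skip unhealthy servers, pick the one with the fewest in-flight requests, and track the count around each forwarded request. ELB's internals are not described on the slides, so `Backend`, `choose_backend` and `forward` are hypothetical names illustrating the policy, not AWS code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Backend:
    name: str
    healthy: bool
    outstanding: int = 0  # requests forwarded but not yet answered


def choose_backend(backends: List[Backend]) -> Optional[Backend]:
    """Pick the healthy backend with the fewest outstanding requests."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda b: b.outstanding)


def forward(backends: List[Backend], send: Callable[[Backend], T]) -> T:
    """Forward one request, keeping the outstanding-request count accurate."""
    backend = choose_backend(backends)
    if backend is None:
        raise RuntimeError("no healthy backend")
    backend.outstanding += 1
    try:
        return send(backend)  # wait for the response...
    finally:
        backend.outstanding -= 1  # ...then release the slot, even on error
```

Compared with round robin, this naturally steers traffic away from a server that is slow (for example, one waiting on a cross-region read), because its requests stay outstanding longer.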
Application Servers
• Virtual machines
• AWS: Elastic Compute Cloud (EC2)
• Others: Google Compute Engine, Rackspace, DigitalOcean
Application Servers
• c3.xlarge instances
• Ubuntu Linux
• Node.js/Express
Application Servers
• Authenticate
• Load map data
• Fetch tile and return it
DynamoDB [1]
• Primary/replica setup
• Reads go to replicas; writes go only to the primary
• Replicas exist in only 2 regions
• Reads from regions without a replica have to go over the Internet
• In-instance caching of authentication/map information
[1] https://www.mapbox.com/blog/scaling-the-mapbox-infrastructure-with-dynamodb-streams/
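The read/write split and the in-instance cache can be sketched together. The region names below are hypothetical (the slides say replicas exist in only 2 regions but not which ones), and `fetch` stands in for a real DynamoDB read; this illustrates the routing rule, not Mapbox's code.

```python
from typing import Callable, Dict

# Hypothetical topology: the slides do not name the primary or replica regions.
PRIMARY_REGION = "us-east-1"
REPLICA_REGIONS = ("us-east-1", "eu-west-1")


def read_region(local_region: str) -> str:
    """Reads go to a replica: the local one if it exists, otherwise a remote one."""
    if local_region in REPLICA_REGIONS:
        return local_region
    return REPLICA_REGIONS[0]  # remote replica, reached over the Internet


def write_region() -> str:
    """Writes always go to the primary."""
    return PRIMARY_REGION


class CachedReader:
    """In-instance cache for authentication/map records: one remote read per key.

    This sketch never expires entries; a real cache would need expiry or
    invalidation (for example when an access token is revoked).
    """

    def __init__(self, local_region: str, fetch: Callable[[str, str], dict]):
        self.region = read_region(local_region)
        self.fetch = fetch  # fetch(region, key) -> record; stands in for a DynamoDB read
        self.cache: Dict[str, dict] = {}
        self.remote_reads = 0

    def get(self, key: str) -> dict:
        if key not in self.cache:
            self.remote_reads += 1
            self.cache[key] = self.fetch(self.region, key)
        return self.cache[key]
```

In a region without a replica, only the first lookup for each key pays the cross-Internet round trip; later requests for the same access token or map are served from instance memory.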
Application Servers: Fetch tiles
• Check the cache (Redis) and the object store (S3) simultaneously
• Return the tile from whichever finds it first
• If it is found only in the object store, update the local cache
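The three steps above can be sketched with Python's `asyncio`: start both lookups at once, return the first hit, and write back to the cache only when the cache actually missed. Real Redis and S3 clients are replaced by a hypothetical in-memory `FakeStore` with an artificial delay, so this shows the control flow rather than Mapbox's Node.js implementation.

```python
import asyncio
from typing import Dict, Optional, Tuple


class FakeStore:
    """Hypothetical in-memory stand-in for Redis or S3, with an artificial delay."""

    def __init__(self, data: Dict[str, bytes], delay: float):
        self.data = dict(data)
        self.delay = delay

    async def get(self, key: str) -> Optional[bytes]:
        await asyncio.sleep(self.delay)
        return self.data.get(key)

    async def set(self, key: str, value: bytes) -> None:
        self.data[key] = value


async def fetch_tile(key: str, cache, store) -> Tuple[Optional[bytes], str]:
    """Query cache and object store concurrently; return (tile, source)."""
    cache_task = asyncio.create_task(cache.get(key))
    store_task = asyncio.create_task(store.get(key))
    pending = {cache_task, store_task}
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        if cache_task in done and cache_task.result() is not None:
            for task in pending:
                task.cancel()  # cache hit: the slower object-store request is not needed
            return cache_task.result(), "cache"
        if store_task in done and store_task.result() is not None:
            tile = store_task.result()
            # Write back only if the cache really missed. The cache lookup is normally
            # much faster than the object store, so this wait is short.
            if await cache_task is None:
                await cache.set(key, tile)
            return tile, "store"
    return None, "miss"
```

With a fast cache and a slow store, the first request for a tile returns from the store and fills the cache; repeat requests then return from the cache without waiting for the store.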
Application Servers
• Redis is used as a least-recently-used (LRU) cache, so the popular tiles for a region are usually cached
• S3 is slow because the data lives in a us-east-1 bucket only
• Stats:
• 80% cache hits
• r3.4xlarge with 122 GB of memory
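The LRU policy itself fits in a few lines with `collections.OrderedDict`: every read moves a key to the "recent" end, and inserting past capacity evicts from the "old" end. This is a sketch of the policy only. Redis bounds memory in bytes (`maxmemory`) rather than item count, and its `allkeys-lru` eviction policy uses an approximated LRU; the slide does not say which eviction settings Mapbox used.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key: str, value: bytes) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry
```

Because each region's cache sees only that region's traffic, the tiles that survive eviction are the ones popular locally, which is what makes an 80% hit rate achievable with a bounded amount of memory.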
From 2 to 9 regions
http://geodevelopers.org/
Elasticity
• EC2 instances are provisioned via an Auto Scaling group
• Auto Scaling is based on instance CPU load
• Scale up if CPU load is over 55% for 2 minutes; scale down if it is under 20% for 2 minutes
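The scaling rule can be sketched as a pure function over recent CPU samples. The 55%/20% thresholds and the 2-minute window come from the slide. The one-sample-per-minute granularity and the minimum/maximum group sizes are hypothetical, and this is not the Auto Scaling API, only the decision it encodes.

```python
from typing import Sequence

SCALE_UP_THRESHOLD = 55.0    # percent CPU (from the slide)
SCALE_DOWN_THRESHOLD = 20.0  # percent CPU (from the slide)
WINDOW_SAMPLES = 2           # hypothetical: one sample per minute -> 2 minutes
MIN_SIZE, MAX_SIZE = 2, 20   # hypothetical group size limits


def scaling_decision(cpu_samples: Sequence[float]) -> int:
    """+1 to add an instance, -1 to remove one, 0 to do nothing."""
    if len(cpu_samples) < WINDOW_SAMPLES:
        return 0  # not enough history yet
    window = cpu_samples[-WINDOW_SAMPLES:]
    if all(s > SCALE_UP_THRESHOLD for s in window):
        return 1
    if all(s < SCALE_DOWN_THRESHOLD for s in window):
        return -1
    return 0


def desired_capacity(current: int, cpu_samples: Sequence[float]) -> int:
    """Apply the decision while keeping the group within its size limits."""
    return max(MIN_SIZE, min(MAX_SIZE, current + scaling_decision(cpu_samples)))
```

The wide gap between the two thresholds acts as hysteresis: load has to change substantially before the group reverses direction, which keeps it from flapping between sizes.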