Scaling Infrastructure at Carousell

Harshad Rotithor & Ankur Shrivastava, January 12, 2017

Transcript of Scaling Infrastructure at Carousell

Page 1: Scaling Infrastructure at Carousell

Scaling Infrastructure at Carousell

Harshad Rotithor & Ankur Shrivastava

January 12, 2017

Harshad Rotithor & Ankur Shrivastava Scaling Infrastructure at Carousell January 12, 2017 1 / 48

Page 2: Scaling Infrastructure at Carousell

Who are we?

Harshad Rotithor

Principal Software Engineer

Leads Infrastructure team

Previously at Flipkart, Airpush, Zynga, etc.

[email protected]

Page 3: Scaling Infrastructure at Carousell

Who are we?

Ankur Shrivastava

Senior Software Engineer

Engineer in the Infrastructure team

Previously at Flipkart, Amazon, Zynga, etc.

[email protected]

Page 4: Scaling Infrastructure at Carousell

Where are we currently?

Started in 2012 at a Hackathon

7 countries, 19 cities

57M+ listings

23M+ items sold

Carousell makes buying and selling simple, so that you can fill our life with more meaningful things

Page 7: Scaling Infrastructure at Carousell

Where are we currently?

400+ servers

Multiple Services see 2000+ requests per second

Self Managed deployments

PostgreSQL, ElasticSearch, Cassandra, RabbitMQ, Kafka, Redis, Memcache and more ...

Uptime of 99.95%

Ability to handle AZ failures
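To put the uptime figure in concrete terms, the following is a minimal sketch that converts an uptime target into a downtime budget. It assumes the 99.95 figure on the slide is a percentage and uses a 365-day year; the function name is hypothetical.

```python
def downtime_budget_hours(uptime_percent: float, period_hours: float = 365 * 24) -> float:
    """Return the hours of downtime allowed in a period for a given uptime target."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # Downtime budget per 365-day year for a 99.95% target, in hours.
    print(round(downtime_budget_hours(99.95), 2))
```

Under these assumptions the budget is a little under four and a half hours per year.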

Page 10: Scaling Infrastructure at Carousell

So what is this talk about?

Page 11: Scaling Infrastructure at Carousell

What it took to reach here, and what lies ahead!

Page 12: Scaling Infrastructure at Carousell

Current Infrastructure - Overview

Infrastructure is:

Architecture, Systems, Operations

Stateful components most important

We self-manage user path datastores

Enable choice of data stores

Right tradeoff in terms of consistency

Enable possibilities of workarounds during rough times

Have flexibility in node configuration etc.

Page 15: Scaling Infrastructure at Carousell

Current Infrastructure

Page 16: Scaling Infrastructure at Carousell

Current Infrastructure - Data Stores

Master + 2 Slaves in each AZ (Total 7)

pgbouncer + HAProxy (config-service)

Dedicated data disks (always use SSDs)

Master disk snapshot every 3hr (fsync enabled)

Don’t turn off autovacuum (transaction ID wraparound)
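The autovacuum warning relates to PostgreSQL's 32-bit transaction IDs: old rows have to be frozen by vacuum before the oldest unfrozen transaction approaches roughly 2 billion transactions of age. A minimal sketch of a headroom check follows, assuming the age comes from a query such as `SELECT max(age(datfrozenxid)) FROM pg_database`. The function names and the 50% alert threshold are hypothetical, not something the slides specify.

```python
XID_WRAPAROUND_LIMIT = 2 ** 31  # about 2 billion transactions


def wraparound_headroom(xid_age: int) -> float:
    """Fraction of the transaction ID space still available (1.0 means fresh)."""
    if xid_age < 0:
        raise ValueError("xid_age must be non-negative")
    return max(0.0, 1.0 - xid_age / XID_WRAPAROUND_LIMIT)


def needs_attention(xid_age: int, threshold: float = 0.5) -> bool:
    """True when the remaining headroom falls below the (hypothetical) threshold."""
    return wraparound_headroom(xid_age) < threshold


if __name__ == "__main__":
    # xid_age would normally be read from the database; this value is made up.
    print(needs_attention(1_500_000_000))
```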

Page 18: Scaling Infrastructure at Carousell

Current Infrastructure - Data Stores

3 clusters, largest being close to 75 nodes

Shard allocation awareness

Use plugins (kopf/head/cerebro)

Keep masters in different AZs

HAProxy with L7 health checks (config-service)

Incremental backups

Set shard count correctly; err on the higher side.

Rely on Linux page cache
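Shard allocation awareness is driven by cluster settings in Elasticsearch. The sketch below only builds the JSON body one might send to the cluster settings endpoint (`PUT _cluster/settings`). The attribute name `zone` and the zone names are hypothetical, each node must also be started with a matching node attribute (the exact key depends on the Elasticsearch version), and the slides do not show the actual configuration used.

```python
import json
from typing import Dict, Sequence


def awareness_settings(attribute: str, zones: Sequence[str]) -> Dict[str, Dict[str, str]]:
    """Build a cluster-settings body enabling (forced) shard allocation awareness."""
    if not zones:
        raise ValueError("at least one zone is required")
    return {
        "persistent": {
            # Spread primaries and replicas across values of this node attribute.
            "cluster.routing.allocation.awareness.attributes": attribute,
            # Forced awareness: do not pile all copies into the surviving zones.
            f"cluster.routing.allocation.awareness.force.{attribute}.values": ",".join(zones),
        }
    }


if __name__ == "__main__":
    print(json.dumps(awareness_settings("zone", ["zone-a", "zone-b", "zone-c"]), indent=2))
```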

Page 21: Scaling Infrastructure at Carousell

History

Cloud provider ’x’

Everyday firefighting

We hit upper limits

Network, Disk

Noisy neighbours

Limited types of instances

Lack of features

Load balancer, Autoscaling, Security!

Decided on Migration

Page 25: Scaling Infrastructure at Carousell

Planning

Around June 2016

250+ Nodes

Identify ALL nodes and their functionalities

Identify ALL traffic flows and patterns

Architecture Freeze

Perform comparative benchmarks

Redefine node and cluster configuration

Isolated deployment in GCP

Dry run data migration for all clusters

Estimate time
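The time estimate for a dry run can be sanity-checked with simple arithmetic before measuring it. This is a rough sketch, not the authors' method; the data size, link speed and the 0.7 efficiency factor are hypothetical inputs.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to copy data_gb gigabytes over a link_mbps link.

    efficiency is the assumed fraction of nominal bandwidth actually achieved.
    """
    if data_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    megabits = data_gb * 8 * 1000  # decimal gigabytes to megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical: 2 TB over a 1 Gbit/s VPN at the default efficiency.
    print(f"{transfer_hours(2000, 1000):.1f} hours")
```

A measured dry run remains the real source of truth; the arithmetic only flags estimates that are off by an order of magnitude.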

Page 28: Scaling Infrastructure at Carousell

Preparation

July 2016

VPN across the providers (heavy duty)

Replicate all that can be replicated (inter DC)

Keep stateless nodes ready

Make DNS nameserver changes in advance (3-4 days)

Script everything - node creation, data movement, etc.

Aim for only data movement during migration

Page 30: Scaling Infrastructure at Carousell

Preparation

Practice, Practice, Practice!

Page 31: Scaling Infrastructure at Carousell

Migration

29th July 2016 at 3am

Queues - RabbitMQ, Kafka, etc

Drain on X

Switch to new on GCP

DB

Replicated slaves across DC

Promote to master and create slaves

ElasticSearch & Cassandra

Snapshot/Restore

Very quick - fast GCP network

Redis

RDB restore, create slaves

Beware of cluster state in case of Redis Cluster
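The queue step above (drain on the old provider, then switch to the new one) can be sketched as a small polling loop. This is an illustration only: `queue_depth` and `switch` are hypothetical callables standing in for whatever checks the old broker and repoints consumers, and the sketch assumes producers have already stopped writing to the old queue.

```python
import time
from typing import Callable


def drain_then_switch(
    queue_depth: Callable[[], int],
    switch: Callable[[], None],
    poll_seconds: float = 1.0,
    max_polls: int = 600,
) -> bool:
    """Poll the old queue until it is empty, then call switch().

    Returns True if the switch happened, False if the queue never drained
    within max_polls checks.
    """
    for _ in range(max_polls):
        if queue_depth() == 0:
            switch()
            return True
        time.sleep(poll_seconds)
    return False


if __name__ == "__main__":
    remaining = iter([3, 1, 0])  # fake queue depths for demonstration
    done = drain_then_switch(lambda: next(remaining), lambda: print("switched"), poll_seconds=0)
    print(done)
```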

Page 36: Scaling Infrastructure at Carousell

Post Migration

5-6hr of Maintenance

Latency dropped to 1/4th on GCP

DNS propagation issue (even after 2 days)

L7 tunnels over VPN

Ensure monitoring is taken over after migration

Page 37: Scaling Infrastructure at Carousell

Key Take Away

Practice makes the migration perfect!

Keep stateless nodes ready

Keep configuration updated

Expect issues

Redis cluster state switch

DNS caching by ISPs for days

Keep Calm!

Page 39: Scaling Infrastructure at Carousell

From Pets To Cattle

Page 40: Scaling Infrastructure at Carousell

From Pets To Cattle

Static Infrastructure is a myth!

Manual updates can be faulty

Nodes can fail quickly, one after another

Configuration can quickly become stale

Misconfiguration of Nodes

Salt propagation issues

Recent config update

Painful to detect and fix

Production impact!

Page 45: Scaling Infrastructure at Carousell

From Pets To Cattle

Infrastructure at scale needs →

Centralized configurations

Dynamic Discovery

Automatic recovery from failures

Autoscaling

Scripts for stateful nodes (create/update/migrate)

Aggressive Monitoring and Alerting

Streamline Deployments

Page 52: Scaling Infrastructure at Carousell

Configuration and Service Discovery

For Configuration we needed →

Centralized configuration storage

Consistent store

Audit of configuration changes

Versioning for quick reverts

Easy to deploy and manage

Page 53: Scaling Infrastructure at Carousell

Configuration and Service Discovery

For Service Discovery we needed →

Decoupled from application code

Health checks

Easy to Scale Out

Easy to deploy and manage

Page 54: Scaling Infrastructure at Carousell

Configuration and Service Discovery

We built ’Config-Service’ on top of ’Consul’

Configuration on nodes using Consul Template & Envconsul

Installation on instances using internal Debian package and repo

’Config-Service’ package takes care of Consul cluster configuration and health check registration
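Health check registration in Consul is typically done by sending a service definition to the local agent (`PUT /v1/agent/service/register`). The sketch below only builds such a definition; how the Config-Service package actually does this is not shown in the slides, and the service name, node ID, port and `/health` path are hypothetical.

```python
import json
from typing import Any, Dict


def service_registration(
    name: str,
    node_id: str,
    port: int,
    health_path: str = "/health",
    interval: str = "10s",
) -> Dict[str, Any]:
    """Build a Consul agent service definition with an HTTP health check."""
    return {
        "Name": name,
        "ID": f"{name}-{node_id}",  # unique per instance so nodes do not collide
        "Port": port,
        "Check": {
            "HTTP": f"http://localhost:{port}{health_path}",
            "Interval": interval,
        },
    }


if __name__ == "__main__":
    print(json.dumps(service_registration("listing-api", "node1", 8080), indent=2))
```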

Page 56: Scaling Infrastructure at Carousell

Configuration Management

Git repository to manage configuration

Filename is the key, content is the value

Single source of truth

Audit log of changes

Easy reverts and versioning (just use git revert)
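The "filename is the key, content is the value" convention can be illustrated with a short sketch that walks a checked-out repository and produces key/value pairs ready to be pushed to a key/value store. This is an assumption about how such a sync could look, not the authors' tool; `load_config_tree` is a hypothetical name, and stripping trailing whitespace from values is a choice made here for illustration.

```python
from pathlib import Path
from typing import Dict


def load_config_tree(root: str) -> Dict[str, str]:
    """Map each file under root to a key (relative path) and value (file content)."""
    base = Path(root)
    pairs: Dict[str, str] = {}
    for path in sorted(base.rglob("*")):
        relative = path.relative_to(base)
        # Skip directories and Git's own metadata.
        if path.is_file() and ".git" not in relative.parts:
            pairs[relative.as_posix()] = path.read_text(encoding="utf-8").strip()
    return pairs


if __name__ == "__main__":
    for key, value in load_config_tree(".").items():
        print(key, "=", value[:40])
```

Because every change is a commit, reverting a bad value is a `git revert` followed by another sync.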

Page 58: Scaling Infrastructure at Carousell

Service Discovery

Named discovery

Loose coupling

Auto failover

Load balancing

Auto scaling on CPU usage / Number of Requests

Node Maintenance
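Named discovery with load balancing usually ends in a proxy configuration regenerated whenever the set of healthy nodes changes. The sketch below renders an HAProxy backend section from a list of nodes, roughly what a template tool would produce. The service name, node names and addresses are hypothetical, and the slides do not show the real templates.

```python
from typing import Iterable, Tuple


def render_backend(service: str, nodes: Iterable[Tuple[str, str, int]]) -> str:
    """Render an HAProxy backend with one health-checked server line per node."""
    lines = [f"backend {service}", "    balance roundrobin"]
    for name, address, port in nodes:
        lines.append(f"    server {name} {address}:{port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    healthy = [("node1", "10.0.0.5", 8080), ("node2", "10.0.1.7", 8080)]
    print(render_backend("listing-api", healthy), end="")
```

Taking a node out for maintenance then amounts to removing it from the healthy list and re-rendering.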

Page 62: Scaling Infrastructure at Carousell

Config-Service Overview

Page 63: Scaling Infrastructure at Carousell

Auto Scaling

Pay as you go, lower cost

Better fault tolerance

Availability zone failures

Handle sudden increase in traffic (especially at midnight!)

Page 65: Scaling Infrastructure at Carousell

Key Take Away

Assume things will break

Set Convention

Script everything

Use deb/rpm packages

Instance groups for stateless services

More Cattle, less Pets

Page 71: Scaling Infrastructure at Carousell

Kubernetes

Partial Kubernetes deployment since Oct, 2016

Full Production deployment since Nov, 2016

Using Google Container Engine

30+ deployments

500+ containers (At Peak)

Autoscale on CPU targets

Not all services onboarded yet
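Autoscaling on a CPU target follows the rule documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count scaled by the ratio of observed to target utilization, rounded up. The sketch below is a simplified version of that rule; it leaves out the autoscaler's tolerance band and stabilization behaviour, and the replica bounds are hypothetical defaults.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_percent: float,
    target_cpu_percent: float,
    min_replicas: int = 1,
    max_replicas: int = 100,
) -> int:
    """Scale replicas by observed/target CPU, rounded up and clamped to bounds."""
    if target_cpu_percent <= 0:
        raise ValueError("target_cpu_percent must be positive")
    raw = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Hypothetical: 4 pods at 90% CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```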

Page 74: Scaling Infrastructure at Carousell

Kubernetes

We don’t use K8S Ingress/Service

Config-Service (Consul) as DaemonSet

Containers get registered on Config-Service (NodePort) from health check

No change in existing architecture needed

Service discovery from Internal/External HAProxy still works

Page 77: Scaling Infrastructure at Carousell

Kubernetes

’Config-Service’ allows us to have a hybrid model

Instance groups can coexist with Kubernetes

Recovery mechanism / Transitioning

Instance group size set to zero (Fully on K8S)

Page 80: Scaling Infrastructure at Carousell

Deployment Pipeline

Page 81: Scaling Infrastructure at Carousell

Deployment Pipeline

Jenkins Pipeline

Pipeline triggers Jenkins jobs

3 Clicks to Deploy

Approval Steps

Jobs to pause, resume or revert deployment

Tracked in Slack channels

Soon to be transformed to CI/CD

Page 88: Scaling Infrastructure at Carousell

Monitoring & Alerting

Page 89: Scaling Infrastructure at Carousell

Monitoring & Alerting

Monitoring is critical

Know your Infrastructure

Capture everything, always

Use proper tools:

Prometheus (with exporters)

ELK

Sentry

StatsD

New Relic

OpsGenie

Pingdom

Identify Retention

Page 94: Scaling Infrastructure at Carousell

Monitoring & Alerting

Bare minimum required metrics:

Load Average

CPU percent

Memory Available

Network Bandwidth

Network Connections

Disk IOPS

Disk Usage
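As a rough illustration of how cheap it is to start capturing some of these numbers, here is a minimal sketch that reads load average, available memory and disk usage with only the Python standard library. It is not the exporter setup from the talk: the function names are made up for this example, `/proc/meminfo` is Linux-specific, and CPU percent, network and disk IOPS are left out because they need two samples over time.

```python
import os
import shutil


def read_mem_available_kb(path="/proc/meminfo"):
    """Return MemAvailable in kB on Linux, or None if it cannot be read."""
    try:
        with open(path) as fh:
            for line in fh:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None


def snapshot(mount="/"):
    """Collect a few host metrics using only the standard library."""
    try:
        load1, load5, load15 = os.getloadavg()
    except (AttributeError, OSError):
        load1 = load5 = load15 = None   # not available on this platform
    disk = shutil.disk_usage(mount)
    used_percent = round(100 * disk.used / disk.total, 1) if disk.total else None
    return {
        "load_average_1m": load1,
        "load_average_5m": load5,
        "load_average_15m": load15,
        "memory_available_kb": read_mem_available_kb(),
        "disk_used_percent": used_percent,
    }


if __name__ == "__main__":
    for name, value in snapshot().items():
        print(f"{name}: {value}")
```

In practice a collector such as a Prometheus exporter would gather these continuously; the sketch only shows where the raw values come from.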

Page 98: Scaling Infrastructure at Carousell

Build Dashboards

Page 102: Scaling Infrastructure at Carousell

Monitoring & Alerting

‘Config-Service’ logs auto-failover

Slack for notifications

On Call

Avoid alert blindness

Keep links handy

Schedule jobs

Automate
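One common way to avoid alert blindness is to stop the same alert from firing over and over within a short window. The sketch below shows that idea only; it is an assumption about how one might do it, not the mechanism used in the talk, and `AlertThrottle`, the alert keys and the five-minute default are hypothetical.

```python
import time


class AlertThrottle:
    """Suppress repeats of the same alert key within a cooldown window."""

    def __init__(self, cooldown_seconds=300, clock=time.monotonic):
        self.cooldown = cooldown_seconds
        self.clock = clock            # injectable so the logic can be tested
        self._last_sent = {}          # alert key -> time of last notification
        self.suppressed = {}          # alert key -> number of suppressed repeats

    def should_notify(self, key):
        """Return True if a notification for this key should be sent now."""
        now = self.clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            self.suppressed[key] = self.suppressed.get(key, 0) + 1
            return False
        self._last_sent[key] = now
        return True
```

Counting suppressed repeats rather than dropping them silently keeps the information available for a later summary message.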

Page 109: Scaling Infrastructure at Carousell

Future Plans

Hire more engineers!

Move more services to Kubernetes

Move away from PG (don’t need ACID)

Transition to Microservices

Improve monitoring further

More fault tolerance

Page 115: Scaling Infrastructure at Carousell

Microservices

Golang (go-kit inspired)

Cassandra for storage

ElasticSearch for lookup

gRPC for communication

Hystrix for real-time monitoring

Zipkin for request tracing

Prometheus for metrics

Page 122: Scaling Infrastructure at Carousell

Flash Sale

Ultimate test of scalability

Hard to judge peak

Throughput can multiply in a short time

Planned for 2x throughput

Page 124: Scaling Infrastructure at Carousell

Flash Sale - Latency

Page 125: Scaling Infrastructure at Carousell

Flash Sale

Cache read calls at multiple layers

Upsized ES nodes, eventually replacing entire cluster

Local SSD PG slaves with RAID 0 (100k IOPS)

Identify network bottlenecks

Recheck ulimit and connection limits

Build and keep SOP handy
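"Cache read calls at multiple layers" can be read as a read-through lookup that tries a short-lived local cache, then a longer-lived shared cache, and only then the data store. The sketch below illustrates that pattern under stated assumptions: both layers are plain in-memory dictionaries here, the TTL values are arbitrary, and `TTLCache` and `LayeredReader` are hypothetical names rather than anything from the talk.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data = {}               # key -> (value, expiry time)

    def get(self, key):
        hit = self._data.get(key)
        if hit is None:
            return None
        value, expires = hit
        if self.clock() >= expires:
            del self._data[key]       # drop the stale entry
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


class LayeredReader:
    """Read-through lookup: local cache, then shared cache, then the source."""

    def __init__(self, local, shared, load_from_source):
        self.local = local
        self.shared = shared
        self.load = load_from_source
        self.source_reads = 0         # how often the data store was actually hit

    def get(self, key):
        # Note: a value of None is treated as a miss and is never served from cache.
        value = self.local.get(key)
        if value is not None:
            return value
        value = self.shared.get(key)
        if value is None:
            value = self.load(key)
            self.source_reads += 1
            self.shared.set(key, value)
        self.local.set(key, value)
        return value
```

A short local TTL bounds how stale a single process can be, while the longer shared TTL is what keeps a burst of traffic off the database.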

Page 130: Scaling Infrastructure at Carousell

Flash Sale - Standard Operating Procedure

Page 131: Scaling Infrastructure at Carousell

Infrastructure Team at Carousell

400+ servers

Thousands of requests per second

Production issues get looked at in < 5 minutes

Uptime of 99.95%

Failures don’t result in outages

All thanks to Planning, Monitoring and Automation
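To put the uptime figure in perspective, it helps to convert it into a downtime budget. The helper below is a small worked calculation, assuming the 99.95 on the slide is a percentage (the slide omits the unit) and that the period is measured in whole days; the function name is made up for this example.

```python
def downtime_budget_minutes(uptime_percent, period_days):
    """Minutes of allowed downtime for a given uptime target over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    for days in (30, 365):
        budget = downtime_budget_minutes(99.95, days)
        print(f"{days} days at 99.95%: {budget:.1f} minutes")
```

At 99.95% this works out to roughly 21.6 minutes in a 30-day month, or a little under four and a half hours in a year.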

Page 132: Scaling Infrastructure at Carousell

Take Away

Isolate stateful and stateless components

Isolating compute is equally important

Choose data stores carefully; you won’t be changing them frequently

Use abstractions only after understanding them

Perform Root Cause Analysis, not just workarounds/isolations

Identify bottlenecks

Monitor everything

Blame CODE not CODER

Page 140: Scaling Infrastructure at Carousell

Thank You

Q&A

P.S. we are hiring http://careers.carousell.com/
