Page 1: Kubernetes and lastminute.com: our course towards better scalability and processes - Michele Orsi - Codemotion Milan 2016

Kubernetes and lastminute.com group: our course towards better scalability and processes

Milan, 25-26 November 2016

Page 2:

The inspiring travel company

Page 3:

lastminute.com group in numbers

40 countries, 17 languages

10M travellers per year*

€ 2.5B GTV*

€ 250M revenue*

43M users per month*

*data as of 31st December 2015

icons from http://www.flaticon.com

Page 4:

A tech company to the core

Tech department: 300+ people

Modules: ~100

Database: 150 schemas, 3300 tables, TB data

Instances: 1400+

Locations: Chiasso, Milan, Madrid, London, Bengaluru

Page 5:

https://www.pexels.com/photo/turtle-walking-on-sand-132936/

“Business thinks developers are slow”

Page 6:

lastminute.com group: an agile company

● Scrum and Kanban

● TDD

● clean code

● continuous integration

● code review

● internal communities

Page 7:

Starting from the monolith ...

https://www.flickr.com/photos/southtopia/5702790189

Page 8:

https://www.pexels.com/photo/gray-pebbles-with-green-grass-51168/

... broken into microservices

Page 9:

The improvements needed

● alignment

● real pipelines

● infrastructure

● resilience

● monitoring

● remove constraints

Page 10:

A year-long endeavour

● build a new, modern infrastructure

● migrate the search (flight/hotel) product there

... without:

● impacting the business

● throwing away our whole datacenter

Page 11:

TODO list

● company framework

● docker

● kubernetes

Page 12:

How? Teams and people

New teams

https://www.pexels.com/photo/blue-lego-toy-beside-orange-and-white-lego-toy-standing-during-daytime-105822/

Page 13:

Our infrastructure and technology

https://www.pexels.com/photo/colorful-toothed-wheels-171198/

Page 14:

Docker containers

● build once, run everywhere

● externalised configuration

Page 15:

Docker containers

registry.intra/application:v2-090025112016

BASE OS

JAVA SDK

START/STOP SCRIPTS

JAR APPLICATION

● build once, run everywhere

● externalised configuration
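
As an illustration of "build once, run everywhere" with externalised configuration, the sketch below shows one common way an application inside the image could pick up its settings at start-up rather than at build time. It is a minimal example, not the company framework: the variable names (`APP_ENVIRONMENT`, `APP_DB_URL`, `APP_HTTP_PORT`) and the defaults are hypothetical, and Python is used only for brevity (the slides describe a Java application).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    """Settings read at start-up instead of being baked into the image."""
    environment: str
    db_url: str
    http_port: int


def load_config(env=None):
    """Build the configuration from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    return AppConfig(
        environment=env.get("APP_ENVIRONMENT", "development"),
        db_url=env["APP_DB_URL"],  # required: raises KeyError if missing
        http_port=int(env.get("APP_HTTP_PORT", "8080")),
    )


if __name__ == "__main__":
    # The same image can run in any environment; only the injected values change.
    qa = load_config({"APP_ENVIRONMENT": "qa", "APP_DB_URL": "jdbc:mysql://qa-db/app"})
    print(qa)
```

Failing fast on a missing required value keeps a misconfigured container from starting silently with the wrong settings.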

Page 16:

Kubernetes

● independent from OS/hosts

● isolated env, managed at scale

● self-healing

● externalised configuration

Omega paper: http://research.google.com/pubs/pub41684.html

Page 17:

https://www.pexels.com/photo/red-toy-truck-24619/

“Your infrastructure on wheels”

Page 18:

Kubernetes: physical representation

cluster

NODE1: DOCKER, ETCD, K8S, FLANNEL

NODE2: DOCKER, ETCD, K8S, FLANNEL

...

NODE28: DOCKER, ETCD, K8S, FLANNEL

Page 19:

Kubernetes: logical representation

cluster

NAMESPACE1: CPU 10, MEM 40GB

NAMESPACE2: CPU 20, MEM 50GB

NAMESPACE3: CPU 80, MEM 60GB

NAMESPACE4: CPU 5, MEM 5GB
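
Per-namespace CPU and memory budgets like these are typically expressed in Kubernetes as a ResourceQuota object. The sketch below builds such a manifest as a plain Python dictionary. It rests on assumptions: the slide does not say whether the figures cap requests or limits (requests are used here), the quota name is hypothetical, and "GB" is mapped to the decimal quantity suffix `G`.

```python
import json


def resource_quota(namespace, cpu, memory_gb):
    """Return a ResourceQuota manifest capping total CPU and memory requests."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        # The "-quota" suffix is a hypothetical naming choice.
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": str(cpu),
                "requests.memory": f"{memory_gb}G",
            }
        },
    }


if __name__ == "__main__":
    # Figures for NAMESPACE1 taken from the slide.
    print(json.dumps(resource_quota("namespace1", 10, 40), indent=2))
```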

Page 20:

Kubernetes: our architecture

production: APP1-PRODUCTION, APP2-PRODUCTION, APP3-PRODUCTION

nonproduction: APP1-PREVIEW, APP1-DEVELOPMENT, APP1-QA, APP1-STRESSTEST (each stacked with boxes labelled APP2-PRODUCTION and APP3-PRODUCTION on the slide)
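
The layout on this slide suggests one namespace per application and environment, spread over two clusters. A minimal sketch of that mapping follows, assuming that every environment other than production lives in the nonproduction cluster and that namespace names are simply `<app>-<environment>` in lower case. Both helper functions are hypothetical.

```python
# Environments named on the slide, other than production.
NON_PRODUCTION_ENVS = {"development", "qa", "preview", "stresstest"}


def namespace_for(app, env):
    """Namespace name for an application in a given environment, e.g. 'app1-qa'."""
    return f"{app}-{env}".lower()


def cluster_for(env):
    """Cluster that hosts the given environment (assumed mapping)."""
    if env == "production":
        return "production"
    if env in NON_PRODUCTION_ENVS:
        return "nonproduction"
    raise ValueError(f"unknown environment: {env}")


if __name__ == "__main__":
    for env in ["development", "qa", "preview", "stresstest", "production"]:
        print(cluster_for(env), namespace_for("app1", env))
```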

Page 21:

Kubernetes: our architecture and choices

APP1-PRODUCTION

deployment

replica-set

POD3

POD2

POD1

production

Page 22:

Kubernetes: our architecture and choices

APP1-PRODUCTION

deployment

replica-set

secret configmap

POD3

POD2

POD1

production

Page 23:

Kubernetes: our architecture and choices

APP1-PRODUCTION

deployment

replica-set

app1.lastminute.intra

secret configmap

POD3

POD2

POD1

loadbalancer-app1

production
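
The pieces in this diagram (a deployment managing a replica-set of three pods, fed by a configmap and a secret) can be sketched as a Deployment manifest built in Python. This is an illustration under assumptions, not the manifest used at lastminute.com: `apps/v1` is the current API group rather than necessarily the one available in 2016, `envFrom` is only one of several ways to consume a configmap or secret, and the `-config` and `-secret` names are hypothetical. The image tag is the one shown on the Docker slide.

```python
def deployment_manifest(app, namespace, image, replicas=3):
    """Return a Deployment whose pods read a ConfigMap and a Secret as env vars."""
    labels = {"app": app}
    container = {
        "name": app,
        "image": image,
        "envFrom": [
            {"configMapRef": {"name": f"{app}-config"}},  # hypothetical name
            {"secretRef": {"name": f"{app}-secret"}},     # hypothetical name
        ],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app, "namespace": namespace},
        "spec": {
            "replicas": replicas,  # POD1, POD2, POD3 on the slide
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    import json
    manifest = deployment_manifest(
        "app1", "app1-production", "registry.intra/application:v2-090025112016"
    )
    print(json.dumps(manifest, indent=2))
```

The replica-set itself is not declared: the Deployment controller creates and replaces it on each rollout.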

Page 24:

Kubernetes: our architecture and choices

production

APP1-PRODUCTION

POD: application, collectd, fluentd

Page 25:

Kubernetes: what’s left outside?

● datastores

● distributed caches

● distributed locking

● pub-sub

● logs and metrics storage

Page 26:

1st try (with test app), it seemed to work

https://www.flickr.com/photos/26516072@N00/2194001232

Page 27:

Self-healing

The self-healing term describes any application, service, or a system that can discover that it is not working correctly and, without any human intervention, make the necessary changes to restore itself to the normal or designed state.

ref: https://technologyconversations.com/2016/01/26/self-healing-systems

Page 28:

Kubernetes agnostic interfaces

“When a container is dead I will restart it”

“When a container is ready I will forward traffic to it”

Page 29:

Kubernetes probes: liveness & readiness

Two questions for dev:

● when can I consider my container alive?

● when can I consider my container ready to receive traffic?

spec:
  containers:
    livenessProbe:
      httpGet:
        path: /liveness
      successThreshold: 3
      failureThreshold: 2
    readinessProbe:
      httpGet:
        path: /readiness
      successThreshold: 3
      failureThreshold: 2

deployment.yaml
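
To make the two thresholds concrete, the sketch below models how consecutive probe results flip a container between "not ready" and "ready". It is a simplified illustration of the documented semantics, not the kubelet's implementation, and the `ProbeState` class is hypothetical. It models the readiness probe only, because current Kubernetes documentation requires `successThreshold` to be 1 for liveness probes.

```python
class ProbeState:
    """Tracks consecutive probe results against success/failure thresholds."""

    def __init__(self, success_threshold, failure_threshold, healthy=False):
        self.success_threshold = success_threshold
        self.failure_threshold = failure_threshold
        self.healthy = healthy
        self._successes = 0
        self._failures = 0

    def record(self, ok):
        """Record one probe result and return the resulting health state."""
        if ok:
            self._successes += 1
            self._failures = 0
            if self._successes >= self.success_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self._failures >= self.failure_threshold:
                self.healthy = False
        return self.healthy


if __name__ == "__main__":
    # Values from the readinessProbe on the slide.
    readiness = ProbeState(success_threshold=3, failure_threshold=2)
    for result in [True, True, True, False, False]:
        print(result, "->", readiness.record(result))
```

With these values a pod starts receiving traffic after three consecutive successful checks and stops receiving it after two consecutive failures.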

Page 30:

Our choices: framework - k8s

/liveness:

● when the tomcat container is up

● when the ratio of “active/max” threads is lower than a threshold

/readiness:

● all the startup jobs have run

● no termination request has been received

.. ongoing never-ending research ..
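
The /liveness and /readiness rules on this slide could be expressed roughly as follows. This is a sketch, not the company framework: the `AppState` fields, the 0.9 ratio threshold, and the 200/503 status codes are assumptions chosen for illustration (an HTTP probe counts any status from 200 to 399 as a success).

```python
from dataclasses import dataclass


@dataclass
class AppState:
    """Snapshot of what the application knows about itself (hypothetical fields)."""
    server_up: bool = False
    active_threads: int = 0
    max_threads: int = 200  # assumed to be positive
    startup_jobs_done: bool = False
    termination_requested: bool = False


def is_live(state, max_ratio=0.9):
    """Alive when the server is up and the active/max thread ratio is below the threshold."""
    if not state.server_up:
        return False
    return state.active_threads / state.max_threads < max_ratio


def is_ready(state):
    """Ready when all start-up jobs have run and no termination was requested."""
    return state.startup_jobs_done and not state.termination_requested


def status_code(ok):
    """HTTP status an endpoint would return to the probe."""
    return 200 if ok else 503


if __name__ == "__main__":
    state = AppState(server_up=True, active_threads=50, startup_jobs_done=True)
    print("/liveness ->", status_code(is_live(state)))
    print("/readiness ->", status_code(is_ready(state)))
```

Flipping readiness to false as soon as a termination request arrives lets the pod drain traffic before it shuts down.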

Page 31:

2nd try (with production traffic)

● zero downtime during rollout

● monitoring in place

● alerting

● centralized logging

● legacy infrastructure to the rescue in case of problems

Page 32:

... failure ... the big one!

https://www.flickr.com/photos/ghost_of_kuji/2763674926

Page 33:

Problems

● configuration

● infrastructure

● tools

● manual mistakes

● (external) scalability

Page 34:

Another improvement step

● temporary team focus on objective

● automation

● monitoring

● Go deeper in docker/kubernetes

Page 35:

Pipeline: a huge step forward

microservice = factory.newDeployRequest().withArtifact("com.lastminute.application1", 2)

lmn_deployCanaryStrategy(microservice, "qa")

lmn_deployStableStrategy(microservice, "preview")

lmn_deployCanaryStrategy(microservice, "production")

pipeline
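
The slide does not define what the canary and stable strategies do internally. As a generic illustration of the canary idea only, the sketch below deploys a single instance first and continues with the rest only if that instance reports healthy. The "stable" variant is assumed here to roll out every instance without that gate. All function names and callbacks are hypothetical and unrelated to the `lmn_*` steps above.

```python
def canary_rollout(instances, deploy, is_healthy):
    """Deploy the first instance as a canary; roll out the rest only if it is healthy.

    Returns the list of instances that were actually deployed.
    """
    if not instances:
        return []
    canary, rest = instances[0], instances[1:]
    deploy(canary)
    if not is_healthy(canary):
        return [canary]  # stop here: the remaining instances keep the old version
    for instance in rest:
        deploy(instance)
    return list(instances)


def stable_rollout(instances, deploy):
    """Deploy every instance without a canary gate (assumed meaning of 'stable')."""
    for instance in instances:
        deploy(instance)
    return list(instances)


if __name__ == "__main__":
    log = []
    done = canary_rollout(["pod1", "pod2", "pod3"], log.append, lambda name: True)
    print(done, log)
```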

Page 36:

Monitoring: grafana/graphite/nagios

cluster

APP1-PRODUCTION

POD: application, collectd

graphite

Grafana

nagios

icons from http://www.flaticon.com
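
For readers unfamiliar with Graphite, metrics can reach it as plain text lines of the form `<path> <value> <timestamp>`, sent to the Carbon plaintext port (2003 by default). The sketch below formats and sends one such line. The slide does not say which protocol or metric names are used between collectd and graphite, so the metric path and the host are hypothetical.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol."""
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"


def send_metric(line, host, port=2003):
    """Send one plaintext line to a Carbon listener (host is an assumption)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric path; printing instead of sending keeps the example offline.
    print(graphite_line("app1.production.pod1.threads.active", 42), end="")
```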

Page 37:

“Go” deep .. whatever language it takes

https://www.pexels.com/photo/sea-man-person-ocean-2859/

Page 38:

There’s light .. There’s a light .. at the end

https://www.pexels.com/photo/grayscale-photography-of-person-at-the-end-of-tunnel-211816/

Page 39:

... benefits

● lead and migration time

● resilience

● root cause analysis

● speed of deployment

● instant scaling

Page 40:

Give me the numbers!

● 1300 req/sec in the new cluster

● 25 micro-services migrated in 4 months

● 1 week to migrate an application

● 10 minutes to create a new environment

● 11 min to gracefully roll-out a new version with 55 instances

● whole pipeline runs in 16 min

● 1.5M metrics/minute flows
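
A few of these figures are easier to compare once converted to per-unit rates. The arithmetic below derives them from the slide's numbers. The per-instance rollout time is only an average and assumes instances are replaced one after another, which the slide does not state.

```python
ROLLOUT_MINUTES = 11
INSTANCES = 55
METRICS_PER_MINUTE = 1_500_000
SERVICES_MIGRATED = 25
MIGRATION_MONTHS = 4

# Average time to replace one instance during a graceful rollout.
seconds_per_instance = ROLLOUT_MINUTES * 60 / INSTANCES

# Metric throughput expressed per second.
metrics_per_second = METRICS_PER_MINUTE / 60

# Average migration pace.
services_per_month = SERVICES_MIGRATED / MIGRATION_MONTHS

if __name__ == "__main__":
    print(f"{seconds_per_instance:.0f} s per instance")
    print(f"{metrics_per_second:.0f} metrics per second")
    print(f"{services_per_month:.2f} micro-services per month")
```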

Page 41:

Yes, we’re hiring!

THANKS

www.lastminutegroup.com