Reconstructing the SRE

Reconstructing the SRE Bob Wise CTO Cloud Native Computing Team

Transcript of Reconstructing the SRE

Page 1: Reconstructing the SRE

Reconstructing the SRE

Bob Wise, CTO

Cloud Native Computing Team

Page 2: Reconstructing the SRE

Copyright © 2017 Samsung SDS Co., Ltd. All rights reserved

This presentation is intended to provide information concerning Samsung’s efforts around containers and container orchestration. We do our best to make sure that information presented is accurate and fully up-to-date. However, the presentation may be subject to technical inaccuracies, information that is not up-to-date or typographical errors. As a consequence, Samsung does not in any way guarantee the accuracy or completeness of information provided on this presentation. Samsung reserves the right to make improvements, corrections and/or changes to this presentation at any time.

The information in this presentation or accompanying oral statements may include forward-looking statements. These forward-looking statements include all matters that are not historical facts, statements regarding the Samsung Data System' intentions, beliefs or current expectations concerning, among other things, market prospects, growth, strategies, and the industry in which Samsung operates. By their nature, forward-looking statements involve risks and uncertainties, because they relate to events and depend on circumstances that may or may not occur in the future. Samsung cautions you that forward looking statements are not guarantees of future performance and that the actual developments of Samsung, the market, or industry in which Samsung operates may differ materially from those made or suggested by the forward-looking statements contained in this presentation or in the accompanying oral statements. In addition, even if the information contained herein or the oral statements are shown to be accurate, those developments may not be indicative developments in future periods.

Logos remain the property of their respective owners.


Page 3: Reconstructing the SRE

Copyright © 2017 Samsung SDS Co., Ltd. All rights reserved

This presentation is intended to provide information concerning Samsung’s efforts around containers and container orchestration. We do our best to make sure that information presented is accurate and fully up-to-date. However, the presentation may be subject to technical inaccuracies, information that is not up-to-date or typographical errors. As a consequence, Samsung does not in any way guarantee the accuracy or completeness of information provided on this presentation. Samsung reserves the right to make improvements, corrections and/or changes to this presentation at any time.

The information in this presentation or accompanying oral statements may include forward-looking statements. These forward-looking statements include all matters that are not historical facts, statements regarding the Samsung Data System' intentions, beliefs or current expectations concerning, among other things, market prospects, growth, strategies, and the industry in which Samsung operates. By their nature, forward-looking statements involve risks and uncertainties, because they relate to events and depend on circumstances that may or may not occur in the future. Samsung cautions you that forward looking statements are not guarantees of future performance and that the actual developments of Samsung, the market, or industry in which Samsung operates may differ materially from those made or suggested by the forward-looking statements contained in this presentation or in the accompanying oral statements. In addition, even if the information contained herein or the oral statements are shown to be accurate, those developments may not be indicative developments in future periods.

Logos remain the property of their respective owners. So there.

Release the Kraken.

Page 4: Reconstructing the SRE

SDS - Cloud Native Computing Team

• Our #1 job is improving organizational velocity
  – Delivering the business value of Kubernetes to you, fastest
• Rock solid Kubernetes cluster design and deployment, specific to you
• Optimized deployment pipelines and container strategy
• 24x7x365 Kubernetes operations so you can focus on your business
• Organizational consulting to rapidly adapt
• We are:
  – Industry leaders in Operations Automation, Cluster Operations, and Kubernetes Adoption
  – Contributors and leaders on the Kubernetes project for 2+ years
  – Maintainers of Kraken: production-grade cluster management
  – Delivering this for Enterprise customers globally

Page 5: Reconstructing the SRE

Macro Trends

1. Macro Trend: Massive shifts in all industries towards sophisticated and comprehensive automation to enable competitive advantage

2. In all technology delivery, “Cloud Native” architectures and automation prevail

3. Outsourcing is the only path, as companies cannot adapt and acquire expertise internally

Page 6: Reconstructing the SRE

Root Causes: Deep Organizational Pain

• Tools and processes built for the era of bare metal and virtual machines
• Dev teams under pressure to deliver new features quickly, not going fast enough
• Operations teams trying to support hastily deployed features
  – Quality issues
  – Outages
  – Constant firefighting
  – Unhappy customers
  – Employee retention issues
• Operations viewed as “just trying to block things”
• “Devops” has come to mean generalists who are overwhelmed trying to handle everything

Page 7: Reconstructing the SRE

The Cloud Native Disruption

Page 8: Reconstructing the SRE

Cloud Native Has Dramatically Raised the Bar

Page 9: Reconstructing the SRE

Classic Enterprise IT is Especially Behind

From the highly recommended: https://www.slideshare.net/adriancockcroft/dockercon-state-of-the-art-in-microservices

Page 10: Reconstructing the SRE

Cloud Native Market Realities

Enormous differences between companies in the ability to execute on software product delivery have emerged on all three axes:

• Velocity
• Quality
• Efficiency

Page 11: Reconstructing the SRE

Cloud Native Market Realities

The companies winning in all three of these categories share one thing in common:

• Velocity
• Quality
• Efficiency

Theme: Using Cloud Native Approaches

Page 12: Reconstructing the SRE

What is Cloud Native?

(a) Containerized. Applications deployed in units that can be easily managed and dealt with by everyone: developers, product managers, and operations.

(b) Dynamically managed. Automatically and responsively deployed by an orchestration engine that considers customer experience and cost.

(c) Micro-services oriented. Loosely coupled and rapidly adaptable services that can be innovated and deployed separately while the system as a whole continues to operate.

Page 13: Reconstructing the SRE

Cloud Native Organizations Are High Performance Based On:

• How They Behave
• How They Measure

Page 14: Reconstructing the SRE

High Performance Orgs Embrace Change

Page 15: Reconstructing the SRE

High Performance Orgs Embrace Rapid Change With Automation

Page 16: Reconstructing the SRE

Puppet Labs 2017 State of Devops Report: Highly Recommended Reading

https://puppet.com/resources/whitepaper/2017-state-of-devops-report

Page 17: Reconstructing the SRE

Embrace The Change

Change Resistant: Monolithic; Complex Dependencies; Large, Centrally Controlled Teams; Months to Production; Waterfall Process; Manual QA; Manual Security Audits

Change Embracing: Lots of Simple Parts; Independent Parts; Small, Independent Teams; Specialists not Generalists; SRE Pattern; Continuous Integration; Continuous Deployment; Automated QA; Ubiquitous Automation; Automated Security

Page 18: Reconstructing the SRE

Key Insight

Change Resistant: Monolithic; Complex Dependencies; Large, Centrally Controlled Teams; Months to Production; Waterfall Process; Manual QA; Manual Security Audits

Change Embracing: Lots of Simple Parts; Independent Parts; Small, Independent Teams; Continuous Integration; Continuous Deployment; Automated QA; Ubiquitous Automation; Automated Security

Change Resistant is optimized for MTBF. Change Embracing is optimized for MTTR.

Page 19: Reconstructing the SRE

Key Insight

Change Resistant: Monolithic; Complex Dependencies; Large, Centrally Controlled Teams; Months to Production; Waterfall Process; Manual QA; Manual Security Audits

Change Embracing: Lots of Simple Parts; Independent Parts; Small, Independent Teams; Continuous Integration; Continuous Deployment; Automated QA; Ubiquitous Automation; Automated Security

Change Resistant is measured by MTBF. Change Embracing is measured by MTTR.

• Mean Time Between Failure drives failure prevention and risk aversion
• Mean Time To Repair drives responsiveness and allows risks
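One way to see why the two measures pull in different directions is the standard steady-state availability relation, availability = MTBF / (MTBF + MTTR). The sketch below is illustrative only: the function name and the numbers are assumptions, not figures from the talk.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical numbers, chosen only to illustrate the trade-off.
# A change-resistant service: fails rarely, but each repair is slow.
rare_but_slow = availability(mtbf_hours=1000.0, mttr_hours=10.0)
# A change-embracing service: fails ten times as often, but recovers in minutes.
frequent_but_fast = availability(mtbf_hours=100.0, mttr_hours=0.1)

print(f"{rare_but_slow:.4f}")      # 0.9901
print(f"{frequent_but_fast:.4f}")  # 0.9990
```

With these made-up inputs, the service that fails more often is still more available, because cutting repair time matters more than stretching the time between failures.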

Page 20: Reconstructing the SRE

Embrace The Change

Change Resistant: Monolithic; Complex Dependencies; Large, Centrally Controlled Teams; Months to Production; Waterfall Process; Manual QA; Manual Security Audits

Change Embracing: Lots of Simple Parts; Independent Parts; Small, Independent Teams; Specialists not Generalists; SRE Pattern; Continuous Integration; Continuous Deployment; Automated QA; Ubiquitous Automation; Automated Security

Page 21: Reconstructing the SRE

Architecture (Microservices): Lots of Simple Parts; Independent Parts

Org: Small, Independent Teams; Specialists, not Generalists; Adoption of the SRE Pattern

Execution Fundamentals: Continuous Integration; Continuous Deployment; Automated QA; Ubiquitous Automation; Automated Security

Page 22: Reconstructing the SRE

Cloud Native

Architecture (Microservices): Lots of Simple Parts; Independent Parts

Org: Small, Independent Teams; Specialists, not Generalists; Adoption of the SRE Pattern

Execution Fundamentals: Continuous Integration; Continuous Deployment; Automated QA; Ubiquitous Automation; Automated Security

Page 23: Reconstructing the SRE

Devops Positives

• Breaking down silos
• No more “throwing it over the wall”
• Orientation to infrastructure as code
• Orientation to automation
• Valuing coding skills in operations

Devops fails:

– Company renames ops to devops.
– Improvement fail

Page 24: Reconstructing the SRE

Devops Anti-Patterns in Larger Orgs

• Rename ops to devops.
• Disband ops, “we just make the devs do it”
• Add a devops group with yet another silo between dev and ops
• Expect everyone to know everything
  – “We only have full stack devops”
  – This is unrealistic

Instead, we want a pattern for specialization with collaboration.

Page 25: Reconstructing the SRE

The SRE (Site Reliability Engineer)

• Helps product teams engineer for operability
  – Architecture
  – Tooling
• Owns capacity planning
• Ensures CI tooling is working and adequate
• Helps development go faster with positive assistance
• Spends a lot of time on developing tooling, typically monitoring, CI, and operational analytics
• Ensures product management sets SLOs (service level objectives) and error budgets
  – Tracks and enforces
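The relationship between an SLO and its error budget can be made concrete with a little arithmetic. This is a minimal sketch that assumes a request-based SLO. The function names, the 99.9% target, and the request counts are illustrative assumptions, not taken from the talk.

```python
def error_budget(slo: float, total_requests: int) -> int:
    """Number of failed requests the SLO tolerates over the measurement window."""
    return round((1.0 - slo) * total_requests)


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> int:
    """Error budget left after the failures observed so far (negative if overspent)."""
    return error_budget(slo, total_requests) - failed_requests


def can_release(slo: float, total_requests: int, failed_requests: int) -> bool:
    """One possible enforcement policy: risky releases proceed only while budget remains."""
    return budget_remaining(slo, total_requests, failed_requests) > 0


# Hypothetical window: 1,000,000 requests against a 99.9% availability SLO.
print(error_budget(0.999, 1_000_000))
print(can_release(0.999, 1_000_000, failed_requests=400))
print(can_release(0.999, 1_000_000, failed_requests=1_200))
```

"Tracks and enforces" then means watching `budget_remaining` over time and applying a policy such as `can_release` when the budget is spent.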


Page 26: Reconstructing the SRE

Ben Treynor – SRE VP at Google

“Outside Google, we often observe that there isn't parity of esteem between the SWE and operations teams, which combines poorly with the fact that they often have different incentives. That's how we end up with the model that exists in the industry today, where SWE teams write something and throw it over a wall to the operations teams, who then try to make it work, and can't, and throw it back, and so on.”


Page 28: Reconstructing the SRE

[Diagram labels: Dev, Ops, QA, CI, CD, Monitoring, SRE]

Page 29: Reconstructing the SRE

Evolving from Devops: ClusterOps

[Diagram labels: Product Mgt, Dev, AppOps (App), ClusterOps (Cluster), InfraOps (Infra)]

Page 30: Reconstructing the SRE

ClusterOps and the SRE Role

[Diagram labels: Product Mgt, Dev, AppOps (App), ClusterOps (Cluster), InfraOps (Infra), SRE]

Page 31: Reconstructing the SRE

Technology Progression to Platforms… Deconstruction Needed

Impact to the development

• Operations moving up the stack while Development is moving down the stack
• Still a lot of complexity and separation between Operations/Development teams
• Outsource when economies of scale make it more beneficial to push to a 3rd party

[Diagram: stack layers (Bare Metal, OS, Hypervisor, OS/Virtual Machines, Infrastructure, Dependencies, Application) shown for DIY, Co-Lo, VM's, and IAAS/PAAS, with the Area of Responsibility for each layer marked as Development, Operations, or 3rd Party]

Page 32: Reconstructing the SRE

Organizational Trend to Outsourcing… Deconstruction Needed

Improvements to Development Process.

• Faster release cycles mean quicker to market.
• Smaller, more reliable deployments.
• Quicker recovery, reduced risk when issues occur. Shift in focus to MTTR vs MTTF.
• Economies of scale are getting to the build vs buy decision

Improvements to Organizational Velocity.

• Subject matter experts on Day 1
• No need to spin up a new operations team
• Allow development teams to focus on Product vs Tooling
• More cost efficient than doing it in-house

[Diagram: Infrastructure Operations, Cluster Operations, and App Operations shown against IAAS, PAAS, and IAAS/PAAS stacks (Application, Dependencies, OS/Virtual Machines). Future State: Kubernetes, Containers, Pipeline, Alerting, Monitoring, Performance/Efficiency, Maintenance, Reporting. Speed to Market.]

Page 33: Reconstructing the SRE

Multiple Factors

• Technology progression to Platforms
• Organizational Trend to Outsourcing
  – Some SRE functions will be outside your org
• Best practices:
  – Specialists over Generalists
  – Collaboration over Silos

Page 34: Reconstructing the SRE

Reconstructing the SRE

SRE

• ARE: Application
• CRE: Cluster
• IRE: Infra

SLO – Service Level Objective
SLI – Service Level Indicator

ARE Concerns

• Application SLI/SLO
• Availability targets
• Pod autoscaling
• Capacity planning
• Deployments/Canaries
• Application CI/CD
• App packaging practices
• App SLI capture
• Application architecture
• Application monitoring
• Application logging

CRE/IRE Concerns

• Cluster SLI/SLO
• Cluster Capacity
• Node autoscaling
• Control plane design
• Cluster upgrades
• Cluster utilization
• Under/over capacity nodes
• Node specs
• Docker version
• Network configuration
• Namespace design
• Failure domains
• Cluster monitoring
• Cluster logging
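Pod autoscaling, one of the ARE concerns listed above, can be sketched with the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current × currentMetric / targetMetric). The version below is deliberately simplified and leaves out the tolerance band and stabilization behaviour of the real controller. The function name, parameter names, and replica bounds are illustrative assumptions.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Simplified HPA-style rule: scale in proportion to metric / target, then clamp."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical example: 4 pods averaging 90% CPU against a 60% target.
print(desired_replicas(4, current_metric=90.0, target_metric=60.0))
```

In the split described on this slide, the ARE would pick the target metric and replica bounds, while node autoscaling and cluster capacity stay with the CRE/IRE.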

Page 35: Reconstructing the SRE

Kubernetes Factors

Page 36: Reconstructing the SRE

Kubernetes Conceptual

[Diagram: a Kubernetes Control Plane and three Nodes (Servers), each running a Docker Daemon and a Kubernetes Node Controller]

Status: Cluster is ready for work

Page 37: Reconstructing the SRE

Kubernetes Conceptual

To API: Run (1) of container X

[Diagram: the Kubernetes Control Plane holds a ReplicaSet for "1 of X" above three Nodes (Servers), each running a Docker Daemon and a Kubernetes Node Controller]

ReplicaSet (replication set): Manages pod count
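"Manages pod count" amounts to a reconcile loop: compare the desired count with what is actually running and correct the difference. The toy model below only illustrates that idea. `ToyReplicaSet` and its fields are hypothetical names, and none of this is Kubernetes code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyReplicaSet:
    """Toy stand-in for a ReplicaSet: keeps len(pods) equal to desired."""
    name: str
    desired: int
    pods: List[str] = field(default_factory=list)
    _counter: int = 0  # used only to give each new pod a unique name

    def reconcile(self) -> None:
        # Too few pods running: start replacements.
        while len(self.pods) < self.desired:
            self._counter += 1
            self.pods.append(f"{self.name}-{self._counter}")
        # Too many pods running: stop the extras.
        while len(self.pods) > self.desired:
            self.pods.pop()


rs = ToyReplicaSet(name="x", desired=1)
rs.reconcile()      # "Run (1) of container X"
print(rs.pods)
rs.pods.clear()     # simulate the pod's node failing
rs.reconcile()      # the loop notices the gap and starts a replacement
print(rs.pods)
```

The caller only ever states the desired count. Recovering from a lost pod is the same code path as the initial start.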

Page 38: Reconstructing the SRE

Kubernetes Conceptual

[Diagram: the Kubernetes Control Plane tracks "1 of X". One of the three Nodes (Servers) now runs Pod (X), which holds three Containers.]

Status: Cluster is running X

Page 39: Reconstructing the SRE

Kubernetes Conceptual

Orchestrating Multiple Applications

[Diagram: the Kubernetes Control Plane schedules nine Pods, each with three Containers, across three Nodes (Servers). The Pods belong to App1, App2, App3, App4, and App5.]

Page 40: Reconstructing the SRE

Kubernetes Namespace (Environments)

[Diagram: a Service with two Pods, each holding three Containers]

Page 41: Reconstructing the SRE

Kubernetes Namespace (Environments)

[Diagram: the service is labeled "main", with two Pods of three Containers each]

Page 42: Reconstructing the SRE

Kubernetes Namespace (Environments)

[Diagram: two services, both labeled "main", each with two Pods of three Containers]

Page 43: Reconstructing the SRE

Kubernetes Namespace (Environments)

[Diagram: two services, both labeled "main", each with two Pods of three Containers]

CLASH

Page 44: Reconstructing the SRE

Kubernetes Namespace (Environments)

[Diagram: "main" and its two Pods placed inside a namespace labeled "prod"]

Page 45: Reconstructing the SRE

Kubernetes Namespace (Environments)

[Diagram: one "main" with its two Pods inside namespace "prod", and a second "main" with its two Pods inside namespace "dev"]

Page 46: Reconstructing the SRE

Kubernetes Namespace

• Virtual cluster
• Fundamental admin unit, maps to different organizational patterns:
  – Resource name scoping, i.e.
    • dev, qa, or prod
  – Teams or projects
    • Group1, group2
  – Services
• What Quotas are attached to, and future permission-related concepts
  – Reflection of Google operational philosophy
  – Micro/ACLs get too unwieldy
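Resource name scoping is the property that resolves the "main" clash shown on the earlier namespace slides: a name only has to be unique within its namespace. The toy registry below sketches that rule. `ToyCluster` and its methods are hypothetical and are not part of any Kubernetes client library.

```python
class ToyCluster:
    """Toy object registry keyed by (namespace, name), to illustrate name scoping."""

    def __init__(self) -> None:
        self._objects = {}

    def create(self, namespace: str, name: str, spec: dict) -> None:
        key = (namespace, name)
        if key in self._objects:
            # Same name in the same namespace: this is the clash.
            raise ValueError(f"{name!r} already exists in namespace {namespace!r}")
        self._objects[key] = spec

    def names_in(self, namespace: str) -> list:
        return sorted(n for (ns, n) in self._objects if ns == namespace)


cluster = ToyCluster()
cluster.create("prod", "main", {"replicas": 2})
cluster.create("dev", "main", {"replicas": 2})       # fine: different namespace
try:
    cluster.create("prod", "main", {"replicas": 2})  # clash: same namespace
except ValueError as err:
    print(err)
```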


Page 47: Reconstructing the SRE

Kubernetes Namespaces: Key to the CRE/ARE Contract

• Namespaces are formal production entities, managed by CRE
• Dev pipelines that create/modify namespaces get CRE oversight
• RBAC configuration of namespaces is a critical role
• CRE owns namespace quotas
• ARE has broad permissions inside the namespace
• CRE owns the pod scheduling rules/constraints (taints/tolerations)
• Reporting broken out by namespace
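The contract can be pictured as an admission check: the CRE sets a per-namespace quota, and the ARE deploys freely as long as each request fits inside it. This is a rough sketch under that assumption. `ToyQuota`, `ToyUsage`, and `admit_pod` are invented names, and the limits are illustrative. Real Kubernetes quotas cover many more resource types.

```python
from dataclasses import dataclass


@dataclass
class ToyQuota:
    """Limits the CRE attaches to a namespace (illustrative fields only)."""
    max_pods: int
    max_cpu_millicores: int


@dataclass
class ToyUsage:
    """What the ARE's workloads currently consume in that namespace."""
    pods: int = 0
    cpu_millicores: int = 0


def admit_pod(quota: ToyQuota, usage: ToyUsage, cpu_request_millicores: int) -> bool:
    """Admit one pod if it fits the quota, recording its usage. Otherwise reject it."""
    fits = (usage.pods + 1 <= quota.max_pods
            and usage.cpu_millicores + cpu_request_millicores <= quota.max_cpu_millicores)
    if fits:
        usage.pods += 1
        usage.cpu_millicores += cpu_request_millicores
    return fits


quota = ToyQuota(max_pods=2, max_cpu_millicores=1000)  # set by the CRE
usage = ToyUsage()
print(admit_pod(quota, usage, 400))
print(admit_pod(quota, usage, 400))
print(admit_pod(quota, usage, 100))  # rejected: pod count limit reached
```

Because usage is tracked per namespace, the same records can feed the per-namespace reporting mentioned in the last bullet.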


Page 48: Reconstructing the SRE

SDS – CNCT Commercial Offering

Managed ClusterOps Services

• Our goal is to enable Customers to focus entirely on their core business.
• Fastest path to production clusters
• Continuous monitoring and alerting
• Performance and efficiency analysis

Professional Acceleration Services: Our goal is to accelerate benefits to your business from containers, cloud-native, and faster deployments:

• Cluster Readiness Evaluation
• Strategic Business Analysis
• POC Delivery
• Development Pipeline Planning
• Dedicated On-boarding Team
• Managed Cluster Operations
• Other services to accelerate benefits to your business.

[Diagram labels: Hardware Operations, Cluster Operations, App Operations. Quickest and most cost effective path.]

Page 49: Reconstructing the SRE

Contact and Project Info

• Bob Wise
  – @countspongebob on twitter and github
  – [email protected]
• CNCT homepage:
  – http://samsung-cnct.github.io/
• Kraken repo
  – Production grade cluster management for Kubernetes
  – open source, Apache license
  – https://github.com/samsung-cnct/kraken
  – #kraken channel on kubernetes slack

Page 50: Reconstructing the SRE

Q&A
