Cloud Native Applications and AppDynamics

45
Cloud Native Applications and AppDynamics March 2013 Adrian Cockcroft @adrianco #netflixcloud @NetflixOSS http://www.linkedin.com/in/adriancockcroft

description

Cloud Native Applications and AppDynamics. March 2013 Adrian Cockcroft @ adrianco # netflixcloud @ NetflixOSS http:// www.linkedin.com/in/adriancockcroft. Forklift Old C ode to the Cloud. Faster than datacenter deployments but fragile. Typical Hybrid Cloud Architecture. - PowerPoint PPT Presentation

Transcript of Cloud Native Applications and AppDynamics

Page 1: Cloud Native Applications and  AppDynamics

Cloud Native Applications and AppDynamics

March 2013Adrian Cockcroft

@adrianco #netflixcloud @NetflixOSShttp://www.linkedin.com/in/adriancockcroft

Page 2: Cloud Native Applications and  AppDynamics

Cloud Native

Agile Infrastructure

AppDynamics Scale

Appdynamics Configuration

AppDynamics REST API

NetflixOSS – Cloud Native On-Ramp

Page 3: Cloud Native Applications and  AppDynamics

Forklift Old Code to the Cloud

Faster than datacenter deployments but fragile

Page 4: Cloud Native Applications and  AppDynamics

Typical Hybrid Cloud Architecture

Datacenter Dinosaurs

Cloudy Buffer

Agile Mobile MammalsiOS/Android

App Servers

MySQL Legacy Apps

Page 5: Cloud Native Applications and  AppDynamics

New Anti-Fragile Patterns

Micro-servicesChaos engines

HA systems composed from ephemeral components

Page 6: Cloud Native Applications and  AppDynamics

Netflix Member Web Site Home PagePersonalization Driven

Page 7: Cloud Native Applications and  AppDynamics

Web Server Dependencies Flow(Home page business transaction as seen by AppDynamics)

Start Here

memcached

Cassandra

Web service

S3 bucket

Three Personalization movie group choosers (for US, Canada and Latam)

Each icon is three to a few hundred instances across three AWS zones

Page 8: Cloud Native Applications and  AppDynamics

Netflix Europe

Page 9: Cloud Native Applications and  AppDynamics

Cloud Native Architecture

Distributed NoSQL Datastores

Autoscaled Micro Services

Autoscaled Micro Services

Clients Things

JVM JVM

JVM JVM

Cassandra Cassandra Cassandra

Memcached

JVM

Zone A Zone B Zone C

Page 10: Cloud Native Applications and  AppDynamics

Cloud Native

Master copies of data are cloud residentEverything is dynamically provisioned

All services are ephemeral

Page 11: Cloud Native Applications and  AppDynamics

Scaling AppDynamics

Frog-boiling as a Service

Page 12: Cloud Native Applications and  AppDynamics

Peak Nodes/Controller Over Time

2009 2010 2011 2012 20130

2000

4000

6000

8000

10000

12000

14000

TestProdEuropeSecure

Page 13: Cloud Native Applications and  AppDynamics

Controllers and ApplicationsEach AWS region maps to an Application

US Production Saas

us-east-1

us-west-1

us-west-2

Europe Production

SaaS

eu-west-1

Secure EC2

us-east-1

Test SaaS

eu-west-1

us-east-1

us-west-1

us-west-2

Page 14: Cloud Native Applications and  AppDynamics

Cloud Deployment ScalabilityNew Autoscaled AMI – zero to 500 instances from 21:38:52 - 21:46:32, 7m40s

Scaled up and down over a few days, total 2176 instance launches, m2.2xlarge (4 core 34GB)

Min. 1st Qu. Median Mean 3rd Qu. Max. 41.0 104.2 149.0 171.8 215.8 562.0

Page 15: Cloud Native Applications and  AppDynamics

Ephemeral Nodes

• Largest and most active tiers are autoscaled• Average lifetime of a node is 36 hours• Controller gets storms of new metrics

Page 16: Cloud Native Applications and  AppDynamics

AppDynamics Configuration

Page 17: Cloud Native Applications and  AppDynamics

Stateless Micro-Service Architecture

Linux Base AMI (CentOS or Ubuntu)

Optional Apache

frontend, memcached, non-java apps

MonitoringLog rotation

to S3 AppDynamics machineagent Epic/Atlas

Java (JDK 6 or 7)AppDynamics

appagentmonitoring

GC and thread dump logging

TomcatApplication war file, base servlet, platform, client interface jars, Astyanax

Healthcheck, status servlets, JMX interface,

Servo autoscale

Page 18: Cloud Native Applications and  AppDynamics

Agent ConfigurationDynamic configuration script based on environment

• Set controller, set node name to tier-instance• Set tier name to Netflix service or cluster– helloworld or helloworld-live, helloworld-canary…

Page 19: Cloud Native Applications and  AppDynamics

Memcached Backend Setup

Page 20: Cloud Native Applications and  AppDynamics

Memcached Custom Exit Point

Page 21: Cloud Native Applications and  AppDynamics

AppDynamics REST API

Page 22: Cloud Native Applications and  AppDynamics
Page 23: Cloud Native Applications and  AppDynamics

Configuration Management

Datacenter CMDB’s woefulCloud is a model driven architecture

Dependably complete

Page 24: Cloud Native Applications and  AppDynamics

Monkeys

Edda – Configuration Historyhttp://techblog.netflix.com/2012/11/edda-learn-stories-of-your-cloud.html

Edda

AWS Instances, ASGs, etc.

Eureka Services

metadataAppDynamic

s Request flow

Page 25: Cloud Native Applications and  AppDynamics

Edda Query ExamplesFind any instances that have ever had a specific public IP address$ curl "http://edda/api/v2/view/instances;publicIpAddress=1.2.3.4;_since=0"["i-0123456789","i-012345678a","i-012345678b”]

Show the most recent change to a security group$ curl "http://edda/api/v2/aws/securityGroups/sg-0123456789;_diff;_all;_limit=2"--- /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351040779810+++ /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351044093504@@ -1,33 +1,33 @@ {… "ipRanges" : [ "10.10.1.1/32", "10.10.1.2/32",+ "10.10.1.3/32",- "10.10.1.4/32"… }

Page 26: Cloud Native Applications and  AppDynamics

Core Alerting Gateway

REST API to get Policy ViolationsRate limit, quench, merge with other alertstreams

Send to pagerduty or email

Page 27: Cloud Native Applications and  AppDynamics

Maintenance Use Case

• Backend Resolver Script– Cleans up dangling HTML references– Non HTML memcached and cassandra backends

• Code Structure– Get from AppDynamics– Get from Edda– Resolve– Put to AppDynamics

Page 28: Cloud Native Applications and  AppDynamics

Error Trending Tool

• Once a week– Get top N errors– Look for trends– Complain to responsible teams

• On Demand User Interface– List top errors– Pick a tier– Drill into error snapshots for more details

Page 29: Cloud Native Applications and  AppDynamics

Dependent Service Time Plot

Page 30: Cloud Native Applications and  AppDynamics

Critical Systems Inspector

• Correlates metrics across the system• Cross Tier Dependency Flow via Edda• Data via Netflix internal metric collector Atlas

Page 31: Cloud Native Applications and  AppDynamics

A Cloud Native Open Source Platform

Page 32: Cloud Native Applications and  AppDynamics

Inspiration

Page 33: Cloud Native Applications and  AppDynamics

Three Questions

Why is Netflix doing this?

How does it all fit together?

What is coming next?

Page 34: Cloud Native Applications and  AppDynamics

Platform Evolution

Bleeding Edge Innovation

Common Pattern

Shared Pattern

2009-2010 2011-2012 2013-2014

Netflix ended up several years ahead of the industry, but it’s not a sustainable position

Page 35: Cloud Native Applications and  AppDynamics

Making it easy to follow

Exploring the wild west each time vs. laying down a shared route

Page 36: Cloud Native Applications and  AppDynamics

Establish our solutions as Best

Practices / Standards

Hire, Retain and Engage Top Engineers

Build up Netflix Technology Brand

Benefit from a shared ecosystem

Goals

Page 37: Cloud Native Applications and  AppDynamics

How does it all fit together?

Page 38: Cloud Native Applications and  AppDynamics

GithubNetflixOSS

Source

AWSBase AMI

MavenCentral

JenkinsAminator

Bakery

DynaslaveAWS Build

Slaves

Asgard(+ Frigga)Console

AWSBaked AMIs

OdinOrchestration

API

AWS Account

NetflixOSS Continuous Build and Deployment

Page 39: Cloud Native Applications and  AppDynamics

AWS AccountAsgard Console

Archaius Config Service

Cross region Priam C*

Explorers Dashboards

AtlasMonitoring

Genie Hadoop Services

Multiple AWS RegionsEureka Registry

Exhibitor ZK

Edda History

Simian Army

3 AWS ZonesApplication

ClustersAutoscale Groups

Instances

PriamCassandra

Persistent Storage

EvcacheMemcached

Ephemeral Storage

NetflixOSS Services Scope

Page 40: Cloud Native Applications and  AppDynamics

• Baked AMI – Tomcat, Apache, your code• Governator – Guice based dependency injection• Archaius – dynamic configuration properties client• Eureka - service registration client

Initialization

• Karyon - Base Server for inbound requests• RxJava – Reactive pattern• Hystrix/Turbine – dependencies and real-time status• Ribbon - REST Client for outbound calls

Service Requests

• Astyanax – Cassandra client and pattern library• Evcache – Zone aware Memcached client• Curator – Zookeeper patterns• Denominator – DNS routing abstraction

Data Access

• Blitz4j – non-blocking logging• Servo – metrics export for autoscaling• Atlas – high volume instrumentation

Logging

NetflixOSS Instance Libraries

Page 41: Cloud Native Applications and  AppDynamics

• CassJmeter – load testing for C*• Circus Monkey – test rebalancingTest Tools

• Janitor Monkey• Efficiency Monkey• Doctor Monkey• Howler Monkey

Maintenance

• Chaos Monkey - Instances• Chaos Gorilla – Availability Zones• Chaos Kong - Regions• Latency Monkey – latency and error injection

Availability

• Security Monkey• Conformity MonkeySecurity

NetflixOSS Testing and Automation

Page 42: Cloud Native Applications and  AppDynamics

Example Application – RSS Reader

Page 43: Cloud Native Applications and  AppDynamics

More Use Cases

More Features

Better portability

Higher availability

Easier to deploy

Contributions from end users

Contributions from vendors

What’s Coming Next?

Page 44: Cloud Native Applications and  AppDynamics

Functionality and scale now, portability coming

Moving from parts to a platform in 2013

Netflix is fostering an ecosystem

Rapid Evolution - Low MTBIAMSH(Mean Time Between Idea And Making Stuff Happen)

Page 45: Cloud Native Applications and  AppDynamics

Takeaway

Netflix is making it easy for everyone to adopt Cloud Native patterns.

AppDynamics provides visibility into Cloud Native applications at scale.

http://netflix.github.comhttp://techblog.netflix.comhttp://slideshare.net/Netflix

http://www.linkedin.com/in/adriancockcroft

@adrianco #netflixcloud @NetflixOSS