Microservices summit talk 1/31

Bringing learnings from Googley microservices with gRPC

Microservices Summit

Varun Talwar

Transcript of Microservices summit talk 1/31

Page 1: Microservices summit talk   1/31

Google confidential │ Do not distribute

Bringing learnings from Googley microservices with gRPC

Microservices Summit

Varun Talwar

Page 2: Microservices summit talk   1/31

Contents

1. Context: Why are we here?

2. Learnings from Stubby experience

a. HTTP/JSON doesn't cut it

b. Establish a Lingua Franca

c. Design for fault tolerance and control: Sync/Async, Deadlines, Cancellations, Flow control

d. Flying blind without stats

e. Diagnosing with tracing

f. Load Balancing is critical

3. gRPC

a. Cross platform matters !

b. Performance and Standards matter: HTTP/2

c. Pluggability matters: Interceptors, Name Resolvers, Auth plugins

d. Usability matters !

Page 3: Microservices summit talk   1/31

CONTEXT: WHY ARE WE HERE?

Page 4: Microservices summit talk   1/31

Agility & Resilience

Page 5: Microservices summit talk   1/31

Developer Productivity

Page 6: Microservices summit talk   1/31

Performance

Page 7: Microservices summit talk   1/31

INTRODUCING STUBBY

Page 8: Microservices summit talk   1/31

Microservices at Google: ~O(10^10) RPCs per second.

Images by Connie Zhou

Page 9: Microservices summit talk   1/31

Stubby Magic @ Google

Page 10: Microservices summit talk   1/31

Making Google magic available to all

Kubernetes

Borg

Stubby

Page 11: Microservices summit talk   1/31

LEARNINGS FROM

STUBBY

Page 12: Microservices summit talk   1/31

Key learnings

1. HTTP/JSON doesn't cut it!

2. Establish a lingua franca

3. Design for fault tolerance and provide control knobs

4. Don't fly blind: Service Analytics

5. Diagnosing problems: Tracing

6. Load Balancing is critical

7. Solve for auth

Page 13: Microservices summit talk   1/31

HTTP1.x/JSON doesn’t cut it !

1. WWW, browser growth - bled into services

2. Stateless

3. Text on the wire

4. Loose contracts

5. TCP connection per request

6. Nouns based

7. Harder API evolution

8. Think compute, network on cloud platforms

1

Page 14: Microservices summit talk   1/31

Establish a lingua franca

1. Protocol Buffers - Since 2003.

2. Start with IDL

3. Have a language agnostic way of agreeing on data semantics

4. Code Gen in various languages

5. Forward and Backward compatibility

6. API Evolution

2

Page 15: Microservices summit talk   1/31

How we roll at Google

Page 16: Microservices summit talk   1/31

Google Cloud Platform

Service Definition (weather.proto)

    syntax = "proto3";

    service Weather {
      rpc GetCurrent(WeatherRequest) returns (WeatherResponse);
    }

    message WeatherRequest {
      Coordinates coordinates = 1;

      message Coordinates {
        fixed64 latitude = 1;
        fixed64 longitude = 2;
      }
    }

    message WeatherResponse {
      Temperature temperature = 1;
      float humidity = 2;
    }

    message Temperature {
      float degrees = 1;
      Units units = 2;

      enum Units {
        FAHRENHEIT = 0;
        CELSIUS = 1;
        KELVIN = 2;
      }
    }

Page 17: Microservices summit talk   1/31

Design for fault tolerance and control

Sync and Async APIs

Need fault tolerance: Deadlines, Cancellations

Control Knobs: Flow control, Service Config, Metadata

3

Page 18: Microservices summit talk   1/31

gRPC Deadlines

First-class feature in gRPC.

Deadline is an absolute point in time.

Deadline indicates to the server how long the client is willing to wait for an answer.

RPC will fail with DEADLINE_EXCEEDED status code when the deadline is reached.

Page 19: Microservices summit talk   1/31

Deadline Propagation

[Diagram: a request enters at the Gateway and travels down a chain of services. The deadline is set once with withDeadlineAfter(200, MILLISECONDS) and carried unchanged to every hop.]

Gateway: Now = 1476600000000, Deadline = 1476600000200

Next hop: Now = 1476600000040, Deadline = 1476600000200

Next hop: Now = 1476600000150, Deadline = 1476600000200

Last hop: Now = 1476600000230, Deadline = 1476600000200

Delay labels in the diagram: 90 ms, 40 ms, 20 ms, 20 ms, 60 ms

Result: DEADLINE_EXCEEDED at each of the four points along the chain
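
The propagation above can be sketched in a few lines. This is an illustrative simulation in plain Python, not the gRPC API: the class and function names are hypothetical, and the per-hop costs (40 ms, 110 ms, 80 ms) are an assumption derived from the differences between the Now values on the slide.

```python
from dataclasses import dataclass

DEADLINE_EXCEEDED = "DEADLINE_EXCEEDED"
OK = "OK"


@dataclass(frozen=True)
class Deadline:
    """An absolute point in time, in epoch milliseconds (hypothetical type)."""
    at_ms: int

    @classmethod
    def after(cls, now_ms: int, timeout_ms: int) -> "Deadline":
        # The relative timeout is converted to an absolute deadline once, at the edge.
        return cls(now_ms + timeout_ms)

    def remaining_ms(self, now_ms: int) -> int:
        return self.at_ms - now_ms


def call_chain(start_ms: int, deadline: Deadline, hop_costs_ms: list[int]) -> tuple[str, int]:
    """Carry the same absolute deadline down a chain of hops.

    Returns the final status and how many hops were reached.
    """
    now_ms = start_ms
    for reached, cost in enumerate(hop_costs_ms, start=1):
        now_ms += cost  # time spent before this hop looks at the request
        if deadline.remaining_ms(now_ms) <= 0:
            # The failure is reported back to every caller upstream.
            return DEADLINE_EXCEEDED, reached
    return OK, len(hop_costs_ms)


start = 1476600000000
deadline = Deadline.after(start, 200)
status, hops = call_chain(start, deadline, [40, 110, 80])
print(status, hops)
```

Because every hop compares its own clock against the same absolute value, no hop needs to know how much time its callers already spent.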

Page 20: Microservices summit talk   1/31

Cancellation?

Deadlines are expected.

What about unpredictable cancellations?

• User cancelled request.

• Caller is not interested in the result any more.

• etc.

Page 21: Microservices summit talk   1/31

Cancellation?

[Diagram: the gateway (GW) fans out to a 3×3 grid of backends. Every backend is marked Busy, and each of the nine has an Active RPC.]

Page 22: Microservices summit talk   1/31

Cancellation Propagation

[Diagram: the same GW and 3×3 grid of backends, now all marked Idle.]

Page 23: Microservices summit talk   1/31

Cancellation

Automatically propagated.

RPC fails with CANCELLED status code.

Cancellation status can be accessed by the receiver.

Server (receiver) always knows if RPC is valid!
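
A minimal sketch of automatic propagation, assuming each RPC owns a context object that knows its child calls. `CallContext` is a hypothetical name used for illustration; it is not a gRPC class.

```python
class CallContext:
    """Hypothetical per-RPC context: cancelling it cancels every child call."""

    def __init__(self, name, parent=None):
        self.name = name
        self.cancelled = False
        self.children = []
        if parent is not None:
            parent.children.append(self)
            # A call started under an already-cancelled parent starts cancelled.
            self.cancelled = parent.cancelled

    def cancel(self):
        if self.cancelled:
            return
        self.cancelled = True
        for child in self.children:
            child.cancel()  # walk down the call tree

    def status(self):
        return "CANCELLED" if self.cancelled else "ACTIVE"


gateway = CallContext("GW")
backends = [CallContext(f"backend-{i}", parent=gateway) for i in range(3)]
leaves = [CallContext(f"{b.name}/leaf", parent=b) for b in backends]

gateway.cancel()
print([c.status() for c in backends + leaves])
```

The receiver can check `cancelled` at any point and stop working, which is what turns the Busy grid into an Idle one.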

Page 24: Microservices summit talk   1/31

BiDi Streaming - Slow Client

[Diagram: a Fast Server receives a Request and sends Responses to a Slow Client. Status codes shown: CANCELLED, UNAVAILABLE, RESOURCE_EXHAUSTED.]

Page 25: Microservices summit talk   1/31

BiDi Streaming - Slow Server

[Diagram: a Fast Client sends Requests to a Slow Server, which returns a Response. Status codes shown: CANCELLED, UNAVAILABLE, RESOURCE_EXHAUSTED.]

Page 26: Microservices summit talk   1/31

Flow-Control

Flow-control helps to balance computing power and network capacity between client and server.

gRPC supports both client- and server-side flow control.

Photo taken by Andrey Borisenko.
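
One way to picture flow control is a credit window: the sender may only have a bounded number of bytes outstanding, and the receiver returns credit as it consumes data. The sketch below is a simplified model under that assumption (similar in spirit to HTTP/2 window updates); `FlowControlWindow` and its methods are hypothetical names.

```python
class FlowControlWindow:
    """Credit-based window: at most `window` bytes may be unconsumed at once."""

    def __init__(self, window: int):
        self.window = window
        self.available = window

    def try_send(self, size: int) -> bool:
        if size > self.available:
            return False  # a fast sender has to wait for the slow receiver
        self.available -= size
        return True

    def on_consumed(self, size: int) -> None:
        # The receiver processed `size` bytes and hands the credit back.
        self.available = min(self.window, self.available + size)


w = FlowControlWindow(100)
sent = [w.try_send(40) for _ in range(3)]
print(sent)            # third send is refused: only 20 bytes of credit left
w.on_consumed(40)
print(w.try_send(40))  # credit returned, so the send goes through
```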

Page 27: Microservices summit talk   1/31

Service Config

Policies where the server tells the client what it should do

Can specify deadlines, LB policy, payload size per method of a service

Loved by SREs, they have more control

Discovery via DNS
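
As an illustration, a per-method policy document might look like the one below. The field names are modeled on gRPC's JSON service config, but treat them as assumptions to verify against the gRPC documentation; the values and the lookup helper `timeout_seconds` are hypothetical.

```python
import json

# Assumed shape, modeled on gRPC's JSON service config; values are made up.
service_config = {
    "loadBalancingPolicy": "round_robin",
    "methodConfig": [
        {
            "name": [{"service": "Weather", "method": "GetCurrent"}],
            "timeout": "0.2s",
            "maxRequestMessageBytes": 1024,
        },
        {
            "name": [{"service": "Weather"}],  # service-wide default
            "timeout": "1s",
        },
    ],
}


def _timeout(entry):
    raw = entry.get("timeout")
    return float(raw.rstrip("s")) if raw is not None else None


def timeout_seconds(config, service, method):
    """Most specific match wins: exact method first, then the service default."""
    service_default = None
    for entry in config.get("methodConfig", []):
        for name in entry.get("name", []):
            if name.get("service") != service:
                continue
            if name.get("method") == method:
                return _timeout(entry)
            if "method" not in name:
                service_default = _timeout(entry)
    return service_default


print(json.dumps(service_config, indent=2))
print(timeout_seconds(service_config, "Weather", "GetCurrent"))
```

Because the document is data rather than code, operators can change a deadline or LB policy without redeploying clients.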

Page 28: Microservices summit talk   1/31

Metadata Exchange - Common cross-cutting concerns like authentication or tracing rely on the exchange of data that is not part of the declared interface of a service. Deployments rely on their ability to evolve these features at a different rate to the individual APIs exposed by services.

Metadata helps in exchange of useful information

Page 29: Microservices summit talk   1/31

Don’t fly blind: Stats

4

What is the mean latency time per RPC?

How many RPCs per hour for a service?

Errors in last minute/hour?

How many bytes sent? How many connections to my server?

Page 30: Microservices summit talk   1/31

Data collection by arbitrary metadata is useful

Any service’s resource usage and performance stats in real time by (almost) any arbitrary metadata

1. Service X can monitor CPU usage in their jobs broken down by the name of the invoked RPC and the mdb user who sent it.

2. Ads can monitor the RPC latency of shared bigtable jobs when responding to their requests, broken down by whether the request originated from a user on web/Android/iOS.

3. Gmail can collect usage on servers, broken down by POP/IMAP/web/Android/iOS. The stats layer propagates Gmail's metadata down to every service, even if the request was made by an intermediary job that Gmail doesn't own.

The stats layer exports data to varz and streamz, and provides stats to many monitoring systems and dashboards
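
The idea of breaking stats down by arbitrary metadata can be sketched as follows. This is a toy in-memory aggregator, not the stats layer described above; `RpcStats` and the tag names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


class RpcStats:
    """Toy collector: every sample carries whatever metadata came with the RPC."""

    def __init__(self):
        self._samples = []  # list of (latency_ms, metadata dict)

    def record(self, latency_ms, **metadata):
        self._samples.append((latency_ms, metadata))

    def mean_latency_by(self, key):
        # Group samples by the value of one metadata key, then average each group.
        groups = defaultdict(list)
        for latency_ms, metadata in self._samples:
            groups[metadata.get(key, "unknown")].append(latency_ms)
        return {value: mean(latencies) for value, latencies in groups.items()}


stats = RpcStats()
stats.record(12, method="GetCurrent", origin="web")
stats.record(30, method="GetCurrent", origin="android")
stats.record(18, method="GetCurrent", origin="web")
print(stats.mean_latency_by("origin"))
print(stats.mean_latency_by("method"))
```

The same samples answer both questions because the breakdown key is chosen at query time, not at collection time.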

Page 31: Microservices summit talk   1/31

Diagnosing problems: Tracing

5

1 in 10K requests takes very long. It's an ad query :-) I need to find out.

Take a sample and store in database; help identify request in sample which took similar amount of time

I didn't get a response from the service. What happened? Which link in the service dependency graph got stuck? Stitch a trace and figure out.

Where is it taking time for a trace? Hotspot analysis

What all are the dependencies for a service?
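
Hotspot analysis on a stitched trace can be sketched like this. The span layout (an id, a parent id, a duration) is an assumption for illustration, the sample numbers are made up, and the calculation assumes child calls run one after another rather than in parallel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root of the trace
    service: str
    duration_ms: int


def self_times(spans):
    """Self time = a span's duration minus the time spent in its direct children."""
    child_total = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0) + s.duration_ms
    return {s.span_id: s.duration_ms - child_total.get(s.span_id, 0) for s in spans}


def hotspot(spans):
    """Return the service whose span has the largest self time."""
    times = self_times(spans)
    by_id = {s.span_id: s for s in spans}
    return by_id[max(times, key=times.get)].service


trace = [
    Span("a", None, "gateway", 230),
    Span("b", "a", "frontend", 190),
    Span("c", "b", "ads", 150),
    Span("d", "b", "storage", 20),
]
print(self_times(trace))
print(hotspot(trace))
```

The parent links also give the dependency graph for free: every (parent service, child service) pair in a trace is an edge.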

Page 32: Microservices summit talk   1/31

Load Balancing is important!

6

Iteration 1: Stubby Balancer

Iteration 2: Client side load balancing

Iteration 3: Hybrid

Iteration 4: gRPC-lb

Page 33: Microservices summit talk   1/31

gRPC LB

Some new ideas!

● Round-robin-over-list - Lists not sets → ability to represent weights

● For anything more advanced, move the burden to an external "LB Controller", a regular gRPC server, and rely on a client-side implementation of the so-called gRPC LB policy.

[Diagram: client, LB Controller, and backends. 1) Control RPC between client and LB Controller, 2) address-list, 3) RR over addresses of address-list.]
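
Round-robin over a list (rather than a set) can express weights simply by repeating an address. A minimal sketch, with made-up addresses and a hypothetical `round_robin` helper:

```python
from collections import Counter
from itertools import cycle


def round_robin(address_list):
    """Yield backends in list order, forever. Duplicates in the list act as weights."""
    return cycle(address_list)


# 10.0.0.1 appears twice, so it receives two of every three picks.
address_list = ["10.0.0.1:50051", "10.0.0.1:50051", "10.0.0.2:50051"]
picker = round_robin(address_list)
picks = [next(picker) for _ in range(6)]
print(Counter(picks))
```

In the design described above, the address-list would come from the LB Controller, so the client stays simple while the weighting logic lives in one place.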

Page 34: Microservices summit talk   1/31

In summary, what did we learn

Contracts should be strict

Common language helps

Common understanding for deadlines, cancellations, flow control

Common stats/tracing framework is essential for monitoring, debugging

A common framework allows uniform policy application for control and LB

Single point of integration for logging, monitoring, tracing, service discovery and load balancing makes lives much easier !

Page 35: Microservices summit talk   1/31

INTRODUCING gRPC

Page 36: Microservices summit talk   1/31

Open source on GitHub for C, C++, Java, Node.js, Python, Ruby, Go, C#, PHP, Objective-C

gRPC core

gRPC Java

gRPC Go

Page 37: Microservices summit talk   1/31

Where is the project today?

1.0 with stable APIs

Well documented with an active community

Reliable with continuously running tests on GCE

Deployable in your environment

Measured with an open performance dashboard

Well adopted inside and outside Google

Page 38: Microservices summit talk   1/31

1. Cross language & Cross platform matters !

2. Performance and Standards matter: HTTP/2

3. Pluggability matters: Interceptors, Name Resolvers, Auth plugins

4. Usability matters !

More lessons

Page 39: Microservices summit talk   1/31

1. Cross language & Cross platform matters !

2. Performance and Standards matter: HTTP/2

3. Pluggability matters: Interceptors, Name Resolvers, Auth plugins

4. Usability matters !

More lessons

Page 40: Microservices summit talk   1/31

gRPC Principles & Requirements

Coverage & Simplicity

The stack should be available on every popular development platform and easy for someone to build for their platform of choice. It should be viable on CPU & memory limited devices.

http://www.grpc.io/blog/principles

Page 41: Microservices summit talk   1/31

gRPC Speaks Your Language

Service definitions and client libraries

● Java
● Go
● C/C++
● C#
● Node.js
● PHP
● Ruby
● Python
● Objective-C

Platforms supported

● MacOS
● Linux
● Windows
● Android
● iOS

Page 42: Microservices summit talk   1/31

Interoperability

[Diagram: a Java Service, a Python Service, a GoLang Service, and a C++ Service. Each exposes a gRPC Service, and gRPC Stubs connect the services to one another.]

Page 43: Microservices summit talk   1/31

1. Cross language & Cross platform matters !

2. Performance and Standards matter: HTTP/2

3. Pluggability matters: Interceptors, Name Resolvers, Auth plugins

4. Usability matters !

More lessons

Page 44: Microservices summit talk   1/31

HTTP/2 in One Slide

• Single TCP connection.

• No Head-of-line blocking.

• Binary framing layer.

• Request –> Stream.

• Header Compression.

[Diagram: protocol stack. Application (HTTP/2) with Binary Framing, over Session (TLS) [optional], Transport (TCP), and Network (IP). The HTTP/1.x request below maps to an HTTP/2 HEADERS frame and a DATA frame.]

HTTP/1.x

    POST /upload HTTP/1.1
    Host: www.javaday.org.ua
    Content-Type: application/json
    Content-Length: 27

    {"msg": "Welcome to 2016!"}

Page 45: Microservices summit talk   1/31

Binary Framing

HTTP/2 breaks down the HTTP protocol communication into an exchange of binary-encoded frames, which are then mapped to messages that belong to a stream, and all of which are multiplexed within a single TCP connection.

[Diagram: one TCP connection carrying Stream 1, Stream 2, ... Stream N.]

Request HEADERS frame: :method: GET, :path: /kyiv, :version: HTTP/2, :scheme: https

Response HEADERS frame: :status: 200, :version: HTTP/2, :server: nginx/1.10.1, ...

Response DATA frame: <payload>
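
To make the framing concrete, the sketch below packs and unpacks the 9-byte HTTP/2 frame header (24-bit length, 8-bit type, 8-bit flags, 31-bit stream id) and interleaves frames from two streams in one byte buffer. The helper names are hypothetical, and the HEADERS payloads are placeholder bytes: real HEADERS frames carry HPACK-compressed header blocks, which this sketch does not implement.

```python
import struct

DATA, HEADERS = 0x0, 0x1              # frame types
END_STREAM, END_HEADERS = 0x1, 0x4    # flags


def encode_frame(frame_type: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    if len(payload) >= 1 << 24:
        raise ValueError("payload does not fit in a 24-bit length field")
    length = struct.pack(">I", len(payload))[1:]  # keep the low 3 bytes
    rest = struct.pack(">BBI", frame_type, flags, stream_id & 0x7FFFFFFF)
    return length + rest + payload


def decode_frames(buf: bytes):
    """Split a byte buffer into (stream_id, type, flags, payload) tuples."""
    frames, offset = [], 0
    while offset < len(buf):
        length = int.from_bytes(buf[offset:offset + 3], "big")
        frame_type, flags, stream_id = struct.unpack(">BBI", buf[offset + 3:offset + 9])
        payload = buf[offset + 9:offset + 9 + length]
        frames.append((stream_id & 0x7FFFFFFF, frame_type, flags, payload))
        offset += 9 + length
    return frames


# Frames from streams 1 and 3 share one connection and may interleave.
wire = (
    encode_frame(HEADERS, END_HEADERS, 1, b"<headers for stream 1>")
    + encode_frame(HEADERS, END_HEADERS, 3, b"<headers for stream 3>")
    + encode_frame(DATA, END_STREAM, 1, b"<payload>")
)
for frame in decode_frames(wire):
    print(frame)
```

The stream id in every header is what lets the receiver reassemble each message independently, so a large response on one stream does not hold up the others at the HTTP layer.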

Page 46: Microservices summit talk   1/31

HTTP/1.x vs HTTP/2

http://http2.golang.org/gophertiles

http://www.http2demo.io/

Page 47: Microservices summit talk   1/31

gRPC Service Definitions

Unary: the client sends a single request to the server and gets a single response back, just like a normal function call.

Server streaming: the client sends a request to the server and gets a stream to read a sequence of messages back. The client reads from the returned stream until there are no more messages.

Client streaming: the client sends a sequence of messages to the server using a provided stream. Once the client has finished writing the messages, it waits for the server to read them and return its response.

BiDi streaming: both sides send a sequence of messages using a read-write stream. The two streams operate independently. The order of messages in each stream is preserved.
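
The four shapes can be pictured with ordinary Python functions and generators. This is only an analogy for the call signatures, not gRPC code: the function names are hypothetical, and the BiDi example answers each message as it arrives, whereas real BiDi streams operate independently.

```python
from typing import Iterable, Iterator


def get_current(request: str) -> str:
    """Unary: one request in, one response out."""
    return f"weather for {request}"


def watch(request: str) -> Iterator[str]:
    """Server streaming: one request in, a sequence of responses out."""
    for i in range(3):
        yield f"{request} update {i}"


def upload(requests: Iterable[str]) -> str:
    """Client streaming: a sequence of requests in, one response out."""
    count = sum(1 for _ in requests)
    return f"received {count} messages"


def chat(requests: Iterable[str]) -> Iterator[str]:
    """BiDi streaming: sequences in both directions, order preserved."""
    for message in requests:
        yield message.upper()


print(get_current("kyiv"))
print(list(watch("kyiv")))
print(upload(iter(["a", "b", "c"])))
print(list(chat(iter(["hi", "bye"]))))
```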

Page 48: Microservices summit talk   1/31

BiDi Streaming Use-Cases

Messaging applications.

Games / multiplayer tournaments.

Moving objects.

Sport results.

Stock market quotes.

Smart home devices.

You name it!

Page 49: Microservices summit talk   1/31

Open Performance Benchmark and Dashboard

Benchmarks run in GCE VMs per Pull Request for regression testing.

gRPC Users can run these in their environments.

Good Performance across languages:

Java Throughput: 500 K RPCs/Sec and 1.3 M Streaming messages/Sec on 32 core VMs

Java Latency: ~320 µs for unary ping-pong (netperf: 120 µs)

C++ Throughput: ~1.3 M RPCs/Sec and 3 M Streaming Messages/Sec on 32 core VMs.

Performance

Page 50: Microservices summit talk   1/31
Page 51: Microservices summit talk   1/31
Page 52: Microservices summit talk   1/31

1. Cross language & Cross platform matters !

2. Performance and Standards matter: HTTP/2

3. Pluggability matters: Interceptors, Auth

4. Usability matters !

More lessons

Page 53: Microservices summit talk   1/31

gRPC Principles & Requirements

Pluggable

Large distributed systems need security, health-checking, load-balancing and failover, monitoring, tracing, logging, and so on. Implementations should provide extension points to allow for plugging in these features and, where useful, default implementations.

http://www.grpc.io/blog/principles

Page 54: Microservices summit talk   1/31

Interceptors

[Diagram: Client → Client interceptors → Request / Response → Server interceptors → Server]
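
An interceptor chain can be sketched as function wrapping: each interceptor receives the next handler and returns a new one. The names and the metadata keys below are hypothetical and do not mirror any particular gRPC language API.

```python
def with_interceptors(handler, interceptors):
    """Wrap `handler` so that interceptors[0] runs first (outermost)."""
    for interceptor in reversed(interceptors):
        handler = interceptor(handler)
    return handler


def add_trace_id(next_handler):
    def intercepted(request, metadata):
        # Attach a trace id if the caller did not send one.
        metadata = {**metadata, "trace-id": metadata.get("trace-id", "t-1")}
        return next_handler(request, metadata)
    return intercepted


def require_auth(next_handler):
    def intercepted(request, metadata):
        if "authorization" not in metadata:
            return "UNAUTHENTICATED"  # short-circuit: the handler never runs
        return next_handler(request, metadata)
    return intercepted


def get_current(request, metadata):
    return f"OK {request} trace={metadata['trace-id']}"


server = with_interceptors(get_current, [add_trace_id, require_auth])
print(server("kyiv", {"authorization": "Bearer x"}))
print(server("kyiv", {}))
```

Cross-cutting concerns such as auth, tracing, and logging stay out of the handler, which is what makes them pluggable.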

Page 55: Microservices summit talk   1/31

Pluggability

Auth & Security: TLS [Mutual], plugin auth mechanism (e.g. OAuth)

Proxies

Basic: nghttp2, haproxy, traefik

Advanced: Envoy, linkerd, Google LB, Nginx (in progress)

Service Discovery: etcd, Zookeeper, Eureka, …

Monitor & Trace: Zipkin, Prometheus, Statsd, Google, DIY

Page 56: Microservices summit talk   1/31

1. Cross language & Cross platform matters !

2. Performance and Standards matter: HTTP/2

3. Pluggability matters: Interceptors, Auth

4. Usability matters !

More lessons

Page 57: Microservices summit talk   1/31

Get Started

Page 58: Microservices summit talk   1/31

Coming soon!

1. Server reflection
2. Health Checking
3. Automatic retries
4. Streaming compression
5. Mechanism to do caching
6. Binary Logging
   a. Debugging, auditing though costly
7. Unit Testing support
   a. Automated mock testing
   b. Don't need to bring up all dependent services just to test
8. Web support

Page 59: Microservices summit talk   1/31

Some early adopters

Microservices: in data centres

Streaming telemetry from network devices

Client Server communication/Internal APIs

Mobile Apps

Page 60: Microservices summit talk   1/31

Thank you!

Twitter: @grpcio

Site: grpc.io

Group: [email protected]

github.com/grpc

github.com/grpc/grpc-java github.com/grpc/grpc-go

Page 61: Microservices summit talk   1/31

Q & A