Page 1: Orleans: Distributed Virtual Actors for Programmability and Scalability

Philip A. Bernstein, Sergey Bykov, Alan Geller, Gabriel Kliot, Jorgen Thelin

Written and Presented by: Roni Licher

236803 - Advanced Topics in Concurrent Programming – 31/12/2014

Page 2: Motivation

• You have an amazing idea!
• You have to build an interactive internet service.
• It is a fantastic idea, so everyone will want to use it.
• It should handle millions of requests per hour.
• You should probably design a distributed system.

Page 3: Problem: concurrency is hard

• Unpredictability
• Load Balancing
• High Availability
• Data Consistency
• Fault Tolerance
• Performance
• Scalable
• Reliability
• Latency

Page 4: Orleans to the rescue

• A programming model (.NET and C#).
• Solves complex distributed-systems problems (liberating developers from dealing with those concerns).
• Enables applications to attain high performance, reliability and scalability.
• A runtime environment (targeting Microsoft Azure).

Page 5: Traditional 3-tier architecture

• Stateless front tier (user interface)
• Stateless logic tier
• Database tier
• Cache tier (can cause consistency issues)

Page 6: Orleans 3-tier architecture

• Stateless front tier (user interface)
• Stateful and stateless logic tier
• Database tier

Page 7: What is the Actor* model?

An Actor is a fundamental unit of computation:
• Processing
• Storage
• Communication

*The actor model was introduced in 1973

Page 8: What is the Actor model?

An actor can:
• Create other actors
• Process one message at a time
• Send messages to other actors
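
The "one message at a time" rule can be illustrated with a minimal sketch in plain C#, independent of Orleans. The `CounterActor` class, its `Post` method and the mailbox layout are hypothetical names chosen for illustration, and the sketch is single-threaded; it only shows that a message posted while another is being handled waits its turn.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy actor (not the Orleans API): private state plus a mailbox
// whose messages are handled strictly one at a time.
public sealed class CounterActor
{
    private readonly Queue<int> mailbox = new Queue<int>();
    private bool draining;                 // true while a message is being handled
    public int Total { get; private set; } // state touched only inside Handle

    // "Send a message": enqueue it, then drain the mailbox unless a drain
    // is already in progress (re-entrant posts are simply queued).
    public void Post(int amount)
    {
        mailbox.Enqueue(amount);
        if (draining) return;
        draining = true;
        while (mailbox.Count > 0) Handle(mailbox.Dequeue());
        draining = false;
    }

    // "Process one message at a time": a handler may post further messages,
    // but they only run after this handler returns.
    private void Handle(int amount)
    {
        Total += amount;
        if (amount > 1) Post(amount - 1);  // follow-up message to self
    }
}

public static class ActorDemo
{
    public static void Main()
    {
        var actor = new CounterActor();
        actor.Post(3);                     // handles 3, then 2, then 1
        Console.WriteLine(actor.Total);    // prints 6
    }
}
```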

Page 9: Orleans: Virtual actors

1. Actor instances always exist, virtually
• Needn’t be created, looked up or deleted
• Code can always call methods on the actor
• Actors never fail

2. Activations are created on demand
• If there is no existing activation, a message sent to the actor triggers instantiation
• Lifecycle is managed by the runtime
• Transparent recovery from server failures

3. Location transparency
• Each actor instance has a GUID and can be referenced.
• Actors can pass references to one another and to non-actor code.
• References can be persisted to cold storage.
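
On-demand activation can be sketched as follows. This is a toy model in plain C#, not the Orleans API: `ToyRuntime`, `GameActivation`, `Send` and `Deactivate` are hypothetical names, and a single in-process dictionary stands in for the whole cluster.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch (not the Orleans API): callers address an actor by
// identity only; the runtime creates an activation on the first message.
public sealed class GameActivation
{
    public Guid Id { get; }
    public int Messages { get; private set; }
    public GameActivation(Guid id) { Id = id; }
    public void Receive(string message) { Messages++; }
}

public sealed class ToyRuntime
{
    private readonly Dictionary<Guid, GameActivation> active =
        new Dictionary<Guid, GameActivation>();

    public int ActivationCount => active.Count;

    // Deliver a message; instantiate the activation if none exists.
    public GameActivation Send(Guid actorId, string message)
    {
        if (!active.TryGetValue(actorId, out GameActivation activation))
        {
            activation = new GameActivation(actorId);   // on-demand activation
            active[actorId] = activation;
        }
        activation.Receive(message);
        return activation;
    }

    // Simulate idle collection or a server failure: the activation goes
    // away, but the actor identity stays valid for future messages.
    public void Deactivate(Guid actorId) => active.Remove(actorId);
}

public static class ActivationDemo
{
    public static void Main()
    {
        var runtime = new ToyRuntime();
        Guid game = Guid.NewGuid();
        runtime.Send(game, "start");        // first message creates the activation
        runtime.Send(game, "move");         // reuses it
        runtime.Deactivate(game);
        runtime.Send(game, "resume");       // transparently re-activated
        Console.WriteLine(runtime.ActivationCount);   // prints 1
    }
}
```

The caller never creates, looks up or deletes anything; it only holds the actor's identity, which is the property the slide calls "always exist, virtually".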

Page 10: Orleans: Virtual actors

4. Automatic scale-out

Two modes of activation:
• Single activation (default): at any time, only one activation of an actor is allowed.
• Stateless worker: many independent activations of an actor are created automatically by Orleans on demand (up to a limit) to increase throughput.

Page 11: Actors in “Orleans”

Actor Type → Actor (Instance) → Actor Activation

Game Actor Type
• Game Actor (Instance) #2,548,308: Activation #1 @ 192.168.1.1
• Game Actor (Instance) #2,031,769: Activation #1 @ 192.168.1.5

Page 12: Project “Orleans” Silo: runtime execution container

• Implicit activation of actors and actor lifecycle management
• Coordinated actor placement
• Multiplexed communication
• Failure recovery

Diagram labels: Actor, Silo

Page 13: A cloud runtime

Diagram labels: New Silo, Unavailable Silo, New Activation, New Activations

Page 14: Actors execution model

Built for .NET
• Actors are .NET objects
• Messaging through .NET interfaces
• Asynchronous through promises (C#)
• Automatic error propagation

Activations are single-threaded
• One activation executes only one method invocation (turn) at a time.
• Can register a callback on a blocking action.

No shared state
• Avoids races
• No need for locks
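
As a rough illustration of promises and automatic error propagation, the sketch below uses only standard .NET `Task` and `async`/`await`, with no Orleans types. `GetScoreAsync` and `DescribeAsync` are hypothetical stand-ins for actor methods; the only assumption carried over from the deck is that actor methods return a `Task`, as the Heartbeat example does.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch in plain .NET (no Orleans types): a Task plays the
// role of a promise, and a failure in the callee surfaces at the caller's await.
public static class PromiseDemo
{
    // Stand-in for an asynchronous actor method.
    public static async Task<int> GetScoreAsync(int playerId)
    {
        await Task.Yield();   // give up the thread instead of blocking it
        if (playerId < 0)
            throw new ArgumentException("unknown player");
        return playerId * 10;
    }

    // The caller awaits the promise; an exception thrown by the callee
    // propagates here without any explicit error plumbing.
    public static async Task<string> DescribeAsync(int playerId)
    {
        try
        {
            int score = await GetScoreAsync(playerId);
            return "score " + score;
        }
        catch (ArgumentException e)
        {
            return "error: " + e.Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine(DescribeAsync(7).GetAwaiter().GetResult());   // prints "score 70"
        Console.WriteLine(DescribeAsync(-1).GetAwaiter().GetResult());  // prints "error: unknown player"
    }
}
```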

Page 15: Actors execution model

Persistence
• Execution of actor requests may modify the actor state.
• Built-in persistence facility: checkpoints.
• It is up to the programmer to schedule checkpoints (after each request completes, on a timer, or after a number of requests).
• The underlying storage is implemented via persistence providers such as SQL or a blob store (the choice is up to the programmer).
• When an actor is reactivated, its state is cached by the server.
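
A programmer-scheduled checkpoint policy might look like the sketch below. It is not the Orleans persistence API: `IStateStore`, `MemoryStore`, `ScoreActor` and the `checkpointEvery` parameter are hypothetical, and the in-memory store merely stands in for a SQL or blob-store provider. The policy shown is "checkpoint after every N requests".

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch (not the Orleans persistence API): the actor decides
// when to checkpoint, and a pluggable provider decides where the state goes.
public interface IStateStore
{
    void Write(string actorId, int state);
    bool TryRead(string actorId, out int state);
}

// In-memory stand-in for a SQL or blob-store provider.
public sealed class MemoryStore : IStateStore
{
    private readonly Dictionary<string, int> saved = new Dictionary<string, int>();
    public int Writes { get; private set; }
    public void Write(string actorId, int state) { saved[actorId] = state; Writes++; }
    public bool TryRead(string actorId, out int state) => saved.TryGetValue(actorId, out state);
}

public sealed class ScoreActor
{
    private readonly string id;
    private readonly IStateStore store;
    private readonly int checkpointEvery;   // policy chosen by the programmer
    private int sinceCheckpoint;
    public int Score { get; private set; }

    public ScoreActor(string id, IStateStore store, int checkpointEvery)
    {
        this.id = id;
        this.store = store;
        this.checkpointEvery = checkpointEvery;
        // On (re)activation, start from the last checkpoint if one exists.
        if (store.TryRead(id, out int saved)) Score = saved;
    }

    public void AddPoints(int points)
    {
        Score += points;
        if (++sinceCheckpoint >= checkpointEvery)
        {
            store.Write(id, Score);          // checkpoint every N requests
            sinceCheckpoint = 0;
        }
    }
}

public static class CheckpointDemo
{
    public static void Main()
    {
        var store = new MemoryStore();
        var actor = new ScoreActor("player-1", store, checkpointEvery: 2);
        actor.AddPoints(5);
        actor.AddPoints(5);                  // second request triggers a checkpoint
        actor.AddPoints(1);                  // not yet checkpointed
        var revived = new ScoreActor("player-1", store, checkpointEvery: 2);
        Console.WriteLine(revived.Score);    // prints 10
    }
}
```

The last update is lost on reactivation, which is the trade-off of checkpointing less often than once per request.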

Page 16: Runtime design choices

Guiding principle: enable a simple programming model without sacrificing performance.

Overview
• Orleans runs on a cluster of servers in a datacenter.
• Each server runs a container process that creates and hosts actor activations.

Three key subsystems:
• Messaging
• Hosting
• Execution

Page 17: Runtime design choices

Messaging subsystem
• Each pair of servers is connected by a TCP connection.
• Messages between actors are sent on this connection via special threads.

Programmer: invoke method m on actor a.

Runtime:
• Find the host of a.
• Send this host a message on the open channel.
• The message is: invoke method m on a with data d.
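
The invocation flow above can be sketched as a message envelope plus a per-host channel. Everything here is hypothetical and simplified: `Invocation`, `ToyMessaging`, `Place` and `Invoke` are illustrative names, an in-memory queue stands in for the TCP connection, and this is not the Orleans wire format.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch (not the Orleans wire format): a method call on an
// actor reference becomes a message naming the target, the method and the data.
public sealed class Invocation
{
    public string ActorId { get; }
    public string Method { get; }
    public string Data { get; }
    public Invocation(string actorId, string method, string data)
    {
        ActorId = actorId; Method = method; Data = data;
    }
}

public sealed class ToyMessaging
{
    // actor id -> host, standing in for the runtime's "find the host of a"
    private readonly Dictionary<string, string> hostOf = new Dictionary<string, string>();
    // host -> outgoing queue, standing in for the open connection to that host
    private readonly Dictionary<string, Queue<Invocation>> channels =
        new Dictionary<string, Queue<Invocation>>();

    public void Place(string actorId, string host)
    {
        hostOf[actorId] = host;
        if (!channels.ContainsKey(host)) channels[host] = new Queue<Invocation>();
    }

    // "Invoke method m on actor a with data d"; returns the host it was sent to.
    public string Invoke(string actorId, string method, string data)
    {
        string host = hostOf[actorId];                                    // 1. find the host
        channels[host].Enqueue(new Invocation(actorId, method, data));    // 2. send the message
        return host;
    }

    public int Pending(string host) => channels[host].Count;
}

public static class MessagingDemo
{
    public static void Main()
    {
        var messaging = new ToyMessaging();
        messaging.Place("game-42", "192.168.1.5");
        string host = messaging.Invoke("game-42", "UpdateGameStatus", "round=3");
        Console.WriteLine(host + " has " + messaging.Pending(host) + " pending message(s)");
    }
}
```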

Page 18: Runtime design choices

Hosting subsystem
• Decides where to place activations.
• Manages where to find activations (more on this later).
• Manages the activation lifecycle.
• Idle activations can be deactivated.

Page 19: Runtime design choices

Execution subsystem
• Runs the actors’ application code.
• Cooperative multitasking: uses a set of compute threads, usually equal to the number of CPU cores.
• More threads = more context switching, extra memory, longer OS scheduling queues and reduced cache locality.

What could be a problem? Starvation.
• Orleans: “not a major concern since all of the actors are owned by the same developers”
• Orleans does provide monitoring and notification for long-running turns to help troubleshooting.

Page 20: Runtime design choices

Distributed Directory
How can the runtime find each actor’s location?
• Implemented as a one-hop DHT.
• Each record maps an actor ID to the location(s) of its activations.
• Each server holds a partition of the directory.
• On activation/deactivation, a request is sent to the appropriate directory partition.
• Could be replicated on several servers.
• Every server maintains a large local cache of recently resolved actor-to-activation mappings.
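
A partitioned directory with a local cache can be sketched as below. This is an illustrative model, not the Orleans implementation: `ToyDirectory` and its members are hypothetical, all partitions live in one process, and the FNV-1a-style hash over characters is an assumption, since the slides do not say how actor IDs are mapped to partitions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch (not the Orleans implementation): the directory is
// split across servers by hashing the actor id, so the owning partition is
// known in one hop, and recently resolved locations are cached locally.
public sealed class ToyDirectory
{
    private readonly Dictionary<string, string>[] partitions;   // one per server
    private readonly Dictionary<string, string> localCache = new Dictionary<string, string>();
    public int RemoteLookups { get; private set; }

    public ToyDirectory(int serverCount)
    {
        partitions = new Dictionary<string, string>[serverCount];
        for (int i = 0; i < serverCount; i++)
            partitions[i] = new Dictionary<string, string>();
    }

    // Deterministic hash (assumed), so every server computes the same owner.
    public int OwnerOf(string actorId)
    {
        uint hash = 2166136261u;
        foreach (char c in actorId) hash = unchecked((hash ^ (uint)c) * 16777619u);
        return (int)(hash % (uint)partitions.Length);
    }

    // On activation: register the location with the owning partition.
    public void Register(string actorId, string location) =>
        partitions[OwnerOf(actorId)][actorId] = location;

    // On deactivation: remove the record and any cached copy.
    public void Unregister(string actorId)
    {
        partitions[OwnerOf(actorId)].Remove(actorId);
        localCache.Remove(actorId);
    }

    // Resolve a location: local cache first, then one hop to the owner.
    public string Lookup(string actorId)
    {
        if (localCache.TryGetValue(actorId, out string cached)) return cached;
        RemoteLookups++;
        if (!partitions[OwnerOf(actorId)].TryGetValue(actorId, out string location))
            return null;                    // no activation registered
        localCache[actorId] = location;
        return location;
    }
}

public static class DirectoryDemo
{
    public static void Main()
    {
        var directory = new ToyDirectory(serverCount: 4);
        directory.Register("game-42", "192.168.1.5");
        directory.Lookup("game-42");        // goes to the owning partition
        directory.Lookup("game-42");        // served from the local cache
        Console.WriteLine(directory.RemoteLookups);   // prints 1
    }
}
```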

Page 21: Runtime design choices

Reliability
• An actor’s persistent state is the programmer’s responsibility (the state should be saved to some network storage).
• Each server holds a membership view of all other servers.
• Servers automatically detect failures via periodic heartbeats (in practice, 30-60 sec).
• If a server fails, its actors will be instantiated on other servers on the next message.
• If a directory partition is lost, surviving servers purge their directory based on cache and state.
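
Heartbeat-based failure detection can be sketched with a simple timeout rule. This is not the Orleans membership protocol: `MembershipView` and its methods are hypothetical, and the 30-second timeout is an assumption taken from the lower end of the range quoted on the slide.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch (not the Orleans membership protocol): each server
// records when it last heard from every peer and suspects peers that have
// been silent for longer than a timeout.
public sealed class MembershipView
{
    private readonly Dictionary<string, DateTime> lastHeartbeat =
        new Dictionary<string, DateTime>();
    private readonly TimeSpan timeout;

    public MembershipView(TimeSpan timeout) { this.timeout = timeout; }

    // Called whenever a heartbeat from a peer arrives.
    public void RecordHeartbeat(string server, DateTime now) => lastHeartbeat[server] = now;

    // Peers whose last heartbeat is older than the timeout.
    public List<string> FailedServers(DateTime now)
    {
        var failed = new List<string>();
        foreach (var entry in lastHeartbeat)
            if (now - entry.Value > timeout) failed.Add(entry.Key);
        return failed;
    }
}

public static class MembershipDemo
{
    public static void Main()
    {
        var view = new MembershipView(TimeSpan.FromSeconds(30));   // assumed timeout
        var start = new DateTime(2014, 12, 31, 12, 0, 0);
        view.RecordHeartbeat("silo-A", start);
        view.RecordHeartbeat("silo-B", start);
        view.RecordHeartbeat("silo-A", start.AddSeconds(40));      // silo-B stays silent
        Console.WriteLine(string.Join(",", view.FailedServers(start.AddSeconds(45))));  // prints silo-B
    }
}
```

Actors that lived on a suspected server are not moved eagerly; in the model described on the slide they are simply instantiated elsewhere when the next message for them arrives.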

Page 22: Using Project “Orleans” to build Halo Cloud services in Windows Azure

Hoop Somuah & Sergey Bykov

3-641

Page 23: <Video>

Page 26: Waking up the Chief

• Presence & Matchmaking
• Challenges & Unlockables
• Skill & Ranking
• Cheat Detection
• Game History
• Profile
• User Generated Content
• Content Management Service

Page 28: Patterns of a big game launch

• Huge traffic spike on launch
• Downtime at launch is really bad
• Also spikes on weekends and holidays
• Load steadies out over time

Page 29: The Halo Cloud stack (2014)

Diagram labels: Azure Load-Balancer, Front Door Compute Worker Roles, Azure Storage, Search, HDInsight, Hadoop, Visualization

Page 30: Warehouse computer

Diagram labels: Azure Load-Balancer, Front Door Compute Worker Roles, Azure Storage, Search, HDInsight, Hadoop, Visualization

Page 31: Presence service

Diagram labels: Game Client, Mobile Client, Presence (Router) actors, Game Session actors A, B and C, Player actors X, Y and Z

Page 32: Heartbeat

Diagram labels: Game Client, Mobile Client, Presence (Router) actors, Game Session actors A, B and C, Player actors X, Y and Z

Page 33: Heartbeat

public class Presenceactor : actorBase, IPresenceactor
{
    public Task Heartbeat(byte[] data)
    {
        // Decrypt the heartbeat payload sent by the client.
        HeartbeatData heartbeatData = Heartbeat.Decrypt(data);

        // Get a reference to the game actor; no explicit creation or lookup is needed.
        IGameactor game = GameactorFactory.Getactor(heartbeatData.Game);

        // Forward the status; the returned Task completes when the game actor finishes the update.
        return game.UpdateGameStatus(heartbeatData.Status);
    }
}

Page 34: Test Lab Numbers

Scalability by default
• Near-linear scaling to hundreds of thousands of requests per second
• Efficient resource usage
• Location transparency simplifies scaling up or down
• Complements Azure PaaS
• Easily adjust scale over time

Page 35: Test Lab Numbers

Cooperative multitasking and threads (1000 actors)
• The Orleans scheduler uses a small number of compute threads, usually equal to the number of CPUs, for efficient resource usage.
• Throughput degrades as the number of threads increases, due to the growing overhead of thread context switching, extra memory, longer OS scheduling queues and reduced cache locality.

Page 36: Test Lab Numbers

Scalability in the number of actors
• The number of servers was fixed at 25.
• Throughput remains almost the same as the number of actors increases from 2 thousand to 2 million.
• The small degradation at large numbers is due to the increased size of internal data structures.

Page 37: Developer productivity

• Familiar programming paradigm
• Fewer concurrency hazards and headaches
• Smooth learning curve for engineers new to asynchronous programming
• Faster development through separation of concerns

Diagram labels: What’s a Server? Physical Machines, Virtual Machines, “Orleans” Runtime

Page 38: Ongoing project…

• Currently in Public Preview.
• Download the SDK: http://orleans.codeplex.com/
• Follow the development process: http://research.microsoft.com/en-us/projects/orleans/
• Will soon also be released as open source (early 2015).

Page 39: Questions?