Philip A. Bernstein, Sergey Bykov, Alan Geller, Gabriel Kliot, Jorgen Thelin.
Written and Presented by: Roni Licher
Orleans: Distributed Virtual Actors for Programmability and Scalability
236803 - Advanced Topics in Concurrent Programming – 31/12/2014
Philip A. Bernstein, Sergey Bykov, Alan Geller, Gabriel Kliot, Jorgen Thelin
Motivation:
• You have an amazing idea!
• You have to build an interactive internet service.
• It is a fantastic idea so everyone would like to use it.
• It should handle millions of requests per hour.
• You probably should design a Distributed System.
Problem: concurrency is hard
• Unpredictability
• Load Balancing
• High Availability
• Data Consistency
• Fault Tolerance
• Performance
• Scalable
• Reliability
• Latency
Orleans to the rescue
• A programming model (.NET and C#)
• Solves complex Distributed Systems problems (liberating developers from dealing with those concerns)
• Enables applications to attain high performance, reliability and scalability
• A runtime environment (targeting Microsoft Azure)
Traditional 3-tier architecture
Stateless front tier (user interface)
Stateless logic tier
Database tier
Cache tier (can introduce consistency issues)
Orleans 3-tier architecture
Stateless front tier (user interface)
Stateful and Stateless logic tier
Database tier
What is the Actor* model?
An Actor is a fundamental unit of computation:
• Processing
• Storage
• Communication
*The actor model was introduced in 1973
What is the Actor model?
• Create other actors
• Process one message at a time
• Send messages to other actors
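The three capabilities above can be sketched in a few lines of plain C#. This is only a toy illustration, not Orleans code: `CounterActor`, its string messages and the `Children` list are hypothetical names, and the sketch is single-threaded on purpose.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy actor, not Orleans code and not thread-safe.
public class CounterActor
{
    private readonly Queue<string> mailbox = new Queue<string>();
    private bool processing;   // true while the drain loop below is running

    // Storage: state that only this actor ever touches.
    public int Count { get; private set; }
    public List<CounterActor> Children { get; } = new List<CounterActor>();

    // Communication: a sender only enqueues a message.
    public void Send(string message)
    {
        mailbox.Enqueue(message);
        if (processing) return;            // the running loop will pick it up
        processing = true;
        while (mailbox.Count > 0) Handle(mailbox.Dequeue());
        processing = false;
    }

    // Processing: exactly one message is handled at a time.
    private void Handle(string message)
    {
        switch (message)
        {
            case "increment":
                Count++;
                break;
            case "spawn":                  // create another actor
                Children.Add(new CounterActor());
                break;
            case "notify-children":        // send messages to other actors
                foreach (CounterActor child in Children) child.Send("increment");
                break;
        }
    }
}
```

Senders never read or write `Count` directly; they can only put a message in the mailbox, which is what makes the one-message-at-a-time rule enforceable.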
1. Actor instances always exist, virtually
   • Needn’t be created, looked up or deleted
   • Code can always call methods on the actor
   • Actors never fail
2. Activations are created on demand
   • If there is no existing activation, a message sent to the actor triggers instantiation
   • Lifecycle is managed by the runtime
   • Transparent recovery from server failures
3. Location transparency
   • Each actor instance has a GUID and can be referenced
   • Actors can pass references to one another and to non-actor code
   • References can be persisted to cold storage
Orleans: Virtual actors
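A minimal sketch of points 1 and 2, assuming a single in-process table instead of a real cluster. `VirtualActorRuntime`, `GameActivation` and `SimulateServerFailure` are hypothetical names for illustration and are not part of the Orleans API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an actor's in-memory activation.
public class GameActivation
{
    public Guid ActorId { get; }
    public int Updates { get; private set; }
    public GameActivation(Guid actorId) { ActorId = actorId; }
    public void UpdateStatus() { Updates++; }
}

// Hypothetical runtime: callers hold only an ID; activations appear on first use.
public class VirtualActorRuntime
{
    private readonly Dictionary<Guid, GameActivation> activations =
        new Dictionary<Guid, GameActivation>();

    public int ActivationsCreated { get; private set; }

    // No create/lookup/delete API is exposed: sending a message is enough.
    public void Send(Guid actorId)
    {
        if (!activations.TryGetValue(actorId, out var activation))
        {
            activation = new GameActivation(actorId);   // instantiate on demand
            activations[actorId] = activation;
            ActivationsCreated++;
        }
        activation.UpdateStatus();
    }

    // Simulates losing a server: activations vanish, but actor IDs stay valid.
    public void SimulateServerFailure() { activations.Clear(); }
}
```

After `SimulateServerFailure()`, the next `Send` with the same GUID simply creates a fresh activation, which is the sense in which the actor itself "never fails" from the caller's point of view. Any state that must survive has to come from persistent storage.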
4. Automatic scale-out
Two modes of activation:
• Single activation (default): at most one activation of an actor exists at any time.
• Stateless worker: Orleans automatically creates many independent activations of an actor on demand (up to a limit) to increase throughput.
Orleans: Virtual actors
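The difference between the two modes can be illustrated with a small pool that grows only when every existing activation is busy. This is a sketch under assumptions: `StatelessWorkerPool`, `Acquire` and `Release` are hypothetical names, and the real Orleans placement logic is not shown in the slides.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pool of activations for one actor.
// A limit of 1 behaves like the default single-activation mode.
public class StatelessWorkerPool
{
    private readonly int limit;
    private readonly List<bool> busy = new List<bool>();   // one flag per activation

    public StatelessWorkerPool(int limit) { this.limit = limit; }

    public int ActivationCount { get { return busy.Count; } }

    // Returns the index of the activation that takes the request,
    // or -1 when all activations are busy and the limit is reached.
    public int Acquire()
    {
        int free = busy.IndexOf(false);
        if (free >= 0) { busy[free] = true; return free; }   // reuse an idle activation
        if (busy.Count < limit)
        {
            busy.Add(true);                                   // create one on demand
            return busy.Count - 1;
        }
        return -1;                                            // request has to wait
    }

    public void Release(int index) { busy[index] = false; }
}
```

With `new StatelessWorkerPool(1)` a second concurrent request has to wait, while a larger limit lets independent activations absorb the load in parallel.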
Actors in “Orleans”
Actor Type         Actor (Instance)          Actor Activation
Game Actor Type    Game Actor #2,548,308     Activation #1 @ 192.168.1.1
Game Actor Type    Game Actor #2,031,769     Activation #1 @ 192.168.1.5
Project “Orleans”
Silo: runtime execution container
• Implicit activation of actors and actor lifecycle management
• Coordinated actor placement
• Multiplexed communication
• Failure recovery
[Figure labels: Actor, Silo]
A cloud runtime
[Figure labels: New Silo, Unavailable Silo, New Activation, New Activations]
Built for .NET
• Actors are .NET objects
• Messaging through .NET interfaces
• Asynchronous through promises (C#)
• Automatic error propagation
Activations are single-threaded
• One activation executes only one method invocation (turn) at a time
• Can register a callback on an operation that would otherwise block
No shared state
• Avoids races
• No need for locks
Actors execution model
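One way to picture "one method invocation at a time" is a per-activation queue that chains each request behind the previous one. The sketch below uses only standard .NET tasks; `TurnQueue` and `BankActor` are hypothetical names, and this is not how the Orleans scheduler is actually implemented.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical per-activation queue: each request starts only after the previous one finished.
public class TurnQueue
{
    private readonly object gate = new object();   // runtime-side lock, invisible to actor code
    private Task tail = Task.CompletedTask;

    public Task<T> Enqueue<T>(Func<Task<T>> request)
    {
        lock (gate)
        {
            // Chain behind the current tail; Unwrap flattens Task<Task<T>> into Task<T>.
            Task<T> next = tail.ContinueWith(_ => request(), TaskScheduler.Default).Unwrap();
            tail = next;
            return next;
        }
    }
}

public class BankActor
{
    private readonly TurnQueue turns = new TurnQueue();
    private int balance;   // no lock in actor code: only touched inside queued requests

    public Task<int> Deposit(int amount)
    {
        return turns.Enqueue(async () =>
        {
            int read = balance;
            await Task.Delay(1);          // without the queue, deposits could interleave here
            balance = read + amount;
            return balance;
        });
    }
}
```

Twenty concurrent `Deposit(1)` calls end with a balance of 20, because the read-modify-write inside each request never overlaps with another request on the same activation.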
Persistence
• Execution of actor requests may modify the actor state.
• Built-in persistence facility: checkpoints.
• It is up to the programmer to schedule checkpoints (after each request completes, on a timer, or after a number of requests).
• The underlying storage is implemented via persistence providers such as SQL or a blob store (the programmer’s choice).
• When an actor is reactivated, its state is loaded and cached by the server.
Actors execution model
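A sketch of the "checkpoint every N requests" policy, assuming a trivial in-memory provider. `IStateStore`, `InMemoryStore` and `ScoreActor` are hypothetical names; the real Orleans persistence provider interfaces are different and are not reproduced here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical persistence provider; a real one would talk to SQL or a blob store.
public interface IStateStore
{
    void Write(string actorId, int state);
    bool TryRead(string actorId, out int state);
}

public class InMemoryStore : IStateStore
{
    private readonly Dictionary<string, int> saved = new Dictionary<string, int>();
    public int Writes { get; private set; }

    public void Write(string actorId, int state) { saved[actorId] = state; Writes++; }
    public bool TryRead(string actorId, out int state) { return saved.TryGetValue(actorId, out state); }
}

public class ScoreActor
{
    private readonly string id;
    private readonly IStateStore store;
    private readonly int checkpointEvery;
    private int sinceCheckpoint;

    public int Score { get; private set; }

    public ScoreActor(string id, IStateStore store, int checkpointEvery)
    {
        this.id = id;
        this.store = store;
        this.checkpointEvery = checkpointEvery;
        int saved;
        if (store.TryRead(id, out saved)) Score = saved;   // load state on (re)activation
    }

    public void AddPoints(int points)
    {
        Score += points;
        sinceCheckpoint++;
        if (sinceCheckpoint >= checkpointEvery)            // programmer-chosen schedule
        {
            store.Write(id, Score);
            sinceCheckpoint = 0;
        }
    }
}
```

The trade-off is visible in the numbers: with a checkpoint every 3 requests, 5 requests cost one write, but a crash at that point loses the last 2 updates. Checkpointing after every request removes that window at the price of more storage traffic.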
Guiding principle: enable a simple programming model without sacrificing performance.
Overview
• Orleans runs on a cluster of servers in a datacenter.
• Each server runs a container process that creates and hosts actor activations.
Three key subsystems:
• Messaging
• Hosting
• Execution
Runtime design choices
Messaging subsystem
• Each pair of servers is connected by a TCP connection.
• Messages between actors are sent on this connection via special threads.
Runtime design choices
Programmer: invoke method m on actor a.
Runtime:
• Find the host of a.
• Send this host a message on the open channel.
• The message is: invoke method m on a with data d.
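The two runtime steps can be sketched as follows. The types are hypothetical: `InvokeMessage` mirrors the "invoke method m on a with data d" shape, and `ServerChannel` merely records messages instead of writing to a TCP connection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wire message: "invoke method m on actor a with data d".
public class InvokeMessage
{
    public string ActorId { get; }
    public string Method { get; }
    public string Data { get; }

    public InvokeMessage(string actorId, string method, string data)
    {
        ActorId = actorId;
        Method = method;
        Data = data;
    }
}

// Stand-in for the already-open connection to one server.
public class ServerChannel
{
    public string ServerName { get; }
    public List<InvokeMessage> Sent { get; } = new List<InvokeMessage>();

    public ServerChannel(string serverName) { ServerName = serverName; }
    public void Send(InvokeMessage message) { Sent.Add(message); }
}

public class MessagingSketch
{
    private readonly Dictionary<string, ServerChannel> hostOfActor =
        new Dictionary<string, ServerChannel>();

    public void Place(string actorId, ServerChannel host) { hostOfActor[actorId] = host; }

    public ServerChannel Invoke(string actorId, string method, string data)
    {
        ServerChannel host = hostOfActor[actorId];              // 1. find the host of a
        host.Send(new InvokeMessage(actorId, method, data));    // 2. send on the open channel
        return host;
    }
}
```

Because all actors on a server share the one channel to each peer, many actor-to-actor conversations are multiplexed over a small, fixed number of connections.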
Hosting subsystem
• Decides where to place activations.
• Manages where to find activations (more on this later).
• Manages the activation lifecycle: idle activations can be deactivated.
Runtime design choices
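Deactivating idle activations can be sketched as a table of last-use timestamps that is swept periodically. `ActivationTable`, `Touch` and `CollectIdle` are hypothetical names, and the idle limit is a free parameter here rather than a documented Orleans default.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical bookkeeping for the activations hosted on one server.
public class ActivationTable
{
    private readonly Dictionary<string, DateTime> lastUsed = new Dictionary<string, DateTime>();
    private readonly TimeSpan idleLimit;

    public ActivationTable(TimeSpan idleLimit) { this.idleLimit = idleLimit; }

    public int Count { get { return lastUsed.Count; } }

    // Called whenever a message is delivered to the activation.
    public void Touch(string actorId, DateTime now) { lastUsed[actorId] = now; }

    // Removes and returns every activation that has been idle for at least idleLimit.
    public List<string> CollectIdle(DateTime now)
    {
        var idle = new List<string>();
        foreach (KeyValuePair<string, DateTime> entry in lastUsed)
            if (now - entry.Value >= idleLimit) idle.Add(entry.Key);
        foreach (string actorId in idle) lastUsed.Remove(actorId);   // deactivate
        return idle;
    }
}
```

Deactivation is safe for callers because the actor itself is virtual: the next message to a collected actor simply triggers a new activation.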
Execution subsystem
• Runs actors’ application code.
• Cooperative multitasking: uses a set of compute threads, usually equal to the number of CPU cores.
• More threads mean more context switching, extra memory, longer OS scheduling queues and reduced cache locality.
Runtime design choices
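A rough sketch of the threading idea only: a fixed set of worker threads drains a shared queue of turns, and each turn runs to completion without being preempted by the scheduler. `TurnScheduler` is a hypothetical name; per-actor ordering and the real Orleans scheduler internals are deliberately left out.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical scheduler: a fixed number of compute threads share one queue of turns.
public class TurnScheduler : IDisposable
{
    private readonly BlockingCollection<Action> turns = new BlockingCollection<Action>();
    private readonly Thread[] workers;

    public TurnScheduler(int threadCount)
    {
        workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            workers[i] = new Thread(() =>
            {
                // Cooperative: a turn runs until it returns; a long turn delays the others.
                foreach (Action turn in turns.GetConsumingEnumerable()) turn();
            });
            workers[i].IsBackground = true;
            workers[i].Start();
        }
    }

    public int ThreadCount { get { return workers.Length; } }

    public void Schedule(Action turn) { turns.Add(turn); }

    // Stops accepting turns, lets the workers drain the queue, then waits for them.
    public void Dispose()
    {
        turns.CompleteAdding();
        foreach (Thread worker in workers) worker.Join();
        turns.Dispose();
    }
}
```

A typical construction would be `new TurnScheduler(Environment.ProcessorCount)`, matching the "one compute thread per core" rule of thumb from the slide. A turn that throws would take down its worker thread in this sketch, so real code needs error handling around `turn()`.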
What could be a problem? Starvation.
• Orleans: “not a major concern since all of the actors are owned by the same developers”
• Orleans does provide monitoring and notification for long-running turns to help troubleshooting.
Distributed Directory
How can the runtime find each actor’s location?
• Implemented as a one-hop DHT.
• Each record maps an actor ID to the location(s) of its activations.
• Each server holds a partition of the directory.
• On activation/deactivation, a request is sent to the appropriate directory partition.
• Could be replicated on several servers.
• Each server maintains a large local cache of recently resolved actor-to-activation mappings.
Runtime design choices
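The lookup path (local cache first, then one hop to the owning partition) can be sketched as below. Everything here is an assumption for illustration: `ActorDirectory` and `DirectoryServer` are hypothetical names, and the FNV-1a hash with a modulo over a fixed server list stands in for whatever partitioning scheme Orleans really uses.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-server directory state.
public class DirectoryServer
{
    public string Name { get; }
    public Dictionary<string, string> Partition { get; } = new Dictionary<string, string>(); // actor ID -> host
    public Dictionary<string, string> Cache { get; } = new Dictionary<string, string>();     // recently resolved

    public DirectoryServer(string name) { Name = name; }
}

public class ActorDirectory
{
    private readonly List<DirectoryServer> servers;
    public int PartitionLookups { get; private set; }

    public ActorDirectory(List<DirectoryServer> servers) { this.servers = servers; }

    // One hop: the actor ID alone determines which server owns its record.
    public DirectoryServer OwnerOf(string actorId)
    {
        uint hash = 2166136261u;                     // FNV-1a, stable across processes
        unchecked
        {
            foreach (char c in actorId) { hash ^= c; hash *= 16777619u; }
        }
        return servers[(int)(hash % (uint)servers.Count)];
    }

    // Called on activation: record where the actor now lives.
    public void Register(string actorId, string hostName)
    {
        OwnerOf(actorId).Partition[actorId] = hostName;
    }

    // Resolve from the point of view of one server: cache first, then the owner.
    public string Lookup(DirectoryServer from, string actorId)
    {
        if (from.Cache.TryGetValue(actorId, out var cached)) return cached;
        PartitionLookups++;
        string host = OwnerOf(actorId).Partition[actorId];
        from.Cache[actorId] = host;
        return host;
    }
}
```

Repeated lookups for the same actor from the same server hit the cache and never reach the partition. The sketch ignores cache invalidation after deactivation, which a real directory has to handle.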
Reliability
• An actor’s persistent state is the programmer’s responsibility (the state should be saved to some network storage).
• Each server holds a membership view of all other servers.
• Servers automatically detect failures via periodic heartbeats (in practice, 30-60 sec).
• If a server fails, its actors are re-instantiated on other servers when the next message arrives.
• If a directory partition is lost, surviving servers purge their directories based on cache and state.
Runtime design choices
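Heartbeat-based failure detection and re-placement on the next message can be sketched like this. `MembershipView` and `ChooseHost` are hypothetical names, and the timeout passed to the constructor is a free parameter; the actual Orleans membership protocol is more involved than this single-observer view.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical membership view kept by one server about its peers.
public class MembershipView
{
    private readonly Dictionary<string, DateTime> lastHeartbeat = new Dictionary<string, DateTime>();
    private readonly TimeSpan timeout;

    public MembershipView(TimeSpan timeout) { this.timeout = timeout; }

    public void RecordHeartbeat(string server, DateTime now) { lastHeartbeat[server] = now; }

    // A server counts as alive while its latest heartbeat is recent enough.
    public bool IsAlive(string server, DateTime now)
    {
        DateTime seen;
        return lastHeartbeat.TryGetValue(server, out seen) && now - seen <= timeout;
    }

    // On the next message: keep the current host if it is alive,
    // otherwise pick a live server where the actor will be activated again.
    public string ChooseHost(string currentHost, DateTime now)
    {
        if (IsAlive(currentHost, now)) return currentHost;
        foreach (string server in lastHeartbeat.Keys)
            if (IsAlive(server, now)) return server;
        throw new InvalidOperationException("no live servers");
    }
}
```

The new activation starts without the in-memory state of the lost one, which is why saving state to network storage is left as the programmer's responsibility.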
Hoop Somuah & Sergey Bykov
Using Project “Orleans” to build Halo Cloud services in Windows Azure
3-641
<Video>
Waking up the Chief
• Presence & Matchmaking
• Challenges & Unlockables
• Skill & Ranking
• Cheat Detection
• Game History
• Profile
• User Generated Content
• Content Management Service
• Huge traffic spike on launch
• Downtime at launch is really bad
• Also spikes on weekends and holidays
• Load steadies out over time
Patterns of a big game launch
The Halo Cloud stack (2014)
[Figure labels: Azure Load Balancer, Front Door, Compute Worker Roles, Azure Storage, Search, HDInsight, Hadoop, Visualization, Warehouse computer]
Presence service
[Figure labels: Game Client, Mobile Client, Heartbeat, Presence (Router) actors, Game Session actors A, B and C, Player actors X, Y and Z]
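public class Presenceactor : actorBase, IPresenceactor
{
    public Task Heartbeat(byte[] data)
    {
        // Decode the heartbeat payload sent by the client.
        HeartbeatData heartbeatData = Heartbeat.Decrypt(data);

        // Get a reference to the game actor; no explicit creation or lookup is needed.
        IGameactor game = GameactorFactory.Getactor(heartbeatData.Game);

        // Forward the status update and return the resulting promise.
        return game.UpdateGameStatus(heartbeatData.Status);
    }
}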
Scalability by default
• Near-linear scaling to hundreds of thousands of requests per second
• Efficient resource usage
• Location transparency simplifies scaling up or down
• Complements Azure PaaS
• Easily adjust scale over time
Test Lab Numbers
Cooperative multitasking and threads
• 1000 actors
• The Orleans scheduler uses a small number of compute threads, usually equal to the number of CPUs, for efficient resource usage.
• Throughput degrades as the number of threads increases, due to the growing overhead of thread context switching, extra memory, longer OS scheduling queues and reduced cache locality.
Test Lab Numbers
Scalability in the number of actors
• The number of servers was fixed at 25.
• Throughput remains almost the same as the number of actors increases from 2 thousand to 2 million.
• The small degradation at large numbers is due to the increased size of internal data structures.
Test Lab Numbers
Developer productivity
Familiar programming paradigm
Fewer concurrency hazard concerns and headaches
Smooth learning curve for engineers new to asynchronous programming
Faster development through separation of concerns
Virtual Machines
Physical Machines
“Orleans” Runtime
What’s a Server?
Ongoing project…
• Currently in Public Preview.
• Download the SDK: http://orleans.codeplex.com/
• Follow the development process: http://research.microsoft.com/en-us/projects/orleans/
• Will soon be released as open source (early 2015).
Questions?