KEN BIRMAN Rao Professor of Computer Science Cornell University
CLOUD HOSTED COMPUTING FOR DEMANDING REAL-TIME SETTINGS
CAN THE CLOUD DO REAL-TIME?
Internet of Things, Smart Grid / Buildings / Cars, . . . Shared requirement:
➢ We want a system that can carry out some form of continuous monitoring, or continuous control.
➢ It will need to be robust despite “cloudy weather” and offer quick response, often with some form of consistency or fault-tolerance requirement added to the mix
CLOUD COMPUTING FOR THE SMART GRID
Real-time collection of data from widely deployed Synchronized Phasor Measurement Units (PMUs) and other SCADA data sources
➢ Each PMU device captures 44-byte records at 30 Hz
➢ One per “bus”, but there can be many PMUs, so the aggregate data rate is high
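The per-device figures above translate into an aggregate rate with simple arithmetic. The sketch below uses only the 44-byte record size and 30 Hz sample rate from the slide; the deployment sizes are hypothetical, and transport or encryption overhead is ignored.

```python
RECORD_BYTES = 44      # record size per sample, from the slide
SAMPLE_RATE_HZ = 30    # samples per second per PMU, from the slide


def pmu_bytes_per_second(num_pmus: int = 1) -> int:
    """Raw payload rate in bytes/s, ignoring transport and encryption overhead."""
    return RECORD_BYTES * SAMPLE_RATE_HZ * num_pmus


if __name__ == "__main__":
    for n in (1, 1_000, 100_000):  # hypothetical deployment sizes
        rate = pmu_bytes_per_second(n)
        print(f"{n:>7} PMUs: {rate:>12,} B/s = {rate * 8 / 1e6:.3f} Mbit/s")
```

A single PMU produces only 1,320 bytes per second, so the load comes from the number of devices rather than from any one stream.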
Robust real-time tracking enables shared, consistent situational awareness and coordination
KILLER APPLICATIONS?
Over-the-horizon “grid radar” helps operators understand wide-area grid stress and disturbances
Tools (“apps for the smart grid”) help operators cooperate to solve problems, search knowledge base for past situations with similar fingerprint, explore what-if scenarios
CLOUD COMPUTING FOR THE SMART GRID
Why use the cloud? It comes down to money…
By reusing today’s scalable cloud infrastructure, we:
➢ Benefit from a low-cost solution
➢ Leverage a proven, universally accessible technology
➢ Gain hosting at geographically diverse places
But cloud platforms aren’t known for high assurance
THE CAP DILEMMA: PICK 2 FROM {CONSISTENCY, AVAILABILITY AND PARTITION TOLERANCE}
Today’s cloud offers scalable, snappy response, but is optimized for applications with weak security needs. It lacks:
➢ Hardened network protocols aimed at consistent but tightly controlled sharing for collaboration
➢ A new distributed security model supporting total control by regional operator, controlled data flows
Our approach is to run a stronger infrastructure within Amazon’s EC2, augmenting the standard solution
FROM THE SENSOR TO THE SHARD
The shard members keep logs of values received indexed by time.
Due to network delay, not all have the same data at the same time.
We use IronStack as our transport layer, then run TCP or TCP/SSL on it
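The time-indexed logs kept by shard members can be pictured with a small sketch. Everything here is an assumption made for illustration: the class name `ShardMemberLog`, the helper `stable_time`, and the integer timestamps are hypothetical and are not taken from GridCloud.

```python
import bisect


class ShardMemberLog:
    """One shard member's log of (timestamp, value) records, kept sorted by time."""

    def __init__(self):
        self._times = []
        self._values = []

    def append(self, timestamp, value):
        # Insert at the sorted position so reads stay correct if a record arrives late.
        i = bisect.bisect_right(self._times, timestamp)
        self._times.insert(i, timestamp)
        self._values.insert(i, value)

    def latest_at(self, timestamp):
        """Return the most recent value with time <= timestamp, or None."""
        i = bisect.bisect_right(self._times, timestamp)
        return self._values[i - 1] if i else None

    def high_water_mark(self):
        return self._times[-1] if self._times else None


def stable_time(members):
    """Smallest high-water mark across members (None if any log is empty).

    A crude lower bound on what every member has seen; it is exact only
    when each member receives its records in timestamp order.
    """
    marks = [m.high_water_mark() for m in members]
    return None if any(t is None for t in marks) else min(marks)


if __name__ == "__main__":
    a, b = ShardMemberLog(), ShardMemberLog()
    for t in (0, 1, 2):
        a.append(t, f"v{t}")
    for t in (0, 1):          # network delay: b has not yet received t=2
        b.append(t, f"v{t}")
    print(a.latest_at(2), b.latest_at(2), stable_time([a, b]))
```

In the demo the two members disagree about the value at time 2, while a read at the stable time gives the same answer from both.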
[Figure labels: private network portion; Internet portion]
GRIDCLOUD: MILE HIGH OVERVIEW
MAIN COMPONENTS: RT-HDFS
GridCloud File System: A real-time in-memory file system for secure, strongly consistent real-time mirrored data sharing, extends HDFS
➢ Accepts streams of updates, offers a convenient snapshot feature
➢ Optimized for management of very large memory-mapped files
➢ Leverages RDMA functionality for network line-speed data transfers
➢ Easily integrated with Hadoop
Lead developer: Weijia Song. Under the hood, makes use of Isis2 (Birman)
MAIN COMPONENTS: GC-COLLAB
GridCloud Collaboration Tool: A tool for creating a kind of sharable virtual iPad
➢ It graphs the current power network and can show you the status of any line at a click
➢ Various “apps” can be dragged onto the network and this triggers actions, like a transient stability analysis or listing “similar network states seen in the past” (we’re the framework. Other people build these apps)
➢ Shared with real-time consistency as needed
MEng students reporting to Theo Gkountouvas. Leverages Live Distributed Objects + Isis2 (Ostrowski, Birman)
MAIN COMPONENTS: IRONSTACK
IronStack: A software defined network manager
➢ Focus is on “owned” networks operating under difficult conditions
➢ Employs SDN routers and uses a variety of techniques to circumvent disruption in the event of storms, component failures.
➢ Interfaces aimed at owners who may have limited IT skill sets
Elegant…
Dead Simple
Rock Solid
Lead developer: Z Teo
MAIN COMPONENTS: DMAKE
DMake: Manages your GridCloud applications
➢ Based on the popular Unix “makefile” concept
➢ But generalized to support distributed programs whose operating parameters can be modified at runtime
➢ It handles system repair after failures, load balancing, mapping of your computation to the cloud computing nodes, etc.
➢ Incredibly easy to use.
Lead developer: Theo Gkountouvas, uses Isis2
UNDER THE COVERS: POWERED BY ISIS2
Used internally by these other tools
• Provides secure, fault-tolerant data replication, coordination and self-repair. Lead: Birman
• Employs cutting edge “virtual synchrony” programming model (basis of CORBA FT standard)
• Open source, more than 4250 downloads to date from isis2.codeplex.com
Egyptian myth: After her brother Osiris was torn apart by Seth, Isis restored him to life
CONSISTENCY MODEL: INTEGRATES VIRTUAL SYNCHRONY WITH PAXOS
Virtual synchrony is a “consistency” model:
• Membership epochs: begin when a new configuration is installed and reported by delivery of a new “view” and associated state
• Protocols run “during” a single epoch. A new view is reported if a failure occurs
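The epoch and view idea can be made concrete with a toy model. This is a minimal single-process sketch, not the Isis2 API: the names `View`, `Group`, `multicast`, and `report_failure` are hypothetical, and real systems agree on each new view with a distributed protocol that is not shown here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class View:
    view_id: int      # identifies the membership epoch
    members: tuple    # processes that belong to this epoch


@dataclass
class Group:
    view: View
    delivered: list = field(default_factory=list)  # (view_id, sender, payload)

    def multicast(self, sender, payload):
        # A protocol runs within a single epoch: only current members may send,
        # and each delivery is tagged with the epoch it belongs to.
        if sender not in self.view.members:
            raise ValueError(f"{sender} is not in view {self.view.view_id}")
        self.delivered.append((self.view.view_id, sender, payload))

    def report_failure(self, failed):
        # A failure ends the epoch: install and report a new view
        # that excludes the failed member.
        survivors = tuple(m for m in self.view.members if m != failed)
        self.view = View(self.view.view_id + 1, survivors)
        return self.view


if __name__ == "__main__":
    g = Group(View(0, ("p", "q", "r")))
    g.multicast("p", "update-1")
    print(g.report_failure("q"))
    g.multicast("r", "update-2")
    print(g.delivered)
```

Tagging each delivery with its view identifier is what lets every surviving member agree on which messages belong to which epoch.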
[Figure: timelines of processes p, q, r, s, t (time axis 0 to 70) comparing a non-replicated reference execution (A=3, B=7, B = B-A, A=A+1) with a synchronous execution and a virtually synchronous execution]
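The reference execution in the figure (A=3, B=7, B = B-A, A=A+1) shows why replicas must apply updates in a consistent order. The sketch below replays those two updates in both orders; the function names are hypothetical and no multicast library is involved.

```python
def apply_all(updates, state):
    """Apply each update in sequence to a copy of the initial state."""
    state = dict(state)
    for update in updates:
        update(state)
    return state


def b_minus_a(s):
    s["B"] = s["B"] - s["A"]


def a_plus_1(s):
    s["A"] = s["A"] + 1


initial = {"A": 3, "B": 7}
reference = apply_all([b_minus_a, a_plus_1], initial)  # order shown in the figure
reordered = apply_all([a_plus_1, b_minus_a], initial)  # a replica that swapped them

if __name__ == "__main__":
    print("reference:", reference)   # {'A': 4, 'B': 4}
    print("reordered:", reordered)   # {'A': 4, 'B': 3}
```

A replica that sees the updates in the other order ends with a different value of B, which is the divergence an ordered multicast is meant to prevent.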
WHY NOT JUST UDP MULTICAST?
[Figure: Isis2 architecture. Isis2 user objects sit on top of the Isis2 library. Library components shown: group instances and multicast protocols (Send, CausalSend, OrderedSend, SafeSend, Query, ...); Flow Control; Membership Oracle; Large Group Layer; TCP tunnels (overlay); Dr. Multicast; Platform Security; Reliable Sending; Fragmentation; Group Security; Sense Runtime Environment; Self-stabilizing Bootstrap Protocol; Socket Mgt/Send/Rcv; Message Library; “Wrapped” locks; Bounded Buffers. Oracle membership panel: group membership, report suspected failures, views, other group members.]
◻ These systems are complex, especially if you want to run on platforms like EC2
SafeSend and Send are two of the protocol components hosted over what we call the large-scale properties sandbox. The sandbox addresses issues like flow control, security, etc. All protocols share and benefit from those properties.
The sandbox itself is mostly composed of “convergent” protocols that use probabilistic methods.
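Gossip is one well-known family of convergent, probabilistic protocols. The sketch below illustrates that general idea only: the slides do not say which protocols the sandbox uses, and `gossip_rounds` is a hypothetical name. Each node pushes its current value to one randomly chosen peer per round until all nodes hold the maximum.

```python
import random


def gossip_rounds(values, rng, max_rounds=1000):
    """Push gossip of the maximum value.

    Returns the number of rounds until all nodes agree, or None if
    max_rounds is reached first.
    """
    values = list(values)
    target = max(values)
    n = len(values)
    for round_no in range(1, max_rounds + 1):
        snapshot = list(values)       # every node sends its start-of-round value
        for i in range(n):
            j = rng.randrange(n)      # random peer (a node may pick itself)
            values[j] = max(values[j], snapshot[i])
        if all(v == target for v in values):
            return round_no
    return None


if __name__ == "__main__":
    rng = random.Random(0)            # fixed seed so runs are repeatable
    print("rounds to converge:", gossip_rounds(range(64), rng))
```

No node coordinates the exchange, yet the state converges with high probability in a number of rounds that grows slowly with the number of nodes, which is why this style of protocol suits large, unreliable deployments.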
SUMMARY
With help from many organizations, Cornell is creating the world’s most robust, secure and consistent system for
• Monitoring sensors, like PMUs, even at large scale and with high data rates
• Hosting smart applications
• Enabling collaborative problem solving, for example by offering grid operators a sharable “virtual iPad” that gives easy access to powerful applications
Our prototype enhances Amazon’s EC2 cloud to host this solution cost-effectively with no compromise in its key properties