Using Run-Time Checking to Provide Safety and Progress for Distributed Cyber-Physical Systems
1
Using Run-Time Checking to Provide Safety and Progress for Distributed Cyber-Physical Systems
Stanley Bak, Fardin Abdi Taghi Abad, Zhenqi Huang, Marco Caccamo
Presenter: Renato Mancusu
• Interconnected systems that physically affect each other
• State of each node is a function of control inputs of other nodes based on system connection graph
Distributed Coordination
Images:
http://geospatial.blogs.com/geospatial/2009/07/alternative-energy-green-nonemitting-clean-renewable-or-low-carbon-.html
http://www.thewatertreatments.com/water/distribution-system/
2
• Distributed systems rely on communication
– Reaching the desired state
– Functionality and stability
Communication: An Essential Component
3
Communication Faults
Violation of Safety
• Unreliable communication
– Unbounded message delays and drops
– Impossible to achieve consensus in a lossy network
• One approach:
– Use middleware that provides guarantees of communication and latency
– If the guarantees cannot be met, an error is raised to the high-level logic
• Problem: scalability
Limits of Distributed Coordination
Image: “A Swarm of Nano Quadrotors”, UPENN, http://www.youtube.com/watch?v=YQIMGV5vtd4
4
• Goal: Examine fundamental requirements for safety in distributed systems with unreliable communication
– Safety: a global invariant (for example, collisions are avoided)
• Goal: Provide a mechanism for safe progress, if the communication works adequately well
– Progress: all distributed agents follow the same goal
Paper Goals
Image: “A Swarm of Nano Quadrotors”, UPENN, http://www.youtube.com/watch?v=YQIMGV5vtd4
5
• A coordinating distributed system is safe under unreliable communication (arbitrary delays, unbounded packet loss) if and only if both:
– Condition 1: The system is safe if no communication takes place
– Condition 2: For each message m that is received by any node, the system remains safe if no other messages are ever received after m
• Proof intuition:
Formal details in the paper
Safety Theorem
6
• Condition 2 is difficult to check ahead of time, since it is quantified over every message
– “Condition 2: For each message m that is received, the system remains safe if no other messages are ever received after m”
• To build a usable system with this result, we check this condition at runtime and drop messages that violate it
– Of course, dropping messages impacts progress; progress is discussed under the second goal of the paper
Runtime Checking
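A minimal sketch of the runtime check, under simplifying assumptions that are not from the paper or from StarL: vehicles live on a one-dimensional line, each drives at unit speed toward its most recently accepted target and then stops, and the check sees the positions and targets of all vehicles. The names `MIN_GAP`, `position_at`, `safe_if_silent`, and `filter_command` are hypothetical. A command is accepted only if the system would remain safe were no further message ever received (Condition 2); otherwise it is dropped.

```python
MIN_GAP = 1.0  # assumed minimum separation between neighbouring vehicles


def position_at(start, target, t):
    """Position at time t of a vehicle driving at unit speed from start to target, then stopping."""
    dist = abs(target - start)
    if t >= dist:
        return target
    direction = 1.0 if target > start else -1.0
    return start + direction * t


def safe_if_silent(positions, targets):
    """True if neighbouring gaps stay >= MIN_GAP forever when no further message arrives."""
    # Every trajectory is piecewise linear and constant after arrival, so each
    # signed gap is smallest at t = 0 or at one of the arrival times.
    times = [0.0] + [abs(g - p) for p, g in zip(positions, targets)]
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    for t in times:
        xs = [position_at(positions[i], targets[i], t) for i in order]
        # Neighbours must keep their initial order and their separation.
        if any(b - a < MIN_GAP for a, b in zip(xs, xs[1:])):
            return False
    return True


def filter_command(positions, targets, vehicle, new_target):
    """Apply a received command only if it passes the Condition 2 check; otherwise drop it."""
    candidate = list(targets)
    candidate[vehicle] = new_target
    if safe_if_silent(positions, candidate):
        return candidate, True      # safe command passes
    return list(targets), False     # unsafe command is filtered
```

For example, with vehicles at `[0.0, 5.0]` that are both holding position, a command sending vehicle 0 to `4.5` is dropped because the final gap would be `0.5`, while a command sending it to `3.0` is accepted. A real node would not have global state and would need a conservative estimate of what the other agents may do.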
7
8
Proposed Architecture
Perform a safety test on each command (check Condition 2)
Safe commands pass
Unsafe commands are filtered
• Progress depends on the notion of compatible actions. These are actions which all agents can take that are globally safe.
• When put together, compatible action chains allow for global progress towards a goal. The rate of progress depends on the quality of the communication channel.
Safe Progress
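A minimal sketch of a compatible action chain over a lossy channel. Everything specific here is an assumption made for illustration: the one-dimensional flock, the constants `SPACING`, `MIN_GAP`, and `STEP`, the function names, instantaneous moves, and the rule that the coordinator issues the next step only once every vehicle has taken the current one (confirmations are treated as reliable). Each step is small enough that any subset of vehicles can take it while the rest stay put, so a dropped message slows progress but never violates the minimum gap.

```python
import random

SPACING = 3.0              # assumed fixed offset between neighbouring vehicles
MIN_GAP = 1.0              # assumed minimum safe separation
STEP = SPACING - MIN_GAP   # largest advance that is safe for any subset of movers


def min_gap(xs):
    """Smallest gap between neighbouring vehicles, listed rear to front."""
    return min(b - a for a, b in zip(xs, xs[1:]))


def run_chain(n, shift, drop_prob, seed=0, max_rounds=10_000):
    """Move a flock of n vehicles forward by `shift`; return (rounds used, final positions)."""
    rng = random.Random(seed)
    start = [i * SPACING for i in range(n)]
    offset = [0.0] * n   # advance taken so far by each vehicle
    done = 0.0           # advance taken by every vehicle
    rounds = 0
    while done < shift:
        if rounds >= max_rounds:
            raise RuntimeError("no progress: channel too lossy")
        nxt = min(done + STEP, shift)
        for i in range(n):
            # Resend the current step to vehicles that have not taken it yet;
            # each message is lost independently with probability drop_prob.
            if offset[i] < nxt and rng.random() >= drop_prob:
                offset[i] = nxt
        xs = [s + o for s, o in zip(start, offset)]
        assert min_gap(xs) >= MIN_GAP   # safety holds whatever was dropped
        rounds += 1
        if all(o == nxt for o in offset):
            done = nxt                  # the whole flock is ready for the next step
    return rounds, [s + o for s, o in zip(start, offset)]
```

With a perfect channel, `run_chain(4, 10.0, 0.0)` needs one round per step. With `drop_prob=0.5` the flock reaches the same final positions in at least as many rounds, which mirrors the point that the rate of progress depends on the quality of the communication channel. By contrast, sending the full shift in one command would not be compatible: if the rear vehicle moved by `10.0` while its neighbour missed the message, the two would collide.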
9
10
Example System
• A flock of vehicles moves along a path with fixed offsets
• The user can input “detour points”, which redirect the motion of the flock
• Collisions should always be avoided
• Detour points should be reached, communication permitting
11
Non-Compatible Actions
A new waypoint for the flock is entered
Collision may occur due to a communication fault
Compatible Actions – Iteratively Approach Goal
Compatible Actions are Robust to Communication Failures
New Detour point entered by operator
Desired final path generated for the flock
Paths generated for all the followers
Paths sent to followers!
Tractor 1 did not receive the path
Tractor 1 did not receive the new path but safety is maintained!
21
Vehicle Flocking Application
• We created the vehicle flocking system within StarL, a Java-based environment for testing vehicle flocking algorithms
• StarL code can be run on a Roomba flock at UIUC or in the built-in simulator
• Communication effects (timing, packet loss) can be simulated and are evaluated in the paper
• Video: https://www.youtube.com/watch?v=dIGU8OTfCh8
22
Vehicle Flocking Measurement
• We measured the effect of packet loss and vehicle count on convergence time and the number of messages sent
23
Future Extensions
• Replace runtime reachability checks with ahead-of-time computation
• Propose a progress framework where commands do not originate from a centralized coordinator
• Implementation on a large swarm of robots
• Provide fundamental requirements for safety in distributed systems with unreliable communication
• Provide a mechanism for safe progress, if the communication works adequately well
• Evaluate the proposed techniques on a vehicle flocking scenario
Review
24