Automated Revision of Existing Programs
Software Engineering and Network Systems Laboratory (SENS)
Borzoo Bonakdarpour

Motivation
Question: Is it possible to revise the model automatically such that it satisfies the failed property while preserving the other properties?
[Figure: a Model and a Property are given to a Model Checker, which produces a Counterexample]

Motivation (cont'd)
Question: Is it possible to add a newly discovered property to an existing program automatically?
[Figure: Requirements, Designer, Incomplete Specification, Program, New Property]

Outline
- What is program revision?
- Adding properties to existing programs
  - Results
  - Example
- Adding fault-tolerance to existing real-time programs
  - Results
  - Example
- Ongoing research, open problems, and future work

Program Revision
Revision by synthesis:
- From the specification (comprehensive revision)
  - Highly expensive
  - No reusability
- From the existing program + new property (local revision)
  - Provides reusability
  - In some cases offers lower classes of time and space complexity
  - Does not need the entire specification of the existing program
[Figure: Program P and a Property are inputs to the Revision Algorithm]

Our Goal
We identify classes of interesting properties typically used in specifying reactive systems.
We design synthesis methods for which revising existing programs is feasible in both time and space.
Question: Why is comprehensive revision highly complex?
Answer: Expressiveness of specifications.

Part I
Adding Properties to Existing Programs

Preliminary Concepts
A program $p$ is a triple $p = \langle S_p, I_p, \delta_p \rangle$, i.e., a finite state space, a set of initial states, and a set of program transitions.
A state predicate is any subset of $S_p$.
A state sequence $\sigma = \langle s_0, s_1, \ldots \rangle$ is a computation iff:
- $s_0 \in I_p$
- $\forall i > 0 : (s_{i-1}, s_i) \in \delta_p$
- if $\sigma$ terminates in state $s_f$, then there does not exist a state $s$ such that $(s_f, s) \in \delta_p$
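
The definition of a program and its computations can be made concrete with a small sketch. The names `Program`, `is_computation_prefix`, and `is_terminated_computation`, as well as the three-state example, are illustrative assumptions and do not come from the slides. The sketch only handles finite sequences, so it can confirm a terminated computation or a finite prefix of an infinite one.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Program:
    states: frozenset  # S_p: finite state space
    init: frozenset    # I_p: set of initial states
    trans: frozenset   # program transitions, stored as (s, s') pairs

def is_computation_prefix(p, seq):
    """True if seq starts in an initial state and every consecutive
    pair of states is a transition of p."""
    if not seq or seq[0] not in p.init:
        return False
    return all((a, b) in p.trans for a, b in zip(seq, seq[1:]))

def is_terminated_computation(p, seq):
    """True if seq is a valid prefix whose last state has no outgoing
    transition, i.e. the computation terminates there."""
    if not is_computation_prefix(p, seq):
        return False
    last = seq[-1]
    return not any(a == last for a, _ in p.trans)

# Hypothetical example: 0 -> 1 -> 2, where state 2 has no successor
p = Program(frozenset({0, 1, 2}), frozenset({0}), frozenset({(0, 1), (1, 2)}))
```

Here `[0, 1, 2]` is a terminated computation, `[0, 1]` is only a prefix, and `[1, 2]` is rejected because it does not start in an initial state.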

Preliminary Concepts (cont.)
Sample properties:
- Safety
  - $P$ unless $Q$
  - stable($P$)
  - invariant($P$)
- Liveness
  - $P$ leads-to $Q$ ($P \mapsto Q$)
[Figure: example state sequences over $P$ and $Q$ illustrating each property]
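
The safety properties listed above can be checked directly on a finite transition relation. The sketch below uses the standard UNITY readings (stable(P) is "P unless false", and invariant(P) additionally requires the initial states to lie in P). As a simplifying assumption it inspects every transition rather than only those on reachable computations, so it is conservative. The function names and the example are hypothetical.

```python
def unless(trans, P, Q):
    """P unless Q: a step from a state in P but not in Q lands in P or Q."""
    return all(b in P or b in Q for a, b in trans if a in P and a not in Q)

def stable(trans, P):
    """stable(P) is 'P unless false': no transition leaves P."""
    return unless(trans, P, set())

def invariant(init, trans, P):
    """invariant(P): all initial states are in P and P is stable."""
    return set(init) <= set(P) and stable(trans, P)

# Hypothetical example: 0 -> 1 -> 2 -> 2
trans = {(0, 1), (1, 2), (2, 2)}
checks = (
    unless(trans, {0, 1}, {2}),        # leaving {0, 1} always enters {2}
    stable(trans, {0, 1}),             # (1, 2) leaves {0, 1}
    stable(trans, {2}),                # state 2 only loops
    invariant({0}, trans, {0, 1, 2}),  # the whole state space
)
```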

Preliminary Concepts (cont.)
A specification is a conjunction of a set of properties:
$spec = L_1 \wedge L_2 \wedge \ldots \wedge L_n$
A computation $\sigma$ satisfies $spec$ iff $(\forall i \mid 0 < i \le n : \sigma \text{ satisfies } L_i)$.
A program $p$ satisfies $spec$ iff all computations of $p$
1. are infinite, and
2. satisfy $spec$.
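
Problem Statement
Formulation of the problem: given a program $p$ that satisfies its existing specification $spec_e$, and a new specification $spec_n$, find a program $p'$ such that:
- $S_{p'} = S_p$
- $I_{p'} = I_p$
- $\delta_{p'} \subseteq \delta_p$
- all computations of $p'$ are infinite and satisfy $spec_n$
[Figure: the Synthesis Algorithm takes program $p = \langle S_p, I_p, \delta_p \rangle$ and a specification $spec_n$, and produces program $p' = \langle S_{p'}, I_{p'}, \delta_{p'} \rangle$]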

Adding a Single Leads-to Property ($R \mapsto T$)
Case 1: Deadlock states
Remove every transition $(s_1, s_2)$ where $s_2 \in R$ and $T$ is not reachable from $s_2$.
[Figure: state space $S_p$ with initial states $I_p$, predicates $R$ and $T$, and states $s_0$, $s_1$, $s_2$]

Adding a Single Leads-to Property ($R \mapsto T$)
Case 2: Cycles
Break cycles that are reachable from $R$ and do not reach $T$.
[Figure: state space $S_p$ with initial states $I_p$, predicates $R$ and $T$, and states $s_0$ through $s_4$]
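
The two cases (states that cannot reach T, and cycles that avoid T) can be illustrated with a simplified sketch. This is not the algorithm from the talk: it is one sound but possibly over-restrictive strategy, assuming an explicit finite transition set. It ranks each state by its shortest distance to T, cuts off states reachable from R that have no path to T, and keeps only rank-decreasing transitions in the region between R and T, which removes every cycle there. The function name `add_leads_to` and the example are hypothetical.

```python
from collections import deque

def add_leads_to(states, init, trans, R, T):
    """Return a subset of trans in which every computation that visits R
    later visits T, or None if this sketch cannot find one."""
    succ = {s: set() for s in states}
    pred = {s: set() for s in states}
    for a, b in trans:
        succ[a].add(b)
        pred[b].add(a)

    # rank[s] = length of a shortest path from s to T (missing if none)
    rank = {t: 0 for t in T}
    queue = deque(T)
    while queue:
        s = queue.popleft()
        for q in pred[s]:
            if q not in rank:
                rank[q] = rank[s] + 1
                queue.append(q)

    # States outside T that are in R or reachable from R without passing T
    pending = set(R) - set(T)
    stack = list(pending)
    while stack:
        s = stack.pop()
        for n in succ[s]:
            if n not in T and n not in pending:
                pending.add(n)
                stack.append(n)

    # Case 1: pending states that cannot reach T must become unreachable
    removed = {s for s in pending if s not in rank}
    # Case 2: inside the pending region keep only rank-decreasing steps
    kept = {(a, b) for a, b in trans
            if b not in removed and a not in removed
            and (a not in pending or rank[b] < rank[a])}

    # Computations must stay infinite: drop steps into states left
    # without any outgoing transition, until nothing changes
    changed = True
    while changed:
        alive = {a for a, _ in kept}
        pruned = {(a, b) for a, b in kept if b in alive}
        changed = pruned != kept
        kept = pruned
    alive = {a for a, _ in kept}
    return kept if set(init) <= alive else None

# Hypothetical example: cycle 1 <-> 2 avoids T = {3}; state 4 cannot reach T
example = {(0, 1), (1, 2), (2, 1), (1, 3), (3, 0), (2, 4), (4, 4)}
revised = add_leads_to({0, 1, 2, 3, 4}, {0}, example, R={1}, T={3})
```

In the example the sketch drops `(1, 2)` to break the cycle and drops `(2, 4)` and `(4, 4)` because state 4 can never reach T. Keeping only rank-decreasing transitions may remove more behavior than strictly necessary, which is why this should be read as an illustration rather than a complete procedure.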

Soundness and Completeness
Theorem (1): The algorithm for adding multiple safety properties along with a leads-to property is sound and complete. (Fixability)
Theorem (2): The complexity of the algorithm for adding multiple safety properties along with a leads-to property is polynomial-time.

Adding Two Leads-to Properties
Adding two leads-to properties one after another.
[Figure: state space $S_p$ with initial states $I_p$, predicates $R$, $T$, $P$, and $Q$, and states $s_0$ through $s_9$]

Other Results
The problem of simultaneously adding two leads-to properties is NP-complete.
The problem of adding one leads-to property while maintaining maximum non-determinism is NP-complete.

Another Problem
Adding two eventually properties:
1. true leads-to $Q$
2. true leads-to $T$
This problem is also NP-complete.
[Figure: state space with predicates $Q$ and $T$]

Example: Real-Time Resource Allocation
Two processes $j \in \{1, 2\}$. Each has two tasks to complete (each takes 1 time unit):
- Submitting a request
- Performing an I/O operation
$RQ_j :: req_j \wedge (x = 1) \xrightarrow{\{x\}} io_j, req_j := true, false$
$IO_j :: io_j \wedge (x = 1) \xrightarrow{\{x\}} req_j, io_j := true, false$
$WT :: 0 \le x \le 1 \longrightarrow wait$
Bounded response:
$L \equiv (io_1 \mapsto_{\le 2} req_1)$

[Figure: region graph of the resource allocation example over the variables req1, io1, req2, io2 and the clocks x and t. Legend: initial region; regions where io1 becomes true; edges participating in a shortest path; edges in additional shortest paths; edges removed during addition; regions made unreachable.]
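
Example (cont'd)
$RQ_1 :: req_1 \wedge (x = 1) \xrightarrow{\{x, t\}} io_1, req_1 := true, false$
$IO_1 :: io_1 \wedge (x = 1) \xrightarrow{\{x\}} req_1, io_1 := true, false$
$RQ_2 :: req_2 \wedge (x = 1) \wedge (\neg io_1 \vee t \le 1) \xrightarrow{\{x\}} io_2, req_2 := true, false$
$IO_2 :: io_2 \wedge (x = 1) \wedge (\neg io_1 \vee t \le 1) \xrightarrow{\{x\}} req_2, io_2 := true, false$
$WT :: 0 \le x \le 1 \longrightarrow wait$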

Part II
Adding Fault-Tolerance to Existing Programs

The Byzantine Agreement Problem
Variables:
- General: decision $d_g \in \{0, 1\}$
- Non-generals $j$, $k$, $l$: decisions $d_j, d_k, d_l \in \{0, 1, \bot\}$ and final flags $f_j, f_k, f_l \in \{false, true\}$
Program (for each non-general $j$):
$(d_j = \bot) \wedge (f_j = false) \longrightarrow d_j := d_g$
$(d_j \ne \bot) \wedge (f_j = false) \longrightarrow f_j := true$

The Byzantine Agreement Problem
Byzantine flags: $b_g, b_j, b_k, b_l \in \{false, true\}$
Faults:
$(b_j, b_k, b_l, b_g = false) \longrightarrow b_j := true$
$(b_j = true) \longrightarrow d_j := 0|1$

The Byzantine Agreement Problem
Safety specification:
- Agreement: The final decisions of any two non-Byzantine processes cannot be different.
- Validity: If the general is non-Byzantine, then the final decision of a non-Byzantine process must be the same as that of the general.
Fact: The program does not meet the safety specification in the presence of faults.
[Figure: counterexample with $d_g = 0$, $b_g = true$; $d_j = 0$, $f_j = false$, then $f_j = true$; $d_k = 1$, $f_k = true$]
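
The violation of agreement can be replayed with a short simulation. The sketch below encodes the two program actions and the two fault actions as Python functions over a dictionary state. All function names are hypothetical, and it assumes that the fault actions shown for a non-general also apply to the general, which matches the counterexample where the general is Byzantine.

```python
BOTTOM = None  # undecided value of a non-general

def initial_state(dg):
    s = {"d_g": dg, "b_g": False}
    for p in "jkl":
        s["d_" + p] = BOTTOM
        s["f_" + p] = False
        s["b_" + p] = False
    return s

def copy_decision(s, p):
    """Program action: an undecided, non-final process copies d_g."""
    if s["d_" + p] is BOTTOM and not s["f_" + p]:
        s["d_" + p] = s["d_g"]

def finalize(s, p):
    """Program action: a decided process finalizes its decision."""
    if s["d_" + p] is not BOTTOM and not s["f_" + p]:
        s["f_" + p] = True

def fault_become_byzantine(s, p):
    """Fault action: p turns Byzantine if no process is Byzantine yet."""
    if not any(s["b_" + q] for q in "gjkl"):
        s["b_" + p] = True

def fault_change_decision(s, p, value):
    """Fault action: a Byzantine process changes its decision arbitrarily."""
    if s["b_" + p]:
        s["d_" + p] = value

def agreement(s):
    """Safety: finalized non-Byzantine non-generals hold the same decision."""
    final = {s["d_" + p] for p in "jkl" if s["f_" + p] and not s["b_" + p]}
    return len(final) <= 1

s = initial_state(0)
copy_decision(s, "j")              # d_j = 0
fault_become_byzantine(s, "g")     # b_g = true
fault_change_decision(s, "g", 1)   # the Byzantine general now shows 1
copy_decision(s, "k")              # d_k = 1
finalize(s, "j")                   # f_j = true
finalize(s, "k")                   # f_k = true
violated = not agreement(s)
```

After this run, j and k are both non-Byzantine and finalized but hold different decisions, so agreement is violated.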

The Byzantine Agreement Problem
Question: Is it possible to revise a distributed program in order to add fault-tolerance to the original program with respect to a safety specification and a class of faults?
Answer: Yes, but the problem is NP-complete, which is itself a problem.
Solution: A set of polynomial-time heuristics has been developed to add fault-tolerance to a large class of distributed programs.

Some Terminology
- Boolean variables: $V = \{v_1, \ldots, v_n\}$
- State: $s = \bigwedge_{0 < j \le n} l(v_j)$, where $l(v_j)$ is a literal
- State predicate: A finite set of states, $S = \bigvee_{0 \le i \le m} s_i$
- Transition: Let $V' = \{v' \mid v \in V\}$. A transition is $(s, s') = s \wedge s'$
- Transition predicate: A set of transitions, $P = \bigvee_{0 \le i \le m} (s_i \wedge s'_i)$
- Program: A program is defined by a transition predicate $P$
- Closure: A state predicate $S$ is closed in program $P$ iff $((s \models S) \wedge ((s, s') \models P)) \Rightarrow (s' \models S)$
- Computation of $P$: A sequence of states $\langle s_0, s_1, \ldots \rangle$ where $(s_j, s_{j+1}) \models P$
- Safety specification: A set of bad transitions that should not occur in any program computation, defined by a transition predicate $SPEC$
- Satisfaction: A program $P$ satisfies $SPEC$ from $S$ iff
  1. $S$ is closed in $P$
  2. For all $\langle s_0, s_1, \ldots \rangle$ where $s_0 \models S$: $(s_j, s_{j+1}) \not\models SPEC$
- Invariant: If $P$ satisfies $SPEC$ from $S$ and $S$ is nonempty, then $S$ is an invariant of $P$
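
Closure and satisfaction can be sketched over explicit sets instead of Boolean formulas. This is an assumption made for illustration: states are plain values, predicates are Python sets, and SPEC is given as a set `bad` of forbidden transitions. Because S is closed, every computation that starts in S only takes transitions that start in S, so the second condition reduces to checking those transitions. The function names are hypothetical.

```python
def is_closed(S, P):
    """S is closed in P: every transition of P that starts in S ends in S."""
    return all(b in S for a, b in P if a in S)

def satisfies_from(P, S, bad):
    """P satisfies SPEC from S: S is closed in P and no transition of P
    starting in S is a bad transition of SPEC."""
    return is_closed(S, P) and not any((a, b) in bad for a, b in P if a in S)

# Hypothetical example: (1, 2) is the only bad transition
P = {(0, 1), (1, 0), (1, 2), (2, 2)}
bad = {(1, 2)}
S = {0, 1}
before = satisfies_from(P, S, bad)             # (1, 2) leaves S and is bad
after = satisfies_from(P - {(1, 2)}, S, bad)   # removing it restores closure
```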

Preliminaries (cont'd)
Faults: A set $F$ of transitions.
Fault-span: A state predicate $T$ such that
1. $S \subseteq T$
2. $T$ is closed in $P \cup F$
[Figure: finite state space containing the invariant inside the fault-span, with program transitions p and fault transitions f]
Fault-tolerance: A program $P$ is $F$-tolerant to $SPEC$ from $S$ if
1. $P$ satisfies $SPEC$ from $S$
2. There exists $T$ such that
   - $T$ is an $F$-span of $P$
   - $P \cup F$ satisfies $SPEC$ from $T$ (safety)
   - after faults stop occurring, every computation of $P$ reaches $S$ (liveness)
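
Any set that contains the invariant and is closed under program and fault transitions qualifies as a fault-span. A natural candidate is the smallest one: the states reachable from the invariant using both kinds of transitions. The sketch below computes it over explicit sets; the function name `fault_span` and the example are hypothetical.

```python
def fault_span(S, P, F):
    """Smallest superset of S that is closed in P union F, computed as
    the set of states reachable from S via program and fault steps."""
    step = set(P) | set(F)
    T = set(S)
    frontier = list(S)
    while frontier:
        s = frontier.pop()
        for a, b in step:
            if a == s and b not in T:
                T.add(b)
                frontier.append(b)
    return T

# Hypothetical example: the fault (1, 2) perturbs the program outside S,
# and the program transition (2, 0) brings it back
S = {0, 1}
P = {(0, 1), (1, 0), (2, 0), (3, 3)}
F = {(1, 2)}
T = fault_span(S, P, F)
```

State 3 stays outside the span because neither the program nor the fault can reach it from the invariant.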

Levels of Fault-Tolerance [3]
In the presence of faults:
- A failsafe program satisfies the safety specification.
- A nonmasking program satisfies the liveness specification.
- A masking program satisfies both the safety and liveness specifications.
In the absence of faults, the program must continue to satisfy its entire specification.
Question: Are these levels able to capture the requirements for modeling fault-tolerant real-time programs?
[3] A. Arora, S. Kulkarni. Designing masking fault-tolerance via nonmasking fault-tolerance. TSE 1998.

Fault-Tolerant Real-Time Programs
Railroad Crossing
- Safety: If the train is passing the crossing, the gate must be closed.
- Bounded response: Once the gate is closed, it should reopen within 4 min.
- Fault-tolerance: The gate must be closed while the train is passing the intersection, even in the presence of faults.
Boiler Controller
- Bounded response: Once the pressure gauge reads 30 pounds per square inch, the controller must issue a command to open a valve within 20 s.
- Fault-tolerance: Faults should not affect the functionality of the controller.

New Levels of Fault-Tolerance for RT Programs
Soft Fault-Tolerance
A program is soft fault-tolerant if it is not required to satisfy its timing constraints in the presence of faults. However, it should continue to meet its timing constraints in the absence of faults.
Hard Fault-Tolerance
A program is hard fault-tolerant if it is required to satisfy its timing constraints even in the presence of faults.

Levels of Fault-Tolerance (cont'd)
[Table: the levels Soft-failsafe, Hard-failsafe, Nonmasking, Soft-masking, and Hard-masking compared against the requirements Safety, Liveness, Timing constraints, and Bounded-time recovery]

Problem Statement
[Figure: the Synthesis Algorithm takes a program P, a safety specification, a set of faults f, and the desired level of fault-tolerance (soft/hard-failsafe, nonmasking, or soft/hard-masking), and produces a fault-tolerant program P']

Problem Statement
Given a program $P$, an invariant $S$, a set of faults $F$, and a safety specification $SPEC$, identify a program $P'$ with invariant $S'$ such that
$S' \subseteq S$, $P' \subseteq P$, and $P'$ is $F$-tolerant to $SPEC$ from $S'$.
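
For the failsafe level in an untimed setting, the problem statement can be illustrated with a simplified sketch. It is not the algorithm from the talk and ignores timing constraints entirely. The idea it assumes: find the states from which faults alone can force a bad transition, forbid program transitions that are bad or that enter those states, then shrink the invariant until it is closed and free of deadlocks. It also assumes every state of the original invariant has an outgoing program transition. The name `add_failsafe` and the example are hypothetical.

```python
def add_failsafe(P, S, F, bad):
    """Return (P2, S2) with P2 a subset of P and S2 a subset of S,
    or None if the invariant shrinks to the empty set."""
    P, S, F, bad = set(P), set(S), set(F), set(bad)

    # ms: states from which fault transitions alone can violate safety
    ms = set()
    changed = True
    while changed:
        changed = False
        for a, b in F:
            if a not in ms and ((a, b) in bad or b in ms):
                ms.add(a)
                changed = True

    # Drop program transitions that are bad or that enter ms
    P2 = {(a, b) for a, b in P if (a, b) not in bad and b not in ms}

    # Shrink the invariant until each state keeps a successor inside it
    S2 = S - ms
    while True:
        live = {a for a, b in P2 if a in S2 and b in S2}
        if live == S2:
            break
        S2 = live
    if not S2:
        return None

    # Make the new invariant closed: drop transitions that leave it
    P2 = {(a, b) for a, b in P2 if not (a in S2 and b not in S2)}
    return P2, S2

# Hypothetical example: from state 3 a fault can take the bad step (3, 4)
P = {(0, 1), (1, 0), (2, 3), (2, 0), (3, 0)}
F = {(1, 2), (3, 4)}
bad = {(3, 4)}
result = add_failsafe(P, {0, 1}, F, bad)
```

In the example the program transition `(2, 3)` is removed because it would move the program into a state where a single fault violates safety, while the invariant `{0, 1}` is preserved.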

Current Results
[Table: complexity of adding each level of fault-tolerance (Soft-failsafe, Hard-failsafe, Nonmasking, Soft-masking, Hard-masking) against the requirements Safety, Liveness, Timing constraints, and Bounded recovery. Entries: "Polynomial space sound and complete algorithm" and "Open problem".]
[2] S. Kulkarni, A. Arora. Automating the Addition of Fault-Tolerance. FTRTFT 2000.

Example: Altitude Switch
[Figure: timed automaton of the altitude switch with modes Init, Standby, and Await; clocks x, y, z, and t; events LowAlt, Reset, SensorFail, and Actuator Power on; guards over Power and CorruptSensor; and the fault-span marked]
The Altitude Switch is required to:
1) HFS: Tolerate the delay fault while in Standby mode:
$((Standby \wedge CorruptSensor) \Rightarrow \Diamond_{\le 2}\,(Init))$
2) HFS: Not power on the actuators while it fails to read the altitude sensors
3) SMK: Recover within 1 s after the occurrence of faults

Ongoing Research
- Implementation of the algorithms
- Coping with the state explosion problem
  - Synthesis using zone automata
  - Parallelizing state space
- Coping with NP-hardness results
  - Heuristics
  - Decision procedures (Yices)
- Fault-tolerance in hybrid systems
[8] http://www.cse.msu.edu/~ebnenasi/research/tools/ftsyn.htm

Conclusion
We study the problem of automated revision of programs inside their state space:
- Adding properties to programs
- Adding fault-tolerance

Questions?
Automated Revision of Existing Programs 3
Motivation (contrsquod)
Requirements
Question Is it possible to add a newly discovered property to an existing program automatically
SpecificationSpecification
Designer Program
Incomplete Incomplete SpecificationSpecification
New Property
Automated Revision of Existing Programs 4
Outline
What is program revision Adding properties to existing programs
Results Example
Adding fault-tolerance to existing real-time programs Results Example
Ongoing research Open problems and Future work
Automated Revision of Existing Programs 5
Program Revision
Revision by synthesis From specification Comprehensive revision
Highly expensive No reusability
From the existing program + new property Local revision Provides reusability In some cases offers lower classes of time and space complexity Does not need to have the entire specification of the existing program
RevisionAlgorithm
Program P
Property P
Automated Revision of Existing Programs 6
Our Goal
We identify classes of interesting properties typically used in specifying reactive systems
Designing synthesis methods where revising existing programs is feasible time-wise and space-wise
QuestionQuestion Why comprehensive revision is highly Why comprehensive revision is highly complexcomplexAnswerAnswer Expressiveness of specifications Expressiveness of specifications
Automated Revision of Existing Programs 7
Part I
Adding PropertiesProperties to Existing Programs
Automated Revision of Existing Programs 8
Preliminary Concepts
A program p is a triple p = Sp Ip p ie finite state space set of initial states and program transitions
A state predicate is any subset of Sp
A computation is a state sequence s0 s1 hellip iff s0 Ip
i gt 0 (si-1 si) p
If terminates in state sf then there does not exist s such that (sf s) p
Automated Revision of Existing Programs 9
Preliminary Concepts (cont)
Sample Properties Safety
P unless Q
stable(P)
invariant(P)
Liveness P leads-to Q P Q
P PP P P P
PP P
P P P Q
Automated Revision of Existing Programs 10
Preliminary Concepts (cont)
A specification is a conjunction of a set of properties
spec = L1 L2 hellip Ln
A computation satisfies spec iff (i | 0 i n satisfies Li)
A program p satisfies spec iff all computations of p1 are infinite
2 satisfy spec
Automated Revision of Existing Programs 11
Problem Statement
Formulation of the problem p satisfies existing specification spece
Sp = Sp
Ip = Ip
p p
All computations of p are infinite satisfy specn
SynthesisAlgorithm
Program p = Sp Ip p
Program p = Sp Ip pA Specification specn
Automated Revision of Existing Programs 12
Adding a Single Leads-to Property (R T )
Sp
Ip R T
Case 1 Deadlock statesRemove transitions (s1 s2) if s2 R and T is not reachable from S2
s0
s1
s2
Automated Revision of Existing Programs 13
Adding a Single Leads-to Property (R T )
Sp Ip R T
Break cycles reachable from R without reaching Q
s1
s4
s2
s3
Case 2 Cycles
s0
Automated Revision of Existing Programs 14
Soundness and Completeness
Theorem (1) The algorithm for adding multiple safety properties along with a leads-to property is sound and complete Fixability
Theorem (2) The complexity of the algorithm for adding multiple safety properties along with a leads-to property is polynomial-time
Automated Revision of Existing Programs 15
Adding Two Leads-to Properties Adding two leads-to properties one after another
Sp
Ip R T
s5
s6
s3s4
s7
P Qs9
s0
s6
s1
s2
s8
Automated Revision of Existing Programs 16
Other Results
The problems of simultaneous addition of two leads-to properties is NP-complete
The problem of adding one leads-to property while maintaining maximum non-determinism is NP-complete
Automated Revision of Existing Programs 17
Another Problem
Adding two eventually properties
1 true leads-to Q
2 true leads-to T
This problem is also NP-complete
Q
T
Automated Revision of Existing Programs 18
Example Real-Time Resource Allocation
Two processes j 1 2 Each has two tasks to complete (each takes 1 time unit)
Submitting a request Performing IO operation
RQj reqj (x = 1) ioj reqj = true false
IOj ioj (x = 1) reqj ioj = true false WT 0 x 1 1048576 wait
Bounded response
L (io1 2 req1)
x
x
Automated Revision of Existing Programs 19
req10 lt x t lt 1
req2
req1x t = 0req2
req1x t = 1req2
io1x t = 0 req2
req1x t = 0 1
io2
req10 lt x lt 11 lt t lt 2
io2
req1x t = 1 2
io2
io10 lt x t lt 1
req2
io1x t = 0
io2
io10 lt x t lt 1
io2
io1x = 0 t gt 2
req2
io1x = 1 t gt 2
req2
io1x = 0 t gt 2
io2
req10 lt x lt 1
t gt 2req2
req1x = 0 t gt 2
req2
req1x = 1 t gt 2
req2
io10 lt x lt 1
t gt 2io2
req1x = 0 t gt 2
io2
req10 lt x lt 1
t gt 2 io2
req1x = 1 t gt 2
io2
io1x t = 0 2
io2
req1x t = 0 2
req2
req1x t = 0 1
io2
req10 lt x lt 11 lt t lt 2
io2
req1x t = 1 2
io2
req1x t = 0 2
io2
req1x t = 0 1
req2
req10 lt x lt 11 lt t lt 2
req2
req1x t = 1 2
req2
Regions whereio1 becomes true
Edges removed during addition
Regions made unreachable
Legend
Initial region
Edges participatingin a shortest path
io1x t = 0 1
req2
io1x t = 1
io2
io10 lt x lt 11 lt t lt 2
req2
io1x t = 1 2
req2
Edges in additionalshortest paths
io1x t = 1req2
io1x t = 01
io2
io10 lt x lt 11 lt t lt 2
io2
io1x t = 1 2
io2
io1x t = 0 2
req2
io1x = 1 t gt 2
io2
io10 lt x lt 1
t gt 2req2
Automated Revision of Existing Programs 20
Example (contrsquod)
RQ1 req1 (x = 1) io1 req1 = true false
IO1 io1 (x = 1) req1 io1 = true false
RQ2 req2 (x = 1) (io1 t 1) io2 req2 = true false
IO1 io2 (x = 1) (io1 t 1) req2 io2 = true false
WT 0 x 1 1048576 wait
x t
x
x
x
Automated Revision of Existing Programs 21
Part II
Adding Fault-ToleranceFault-Tolerance to Existing Programs
Automated Revision of Existing Programs 22
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
Decision dg 0 1
(dj = ) ( fj = false) dj = dg
(dj ) ( fj = false) fj = true
dj
dk 0 1
dl
Decision
fj
fk false true
fl
Final
GENERAL
NON-GENERALS
Program
Automated Revision of Existing Programs 23
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
Byzantine
bg false true
bj
bk false true
bl
Byzantine
(bj bk bl bg = false) bj = true
(bj = true) dj = 0|1Faults
The Byzantine Agreement Problem
Safety specification:
  Agreement: The final decisions of any two non-Byzantine processes cannot be different.
  Validity: If the general is non-Byzantine, then the final decision of a non-Byzantine process must be the same as that of the general.
Fact: The program does not meet the safety specification in the presence of faults.
[Figure: counterexample with labels $d_g = 0$; $b_g = true$; $d_j = 0$, $f_j = false$; $f_j = true$; $d_k = 1$, $f_k = true$.]
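
To make the failure concrete, here is a minimal sketch that replays one faulty run of the agreement program and checks the agreement condition. It is an illustration, not the author's model: the dictionary layout, the function names, and the step in which a Byzantine general changes its own decision (by analogy with the fault on non-generals) are assumptions.

```python
BOT = None  # stands for the undecided value

def copy_decision(s, p):
    """(d_p undecided) and (f_p = false)  ->  d_p := d_g"""
    if s["d"][p] is BOT and not s["f"][p]:
        s["d"][p] = s["d"]["g"]

def finalize(s, p):
    """(d_p decided) and (f_p = false)  ->  f_p := true"""
    if s["d"][p] is not BOT and not s["f"][p]:
        s["f"][p] = True

def agreement(s):
    """All finalized, non-Byzantine non-generals hold the same decision."""
    finals = {s["d"][p] for p in "jkl" if s["f"][p] and not s["b"][p]}
    return len(finals) <= 1

def run_counterexample():
    s = {"d": {"g": 0, "j": BOT, "k": BOT, "l": BOT},
         "f": {"j": False, "k": False, "l": False},
         "b": {"g": False, "j": False, "k": False, "l": False}}
    copy_decision(s, "j")   # j copies d_g = 0
    s["b"]["g"] = True      # fault: the general becomes Byzantine
    s["d"]["g"] = 1         # fault (assumed): a Byzantine general changes its decision
    copy_decision(s, "k")   # k copies d_g = 1
    finalize(s, "j")
    finalize(s, "k")
    return s
```

After `run_counterexample()`, the non-Byzantine processes j and k have finalized 0 and 1 respectively, so `agreement` returns `False`.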
The Byzantine Agreement Problem
Question: Is it possible to revise a distributed program in order to add fault-tolerance to the original program with respect to a safety specification and a class of faults?
Answer: Yes, but the problem is NP-complete, which is itself a problem.
Solution: A set of polynomial-time heuristics has been developed to add fault-tolerance to a large class of distributed programs.
Some Terminology
Boolean variables: $V = \{v_1, \ldots, v_n\}$
State: $s = \bigwedge_{0 \le j \le n} l(v_j)$, where $l(v_j)$ is a literal
State predicate: a finite set of states, $S = \bigvee_{0 \le i \le m} s_i$
Transition: let $V' = \{v' \mid v \in V\}$; a transition is $(s, s') = s \wedge s'$
Transition predicate: a set of transitions, $P = \bigvee_{0 \le i \le m} (s_i \wedge s'_i)$
Program: a program is defined by a transition predicate $P$
Closure: a state predicate $S$ is closed in program $P$ iff $((s \models S) \wedge ((s, s') \models P)) \Rightarrow (s' \models S)$
Computation of $P$: a sequence of states $s_0, s_1, \ldots$ where $(s_j, s_{j+1}) \models P$
Safety specification: a set of bad transitions that should not occur in any program computation, defined by transition predicate $SPEC$
Satisfaction: a program $P$ satisfies $SPEC$ from $S$ iff
  1. $S$ is closed in $P$
  2. For all computations $s_0, s_1, \ldots$ where $s_0 \models S$: $(s_j, s_{j+1}) \not\models SPEC$
Invariant: if $P$ satisfies $SPEC$ from $S$ and $S$ is nonempty, then $S$ is an invariant of $P$
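
The closure and satisfaction definitions can be sketched with explicit sets instead of Boolean formulas. This is an assumption made for readability: states are plain Python values, predicates are sets of states, and programs and specifications are sets of (source, target) pairs. The function names are illustrative.

```python
def is_closed(S, P):
    """S is closed in P: every transition of P that starts in S also ends in S."""
    return all(b in S for (a, b) in P if a in S)

def satisfies(P, SPEC, S):
    """P satisfies SPEC from S.

    Because S is closed, computations that start in S never leave it, so it is
    enough to check that no transition of P leaving a state of S is a bad one.
    """
    no_bad_step = not any(a in S and (a, b) in SPEC for (a, b) in P)
    return is_closed(S, P) and no_bad_step

def is_invariant(P, SPEC, S):
    """S is an invariant of P if P satisfies SPEC from S and S is nonempty."""
    return bool(S) and satisfies(P, SPEC, S)
```

For example, with `P = {(0, 1), (1, 0), (2, 3)}` and `SPEC = {(2, 3)}`, the set `{0, 1}` is an invariant, while `{0, 1, 2}` is not, since it is neither closed nor free of bad transitions.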
Preliminaries (cont'd)
Faults: a set $F$ of transitions
Fault-span: a state predicate $T$ such that
  1. $S \Rightarrow T$
  2. $T$ is closed in $P \vee F$
[Figure: finite state space containing the fault-span, which in turn contains the invariant; program transitions are labeled p and fault transitions are labeled f.]
Fault-tolerance: a program $P$ is $F$-tolerant to $SPEC$ from $S$ if
  1. $P$ satisfies $SPEC$ from $S$
  2. There exists $T$ such that
     $T$ is an $F$-span of $P$
     $P \vee F$ satisfies $SPEC$ from $T$ (safety)
     after faults stop occurring, every computation of $P$ reaches $S$ (liveness)
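
A small explicit-state sketch may help to read the fault-span and fault-tolerance definitions. It assumes sets of states and sets of (source, target) pairs rather than Boolean formulas, takes the smallest fault-span (the states reachable from the invariant under program and fault transitions), and uses illustrative function names that do not come from the slides.

```python
def fault_span(S, P, F):
    """Smallest T that contains S and is closed in P union F."""
    steps = P | F
    T, frontier = set(S), list(S)
    while frontier:
        s = frontier.pop()
        for (a, b) in steps:
            if a == s and b not in T:
                T.add(b)
                frontier.append(b)
    return T

def safe_from(T, P, F, SPEC):
    """No bad transition of P union F starts in T (T is assumed closed)."""
    return not any(a in T and (a, b) in SPEC for (a, b) in P | F)

def recovers(T, S, P):
    """Every computation of P alone that starts in T eventually reaches S."""
    R = set(S)
    changed = True
    while changed:
        changed = False
        for s in T - R:
            succ = {b for (a, b) in P if a == s}
            # s surely reaches S if it can move and all of its moves surely reach S
            if succ and succ <= R:
                R.add(s)
                changed = True
    return T <= R
```

With `S = {0, 1}`, `P = {(0, 1), (1, 0), (2, 0), (2, 3)}`, `F = {(1, 2)}` and `SPEC = {(2, 3)}`, the fault-span is `{0, 1, 2, 3}` and the safety check fails. Dropping the transition `(2, 3)` from `P` shrinks the span to `{0, 1, 2}` and both checks pass.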
Levels of Fault-Tolerance [3]
In the presence of faults:
  A failsafe program satisfies the safety specification.
  A nonmasking program satisfies the liveness specification.
  A masking program satisfies both safety and liveness specifications.
In the absence of faults, the program must continue to satisfy its entire specification.
Question: Are these levels able to capture the requirements for modeling fault-tolerant real-time programs?
[3] A. Arora, S. Kulkarni. Designing masking fault-tolerance via nonmasking fault-tolerance. TSE 1998.
Fault-Tolerant Real-Time Programs
Railroad Crossing
  Safety: If the train is passing the crossing, the gate must be closed.
  Bounded response: Once the gate is closed, it should reopen within 4 min.
  Fault-tolerance: The gate must be closed while the train is passing the intersection, even in the presence of faults.
Boiler Controller
  Bounded response: Once the pressure gauge reads 30 pounds per square inch, the controller must issue a command to open a valve within 20 s.
  Fault-tolerance: Faults should not affect the functionality of the controller.
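
The bounded-response requirements above can be read as checks over a timestamped event trace. The sketch below is one simple way to express that reading. The trace format, the event names, and the choice to treat a trigger that is still unanswered at the end of the trace as a violation are all assumptions made for illustration.

```python
def bounded_response(trace, trigger, response, bound):
    """Check that every `trigger` is followed by `response` within `bound` time units.

    trace: list of (time, event) pairs sorted by time.
    """
    pending = None  # time of the earliest trigger that has not been answered yet
    for time, event in trace:
        if pending is not None and time - pending > bound:
            return False              # the deadline passed before any response
        if event == response:
            pending = None
        elif event == trigger and pending is None:
            pending = time
    return pending is None            # an unanswered trigger counts as a violation
```

For instance, `bounded_response([(0, "gate_closed"), (3, "gate_open")], "gate_closed", "gate_open", 4)` holds, while the same trace with the gate reopening at time 5 does not.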
New Levels of Fault-Tolerance for RT Programs
Soft Fault-Tolerance
A program is soft fault-tolerant if it is NOT required to satisfy its timing constraints in the presence of faults. However, it should continue to meet its timing constraints in the absence of faults.
Hard Fault-Tolerance
A program is hard fault-tolerant if it is required to satisfy its timing constraints even in the presence of faults.
Levels of Fault-Tolerance (cont'd)
[Table: rows Soft-failsafe, Hard-failsafe, Nonmasking, Soft-masking, Hard-masking; columns Safety, Liveness, Timing constraints, Bounded-time recovery. The cell entries did not survive extraction.]
Problem Statement
[Figure: the synthesis algorithm takes a program $P$, a safety specification, a set of faults $f$, and the desired level of fault-tolerance (soft/hard-failsafe, nonmasking, soft/hard-masking), and produces a fault-tolerant program $P'$.]
Problem Statement
Given a program $P$, invariant $S$, a set of faults $F$, and safety specification $SPEC$, identify a program $P'$ with invariant $S'$ such that
  $S' \Rightarrow S$, $P' \Rightarrow P$, and $P'$ is $F$-tolerant to $SPEC$ from $S'$
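
For the failsafe level only, one well-known way to attack this problem on an explicit state space is to discard the states from which faults alone can violate safety, and then discard the program transitions that are bad or that enter such states. The sketch below follows that idea in the spirit of the untimed work cited as [2]. It is not presented as the algorithm behind these slides, it ignores deadlock removal and timing, and the set-based encoding and all names are assumptions.

```python
def add_failsafe(S, P, F, SPEC):
    """Return (S2, P2) with S2 a subset of S and P2 a subset of P, or None.

    S: set of invariant states; P, F, SPEC: sets of (source, target) pairs.
    """
    # States from which fault transitions alone can execute a bad transition.
    unsafe = {a for (a, b) in F if (a, b) in SPEC}
    changed = True
    while changed:
        changed = False
        for (a, b) in F:
            if b in unsafe and a not in unsafe:
                unsafe.add(a)
                changed = True
    S2 = set(S) - unsafe
    # Keep only program transitions that are not bad, do not enter an unsafe
    # state, and do not leave the shrunken invariant.
    P2 = {(a, b) for (a, b) in P
          if (a, b) not in SPEC
          and b not in unsafe
          and not (a in S2 and b not in S2)}
    return (S2, P2) if S2 else None
```

By construction the result only removes states and transitions, which matches the two containment conditions in the problem statement. Whether the remaining program still makes progress has to be checked separately.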
Current Results
[Table: rows Soft-failsafe, Hard-failsafe, Nonmasking, Soft-masking, Hard-masking; columns Safety, Liveness, Timing constraints, Bounded-recovery, Complexity. Surviving Complexity entries: "Polynomial space sound and complete algorithm" and "Open problem"; their row assignment did not survive extraction.]
[2] S. Kulkarni, A. Arora. Automating the Addition of Fault-Tolerance. FTRTFT 2000.
Example: Altitude Switch
[Figure: timed automaton of the altitude switch with locations Init, Standby, and Await, the output "Actuator Power on", the events LowAlt, Reset, and SensorFail, the clocks x, y, z, and t, and the fault-span marked.]
The Altitude Switch is required to:
  1) HFS: Tolerate the delay fault while in Standby mode: $(Standby \wedge CorruptSensor) \Rightarrow \Diamond_{\le 2}\,(Init)$
  2) HFS: Not power on the actuators while it fails to read the altitude sensors
  3) SMK: Recover within 1 s after occurrence of faults
Ongoing Research
Implementation of the algorithms
Coping with the state explosion problem
  Synthesis using zone automata
  Parallelizing state space
Coping with NP-hardness results
  Heuristics
  Decision procedures (Yices)
Fault-tolerance in hybrid systems
[8] http://www.cse.msu.edu/~ebnenasi/research/tools/ftsyn.htm
Conclusion
We study the problem of automated revision of programs inside their state space:
  Adding properties to programs
  Adding fault-tolerance
Questions
Automated Revision of Existing Programs 6
Our Goal
We identify classes of interesting properties typically used in specifying reactive systems
Designing synthesis methods where revising existing programs is feasible time-wise and space-wise
QuestionQuestion Why comprehensive revision is highly Why comprehensive revision is highly complexcomplexAnswerAnswer Expressiveness of specifications Expressiveness of specifications
Automated Revision of Existing Programs 7
Part I
Adding PropertiesProperties to Existing Programs
Automated Revision of Existing Programs 8
Preliminary Concepts
A program p is a triple p = Sp Ip p ie finite state space set of initial states and program transitions
A state predicate is any subset of Sp
A computation is a state sequence s0 s1 hellip iff s0 Ip
i gt 0 (si-1 si) p
If terminates in state sf then there does not exist s such that (sf s) p
Automated Revision of Existing Programs 9
Preliminary Concepts (cont)
Sample Properties Safety
P unless Q
stable(P)
invariant(P)
Liveness P leads-to Q P Q
P PP P P P
PP P
P P P Q
Automated Revision of Existing Programs 10
Preliminary Concepts (cont)
A specification is a conjunction of a set of properties
spec = L1 L2 hellip Ln
A computation satisfies spec iff (i | 0 i n satisfies Li)
A program p satisfies spec iff all computations of p1 are infinite
2 satisfy spec

Problem Statement
Formulation of the problem: given a program $p = \langle S_p, I_p, \delta_p \rangle$ that satisfies its existing specification $spec_e$, and a new specification $spec_n$, the synthesis algorithm produces a program $p' = \langle S_{p'}, I_{p'}, \delta_{p'} \rangle$ such that:
$S_{p'} = S_p$
$I_{p'} = I_p$
$\delta_{p'} \subseteq \delta_p$
All computations of p' are infinite and satisfy $spec_n$
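The structural part of this formulation can be checked mechanically. The following is a minimal sketch under stated assumptions: programs are modeled as `(states, init, delta)` triples of Python sets, the names `reachable` and `meets_structural_constraints` are hypothetical, and the check deliberately leaves out whether the new specification itself holds.

```python
def reachable(init, delta):
    """States reachable from the initial states via delta."""
    succ = {}
    for s, t in delta:
        succ.setdefault(s, set()).add(t)
    seen, stack = set(init), list(init)
    while stack:
        s = stack.pop()
        for t in succ.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def meets_structural_constraints(p, p_rev):
    """Same states, same initial states, fewer transitions, no finite computation."""
    S, I, D = p
    S2, I2, D2 = p_rev
    if S2 != S or I2 != I or not D2 <= D:
        return False
    sources = {s for s, _ in D2}
    # Every reachable state needs an outgoing transition, otherwise some
    # computation of the revised program would be finite.
    return reachable(I2, D2) <= sources


original = ({0, 1, 2}, {0}, {(0, 1), (1, 0), (1, 2), (2, 2)})
```

Removing the transition into state 2 keeps all computations infinite, while removing the transition from 1 back to 0 as well would leave state 1 deadlocked.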

Adding a Single Leads-to Property ($R \mapsto T$)
Case 1: Deadlock states
Remove transitions $(s_1, s_2)$ if $s_2 \in R$ and T is not reachable from $s_2$.
[Figure: state space $S_p$ with initial states $I_p$, predicates R and T, and states $s_0$, $s_1$, $s_2$]

Adding a Single Leads-to Property ($R \mapsto T$)
Case 2: Cycles
Break cycles reachable from R without reaching T.
[Figure: state space $S_p$ with initial states $I_p$, predicates R and T, and states $s_0$ to $s_4$]
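The two cases can be combined into a small rank-based procedure. The sketch below is a simplified illustration, not the algorithm from the talk: the function name `add_leads_to` is hypothetical, states in R are treated conservatively as if they were all reachable, and no completeness claim is made. It only removes transitions, so the result is always a subset of the original transition set.

```python
from collections import deque

INF = float("inf")


def add_leads_to(init, delta, R, T):
    """Return a subset of delta under which R leads to T, or None on failure."""
    delta, R, T = set(delta), set(R), set(T)
    while True:
        size_before = len(delta)
        # Case 1: repeatedly drop transitions into states without successors.
        while True:
            sources = {s for s, _ in delta}
            dead = {t for _, t in delta if t not in sources}
            if not dead:
                break
            delta = {(s, t) for s, t in delta if t not in dead}
        succ, pred = {}, {}
        for s, t in delta:
            succ.setdefault(s, set()).add(t)
            pred.setdefault(t, set()).add(s)
        # rank[s] = length of a shortest path from s to T (backward BFS).
        rank = {t: 0 for t in T}
        queue = deque(T)
        while queue:
            t = queue.popleft()
            for s in pred.get(t, ()):
                if s not in rank:
                    rank[s] = rank[t] + 1
                    queue.append(s)
        # pending = states outside T reached from R without passing through T.
        pending = R - T
        stack = list(pending)
        while stack:
            s = stack.pop()
            for t in succ.get(s, ()):
                if t not in T and t not in pending:
                    pending.add(t)
                    stack.append(t)
        # Case 2: in pending states keep only rank-decreasing transitions,
        # which breaks every cycle that avoids T.
        delta = {(s, t) for s, t in delta
                 if s not in pending or rank.get(t, INF) < rank.get(s, INF)}
        if len(delta) == size_before:
            break
    sources = {s for s, _ in delta}
    return delta if all(s in sources for s in init) else None


example = {(0, 1), (1, 2), (2, 1), (2, 3), (3, 3), (1, 4), (4, 4)}
revised = add_leads_to({0}, example, R={1}, T={3})
```

In the example, state 4 cannot reach T, so the transition into it is removed, and the cycle between states 1 and 2 is broken by dropping the transition from 2 back to 1.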

Soundness and Completeness
Theorem (1): The algorithm for adding multiple safety properties along with a leads-to property is sound and complete (Fixability).
Theorem (2): The algorithm for adding multiple safety properties along with a leads-to property runs in polynomial time.

Adding Two Leads-to Properties
Adding two leads-to properties one after another
[Figure: state space $S_p$ with initial states $I_p$, states $s_0$ to $s_9$, and the two leads-to properties $R \mapsto T$ and $P \mapsto Q$]

Other Results
The problem of simultaneously adding two leads-to properties is NP-complete.
The problem of adding one leads-to property while maintaining maximum non-determinism is NP-complete.

Another Problem
Adding two eventually properties:
1. true leads-to Q
2. true leads-to T
This problem is also NP-complete.

Example: Real-Time Resource Allocation
Two processes $j \in \{1, 2\}$. Each has two tasks to complete (each takes 1 time unit):
Submitting a request
Performing an IO operation
$RQ_j:\ req_j \wedge (x = 1) \xrightarrow{\{x\}} io_j, req_j := true, false$
$IO_j:\ io_j \wedge (x = 1) \xrightarrow{\{x\}} req_j, io_j := true, false$
$WT:\ 0 \leq x \leq 1 \rightarrow wait$
Bounded response
$L \equiv (io_1 \mapsto_{\leq 2} req_1)$

[Figure: region graph of the resource allocation example over clocks x and t. Legend: initial region; regions where $io_1$ becomes true; edges participating in a shortest path; edges in additional shortest paths; edges removed during addition; regions made unreachable.]
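
Example (cont'd)
$RQ_1:\ req_1 \wedge (x = 1) \xrightarrow{\{x, t\}} io_1, req_1 := true, false$
$IO_1:\ io_1 \wedge (x = 1) \xrightarrow{\{x\}} req_1, io_1 := true, false$
$RQ_2:\ req_2 \wedge (x = 1) \wedge (io_1 \Rightarrow t \leq 1) \xrightarrow{\{x\}} io_2, req_2 := true, false$
$IO_2:\ io_2 \wedge (x = 1) \wedge (io_1 \Rightarrow t \leq 1) \xrightarrow{\{x\}} req_2, io_2 := true, false$
$WT:\ 0 \leq x \leq 1 \rightarrow wait$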

Part II
Adding Fault-Tolerance to Existing Programs
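
The Byzantine Agreement Problem
Processes: a general g and three non-generals j, k, l
Decision: $d_g \in \{0, 1\}$ (general); $d_j, d_k, d_l \in \{0, 1, \bot\}$ (non-generals)
Final: $f_j, f_k, f_l \in \{false, true\}$ (non-generals)
Program (for each non-general j):
$(d_j = \bot) \wedge (f_j = false) \rightarrow d_j := d_g$
$(d_j \neq \bot) \wedge (f_j = false) \rightarrow f_j := true$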

The Byzantine Agreement Problem
Byzantine: $b_g \in \{false, true\}$ (general); $b_j, b_k, b_l \in \{false, true\}$ (non-generals)
Faults:
$(b_j = b_k = b_l = b_g = false) \rightarrow b_j := true$
$(b_j = true) \rightarrow d_j := 0 \mid 1$

The Byzantine Agreement Problem
Safety specification:
Agreement: The final decisions of any two non-Byzantine processes cannot be different.
Validity: If the general is non-Byzantine, then the final decision of a non-Byzantine process must be the same as that of the general.
Fact: The program does not meet the safety specification in the presence of faults.
Counterexample: $d_g = 0$, $b_g = true$; $d_j = 0$, $f_j = false$; $f_j = true$; $d_k = 1$, $f_k = true$ (non-generals j and k finalize different decisions).
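The counterexample can be replayed step by step. The sketch below is illustrative only: the dictionary layout and all function names are hypothetical, `None` stands in for the undecided value, and the fault action that lets a Byzantine process rewrite its decision is assumed to apply to the general in the same way as to a non-general.

```python
BOT = None  # stands for the undecided value


def new_state():
    return {
        "d": {"g": 0, "j": BOT, "k": BOT, "l": BOT},                 # decisions
        "f": {"j": False, "k": False, "l": False},                   # finalized?
        "b": {"g": False, "j": False, "k": False, "l": False},       # Byzantine?
    }


def copy_decision(st, j):
    """Program action: an undecided, unfinalized non-general copies d_g."""
    if st["d"][j] is BOT and not st["f"][j]:
        st["d"][j] = st["d"]["g"]


def finalize(st, j):
    """Program action: a decided non-general finalizes its decision."""
    if st["d"][j] is not BOT and not st["f"][j]:
        st["f"][j] = True


def byzantine_write(st, p, value):
    """Fault action: a Byzantine process sets its decision arbitrarily."""
    if st["b"][p]:
        st["d"][p] = value


def agreement(st):
    """Finalized non-Byzantine non-generals must hold the same decision."""
    finals = {st["d"][p] for p in ("j", "k", "l")
              if st["f"][p] and not st["b"][p]}
    return len(finals) <= 1


def counterexample():
    st = new_state()
    st["b"]["g"] = True            # fault: the general becomes Byzantine
    copy_decision(st, "j")         # j reads d_g = 0
    finalize(st, "j")
    byzantine_write(st, "g", 1)    # the general changes its decision
    copy_decision(st, "k")         # k reads d_g = 1
    finalize(st, "k")
    return st
```

After this run, j has finalized 0 and k has finalized 1 although neither is Byzantine, so Agreement is violated.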

The Byzantine Agreement Problem
Question: Is it possible to revise a distributed program in order to add fault-tolerance with respect to a safety specification and a class of faults?
Answer: Yes, but the problem is NP-complete, which itself is a problem.
Solution: A set of polynomial-time heuristics has been developed to add fault-tolerance to a large class of distributed programs.

Some Terminology
Boolean variables: $V = \{v_1, \ldots, v_n\}$
State: $s = \bigwedge_{1 \leq j \leq n} l(v_j)$, where $l(v_j)$ is a literal
State predicate: A finite set of states, $S = \bigvee_{0 \leq i \leq m} s_i$
Transition: Let $V' = \{v' \mid v \in V\}$. A transition is $(s, s') = s \wedge s'$
Transition predicate: A set of transitions, $P = \bigvee_{0 \leq i \leq m} (s_i \wedge s'_i)$
Program: A program is defined by a transition predicate P
Closure: A state predicate S is closed in program P iff $((s \models S) \wedge ((s, s') \models P)) \Rightarrow (s' \models S)$
Computation of P: A sequence of states $\langle s_0, s_1, \ldots \rangle$ where $(s_j, s_{j+1}) \models P$
Safety specification: A set of bad transitions that should not occur in any program computation, defined by transition predicate SPEC
Satisfaction: A program P satisfies SPEC from S iff
1. S is closed in P
2. For all computations $\langle s_0, s_1, \ldots \rangle$ where $s_0 \models S$: $(s_j, s_{j+1}) \not\models SPEC$
Invariant: If P satisfies SPEC from S and S is nonempty, then S is an invariant of P
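Closure and satisfaction can be sketched with explicit sets instead of Boolean formulas. This is an assumption made for readability: states are plain values, predicates are Python sets, and the names `is_closed` and `satisfies_from` are hypothetical.

```python
def is_closed(S, P):
    """S is closed in P: no transition of P leaves S."""
    return all(t in S for s, t in P if s in S)


def satisfies_from(P, S, SPEC):
    """P satisfies SPEC from S: S is closed and no step from S is a bad transition."""
    if not is_closed(S, P):
        return False
    # Every state of S may start a computation, and closure keeps computations
    # inside S, so it suffices to inspect the transitions that start in S.
    return not any((s, t) in SPEC for s, t in P if s in S)


P = {(0, 1), (1, 0), (2, 3)}
S = {0, 1}
```

With this program, S is closed, and the bad transition from 2 to 3 lies outside S, so it does not affect satisfaction from S.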

Preliminaries (cont'd)
Faults: A set F of transitions
Fault-span: A state predicate T such that
1. $S \subseteq T$
2. T is closed in $P \cup F$
[Figure: the invariant nested inside the fault-span, inside the finite state space; program transitions are labeled p and fault transitions f]
Fault-tolerance: A program P is F-tolerant to SPEC from S if
1. P satisfies SPEC from S
2. There exists T such that
T is an F-span of P
$P \cup F$ satisfies SPEC from T (safety)
After faults stop occurring, every computation of P reaches S (liveness)
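A fault-span and the two tolerance conditions can be sketched over explicit state sets. The code below is an illustration under stated assumptions: it uses the set of states reachable from S under program and fault transitions as the fault-span (the smallest candidate), and the names `fault_span`, `safe_under_faults`, and `recovers` are hypothetical.

```python
def fault_span(S, P, F):
    """States reachable from S via program or fault transitions."""
    succ = {}
    for s, t in set(P) | set(F):
        succ.setdefault(s, set()).add(t)
    T, stack = set(S), list(S)
    while stack:
        s = stack.pop()
        for t in succ.get(s, ()):
            if t not in T:
                T.add(t)
                stack.append(t)
    return T


def safe_under_faults(S, P, F, SPEC):
    """Safety: no program or fault step taken inside the fault-span is bad."""
    T = fault_span(S, P, F)
    return not any((s, t) in SPEC for s, t in set(P) | set(F) if s in T)


def recovers(S, T, P):
    """Liveness: without faults, every computation of P from T reaches S."""
    succ = {}
    for s, t in P:
        succ.setdefault(s, set()).add(t)
    good = set(S)  # states from which S is reached on every path
    changed = True
    while changed:
        changed = False
        for s in T - good:
            # s is good if it has a successor and all successors are good.
            if succ.get(s) and succ[s] <= good:
                good.add(s)
                changed = True
    return T <= good


S = {0}
P = {(0, 0), (1, 0), (2, 1)}
F = {(0, 2)}
T = fault_span(S, P, F)
```

In the example, the fault moves the program from state 0 to state 2, and the program transitions lead back through 1 to the invariant.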

Levels of Fault-Tolerance [3]
In the presence of faults:
A failsafe program satisfies the safety specification.
A nonmasking program satisfies the liveness specification.
A masking program satisfies both safety and liveness specifications.
In the absence of faults, the program must continue to satisfy its entire specification.
Question: Are these levels able to capture the requirements for modeling fault-tolerant real-time programs?
[3] A. Arora, S. Kulkarni. Designing masking fault-tolerance via nonmasking fault-tolerance. TSE 1998.

Fault-Tolerant Real-Time Programs
Railroad Crossing
Safety: If the train is passing the crossing, the gate must be closed.
Bounded response: Once the gate is closed, it should reopen within 4 min.
Fault-tolerance: The gate must be closed while the train is passing the intersection, even in the presence of faults.
Boiler Controller
Bounded response: Once the pressure gauge reads 30 pounds per square inch, the controller must issue a command to open a valve within 20 s.
Fault-tolerance: Faults should not affect the functionality of the controller.
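A bounded response requirement such as "reopen within 4 min" can be checked on a recorded run. The sketch below assumes a timed trace given as a list of `(time, propositions)` samples sorted by time; the function name `bounded_response` and the proposition names are hypothetical. A trigger whose deadline lies beyond the end of the trace is conservatively reported as a violation.

```python
def bounded_response(trace, trigger, response, bound):
    """Every sample with `trigger` is followed by `response` within `bound`."""
    for i, (t0, props) in enumerate(trace):
        if trigger in props:
            met = any(response in later and t - t0 <= bound
                      for t, later in trace[i:])
            if not met:
                return False
    return True


# Times in minutes for the railroad crossing example.
on_time = [(0, {"gate_closed"}), (3, {"gate_open"})]
too_late = [(0, {"gate_closed"}), (5, {"gate_open"})]
```

The first trace reopens the gate after 3 minutes and meets the 4 minute bound, while the second reopens after 5 minutes and violates it.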

New Levels of Fault-Tolerance for RT Programs
Soft Fault-Tolerance
A program is soft fault-tolerant if it is NOT required to satisfy its timing constraints in the presence of faults. However, it should continue to meet its timing constraints in the absence of faults.
Hard Fault-Tolerance
A program is hard fault-tolerant if it is required to satisfy its timing constraints even in the presence of faults.

Levels of Fault-Tolerance (cont'd)
[Table: the levels Soft-failsafe, Hard-failsafe, Nonmasking, Soft-masking, and Hard-masking (rows) compared by Safety, Liveness, Timing constraints, and Bounded-time recovery (columns)]

Problem Statement
Inputs to the synthesis algorithm:
Program P
Safety spec
Set of faults f
Desired level of fault-tolerance: Soft/Hard-Failsafe, Nonmasking, Soft/Hard-Masking
Output: Fault-tolerant program P'

Problem Statement
Given program P, invariant S, a set of faults F, and safety specification SPEC, identify a program P' with invariant S' such that:
$S' \subseteq S$, $P' \subseteq P$, and P' is F-tolerant to SPEC from S'

Current Results
[Table: the levels Soft-failsafe, Hard-failsafe, Nonmasking, Soft-masking, and Hard-masking (rows) by Safety, Liveness, Timing constraints, Bounded-recovery, and Complexity (columns); the Complexity column includes the entries "Polynomial space, sound and complete algorithm" and "Open problem"]
[2] S. Kulkarni, A. Arora. Automating the Addition of Fault-Tolerance. FTRTFT 2000.

Example: Altitude Switch
[Figure: timed automaton of the altitude switch with locations Init, Standby, and Await; labels Actuator Power on, LowAlt, Reset, and SensorFail; clocks x, y, z, and t; conditions on Power and CorruptSensor; and the fault-span regions]
The Altitude Switch is required to:
1) HFS: Tolerate the delay fault while in Standby mode: $\Box((Standby \wedge CorruptSensor) \Rightarrow \Diamond_{\leq 2}(Init))$
2) HFS: Not power on the actuators while it fails to read the altitude sensors
3) SMK: Recover within 1 s after occurrence of faults

Ongoing Research
Implementation of the algorithms
Coping with the state explosion problem: synthesis using zone automata, parallelizing state space
Coping with NP-hardness results: heuristics, decision procedures (Yices)
Fault-tolerance in hybrid systems
[8] http://www.cse.msu.edu/~ebnenasi/research/tools/ftsyn.htm

Conclusion
We study the problem of automated revision of programs inside their state space:
Adding properties to programs
Adding fault-tolerance

Questions
Preliminary Concepts (cont.)
Sample properties
Safety: P unless Q, stable(P), invariant(P)
Liveness: P leads-to Q, written $P \mapsto Q$
Preliminary Concepts (cont.)
A specification is a conjunction of a set of properties:
$spec = L_1 \land L_2 \land \dots \land L_n$
A computation satisfies spec iff it satisfies every property $L_i$.
A program p satisfies spec iff all computations of p
1. are infinite, and
2. satisfy spec.
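Problem Statement
Formulation of the problem:
Input: a program $p = \langle S_p, I_p, \delta_p \rangle$ that satisfies its existing specification $spec_e$, and a new specification $spec_n$.
Output of the synthesis algorithm: a program $p' = \langle S_{p'}, I_{p'}, \delta_{p'} \rangle$ such that
$S_{p'} = S_p$
$I_{p'} = I_p$
$\delta_{p'} \subseteq \delta_p$
All computations of $p'$ are infinite and satisfy $spec_n$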
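The structural constraints in this problem statement can be checked mechanically for an explicit-state program. The Python sketch below is only an illustration: the triple layout (states, initial states, transitions) and the name `is_valid_revision` are assumptions made here, and the check that computations satisfy the new specification is deliberately left out.

```python
def is_valid_revision(original, revised):
    """Check the structural constraints on a revised program.

    Each program is a hypothetical triple (states, initial, transitions);
    transitions is a set of (source, target) pairs.
    """
    S, I, delta = original
    S2, I2, delta2 = revised
    same_space = (S2 == S) and (I2 == I)
    only_removes = delta2 <= delta  # the revision may only remove transitions

    # Collect the states reachable from the initial states.
    reachable, stack = set(), list(I2)
    while stack:
        s = stack.pop()
        if s in reachable:
            continue
        reachable.add(s)
        stack.extend(b for (a, b) in delta2 if a == s)

    # Every reachable state needs a successor, so that all computations
    # of the revised program are infinite.
    no_deadlock = all(any(a == s for (a, _) in delta2) for s in reachable)
    return same_space and only_removes and no_deadlock
```

A revision that adds a transition, or that leaves a reachable state without a successor, is rejected by this check.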
Adding a Single Leads-to Property ($R \mapsto T$)
Case 1: Deadlock states
Remove a transition (s1, s2) if $s_2 \in R$ and T is not reachable from s2.
[Figure: state space $S_p$ with initial states $I_p$, state sets R and T, and states s0, s1, s2]
Case 2: Cycles
Break cycles that are reachable from R and do not reach T.
[Figure: state space $S_p$ with initial states $I_p$, state sets R and T, and states s0 to s4]
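The two cases can be illustrated on an explicit transition graph. The Python sketch below is a simplification under stated assumptions, not the algorithm from the talk: it ranks states by their shortest distance to T, keeps only rank-decreasing transitions for states that still owe a visit to T, and drops transitions into states that cannot reach T. It does not repair deadlocks that these removals may create elsewhere, and the name `add_leads_to` is hypothetical.

```python
from collections import deque

def add_leads_to(transitions, R, T):
    """Return a subset of `transitions` in which every R-state reaches T."""
    succ, pred = {}, {}
    for a, b in transitions:
        succ.setdefault(a, set()).add(b)
        pred.setdefault(b, set()).add(a)

    # rank[s] = length of a shortest path from s to T (backward BFS).
    rank = {t: 0 for t in T}
    queue = deque(T)
    while queue:
        s = queue.popleft()
        for p in pred.get(s, ()):
            if p not in rank:
                rank[p] = rank[s] + 1
                queue.append(p)

    # pending = states visited after R holds and before T is reached.
    pending, stack = set(), [s for s in R if s not in T]
    while stack:
        s = stack.pop()
        if s in pending:
            continue
        pending.add(s)
        stack.extend(n for n in succ.get(s, ()) if n not in T)

    revised = set()
    for a, b in transitions:
        if a in pending:
            # Case 2: keep only moves that strictly approach T, which
            # breaks every cycle among pending states.
            if a in rank and b in rank and rank[b] < rank[a]:
                revised.add((a, b))
        elif b in pending and b not in rank:
            # Case 1: never enter a pending state that cannot reach T.
            continue
        else:
            revised.add((a, b))
    return revised
```

For example, with transitions 1 to 2, 2 to 1, and 1 to 3, and with R = {1} and T = {3}, the sketch removes the transition from 1 to 2, so the cycle between 1 and 2 can no longer delay T.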
Soundness and Completeness
Theorem 1. The algorithm for adding multiple safety properties along with a leads-to property is sound and complete.
Fixability
Theorem 2. The algorithm for adding multiple safety properties along with a leads-to property runs in polynomial time.
Adding Two Leads-to Properties
Adding two leads-to properties one after another
[Figure: state space $S_p$ with initial states $I_p$, state sets R, T, P, and Q, and states s0 to s9]
Other Results
The problem of simultaneously adding two leads-to properties is NP-complete.
The problem of adding one leads-to property while maintaining maximum non-determinism is NP-complete.
Another Problem
Adding two eventually properties:
1. true leads-to Q
2. true leads-to T
This problem is also NP-complete.
[Figure: state sets Q and T]
Example: Real-Time Resource Allocation
Two processes $j \in \{1, 2\}$. Each has two tasks to complete (each takes 1 time unit): submitting a request and performing an I/O operation.
$RQ_j :: req_j \land (x = 1) \xrightarrow{\{x\}} io_j, req_j := true, false$
$IO_j :: io_j \land (x = 1) \xrightarrow{\{x\}} req_j, io_j := true, false$
$WT :: 0 \le x \le 1 \rightarrow wait$
Bounded response:
$L \equiv (io_1 \mapsto_{\le 2} req_1)$
[Figure: region graph of the resource allocation example. Each region records which of req1/io1 and req2/io2 hold together with the values of the clocks x and t. Legend: initial region; regions where io1 becomes true; edges removed during addition; regions made unreachable; edges participating in a shortest path; edges in additional shortest paths.]
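Example (cont'd)
$RQ_1 :: req_1 \land (x = 1) \xrightarrow{\{x, t\}} io_1, req_1 := true, false$
$IO_1 :: io_1 \land (x = 1) \xrightarrow{\{x\}} req_1, io_1 := true, false$
$RQ_2 :: req_2 \land (x = 1) \land (io_1 \Rightarrow t \le 1) \xrightarrow{\{x\}} io_2, req_2 := true, false$
$IO_2 :: io_2 \land (x = 1) \land (io_1 \Rightarrow t \le 1) \xrightarrow{\{x\}} req_2, io_2 := true, false$
$WT :: 0 \le x \le 1 \rightarrow wait$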
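The effect of the revision can be explored with a small state-space search. The Python sketch below rests on several assumptions that are not stated on the slides: time is abstracted to unit steps because every action fires when x = 1, both processes start in their request phase, the guard added to process 2 is read as "io1 implies t <= 1", and the clock t is capped because larger values behave alike. It checks whether a state is reachable in which io1 has been pending for 2 time units without req1 having been set.

```python
from collections import deque

T_CAP = 3  # values of t above 2 are all equivalent for this check

def successors(state, revised):
    req1, io1, req2, io2, t = state
    t = min(t + 1, T_CAP)    # x runs from 0 to 1 before any action fires
    # Guard added to process 2 by the revision (assumed reading).
    p2_enabled = (not revised) or (not io1) or t <= 1
    if req1:                 # RQ1 sets io1 and resets t
        yield (False, True, req2, io2, 0)
    if io1:                  # IO1 sets req1
        yield (True, False, req2, io2, t)
    if req2 and p2_enabled:  # RQ2
        yield (req1, io1, False, True, t)
    if io2 and p2_enabled:   # IO2
        yield (req1, io1, True, False, t)

def violates_bounded_response(revised):
    """True if io1 can stay pending until t = 2 without req1 being set."""
    init = (True, False, True, False, 0)
    seen, queue = {init}, deque([init])
    while queue:
        state = queue.popleft()
        if state[1] and state[4] >= 2:
            return True
        for nxt in successors(state, revised):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

Under these assumptions the search finds a violation for the original program (process 2 acts twice in a row while io1 is pending) and none for the revised one.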
Part II
Adding Fault-Tolerance to Existing Programs
The Byzantine Agreement Problem
Processes: one general g and the non-generals j, k, l
Decision: $d_g \in \{0, 1\}$; $d_j, d_k, d_l \in \{0, 1, \bot\}$
Final: $f_j, f_k, f_l \in \{false, true\}$
Byzantine: $b_g, b_j, b_k, b_l \in \{false, true\}$
Program (for a non-general j):
$(d_j = \bot) \land (f_j = false) \rightarrow d_j := d_g$
$(d_j \ne \bot) \land (f_j = false) \rightarrow f_j := true$
Faults:
$(b_j, b_k, b_l, b_g = false) \rightarrow b_j := true$
$(b_j = true) \rightarrow d_j := 0 \mid 1$
Safety specification:
Agreement: The final decisions of any two non-Byzantine processes cannot be different.
Validity: If the general is non-Byzantine, then the final decision of a non-Byzantine process must be the same as that of the general.
Fact: The program does not meet the safety specification in the presence of faults.
[Figure: a violating scenario with $d_g = 0$, $b_g = true$, $d_j = 0$, $f_j = false$, $f_j = true$, $d_k = 1$, $f_k = true$]
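The violation can be replayed step by step. The Python sketch below is an illustration only: the dictionary layout and function names are assumptions, and the fault that lets a Byzantine general change its decision is taken by analogy with the fault action shown for the non-generals.

```python
BOT = None  # the undecided value of a non-general

def initial_state():
    return {
        "d": {"g": 0, "j": BOT, "k": BOT, "l": BOT},
        "f": {"j": False, "k": False, "l": False},
        "b": {"g": False, "j": False, "k": False, "l": False},
    }

def copy_decision(state, j):
    """(d.j = BOT) and (f.j = false) -> d.j := d.g"""
    if state["d"][j] is BOT and not state["f"][j]:
        state["d"][j] = state["d"]["g"]

def finalize(state, j):
    """(d.j != BOT) and (f.j = false) -> f.j := true"""
    if state["d"][j] is not BOT and not state["f"][j]:
        state["f"][j] = True

def agreement(state):
    """Finalized non-Byzantine non-generals hold a single decision."""
    finals = {state["d"][j] for j in ("j", "k", "l")
              if state["f"][j] and not state["b"][j]}
    return len(finals) <= 1

def fault_free_run():
    s = initial_state()
    for j in ("j", "k", "l"):
        copy_decision(s, j)
        finalize(s, j)
    return s

def faulty_run():
    s = initial_state()
    s["b"]["g"] = True       # fault: the general becomes Byzantine
    copy_decision(s, "j")
    finalize(s, "j")         # j finalizes the decision 0
    s["d"]["g"] = 1          # fault: the Byzantine general changes its decision
    copy_decision(s, "k")
    finalize(s, "k")         # k finalizes the decision 1
    return s
```

In the faulty run, j and k are both non-Byzantine and both finalized, yet they hold different decisions, so Agreement is violated.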
Question: Is it possible to revise a distributed program in order to add fault-tolerance to the original program with respect to a safety specification and a class of faults?
Answer: Yes, but the problem is NP-complete, which is itself a problem.
Solution: A set of polynomial-time heuristics has been developed to add fault-tolerance to a large class of distributed programs.
Some Terminology
Boolean variables: $V = \{v_1, \dots, v_n\}$
State: $s = \bigwedge_{0 < j \le n} l(v_j)$, where $l(v_j)$ is a literal
State predicate: a finite set of states, $S = \bigvee_{0 \le i \le m} s_i$
Transition: let $V' = \{v' \mid v \in V\}$; a transition is $(s, s') = s \land s'$
Transition predicate: a set of transitions, $P = \bigvee_{0 \le i \le m} (s_i \land s'_i)$
Program: a program is defined by a transition predicate P
Closure: a state predicate S is closed in program P iff $\forall (s, s') \models P : (s \models S) \Rightarrow (s' \models S)$
Computation of P: a sequence of states $\langle s_0, s_1, \dots \rangle$ where $(s_j, s_{j+1}) \models P$
Safety specification: a set of bad transitions that should not occur in any program computation, defined by transition predicate SPEC
Satisfaction: a program P satisfies SPEC from S iff
1. S is closed in P
2. For all computations $\langle s_0, s_1, \dots \rangle$ where $s_0 \models S$: $(s_j, s_{j+1}) \not\models SPEC$
Invariant: if P satisfies SPEC from S and S is nonempty, then S is an invariant of P
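For small examples, closure and satisfaction can be checked directly on explicit sets instead of Boolean formulas. The Python sketch below assumes that states are hashable values, that P and SPEC are sets of (source, target) pairs, and that S is a set of states; the function names are hypothetical.

```python
def is_closed(S, P):
    """S is closed in P: no transition of P leads from S to outside S."""
    return all(t in S for (s, t) in P if s in S)

def satisfies(P, SPEC, S):
    """P satisfies SPEC from S, where SPEC is the set of bad transitions."""
    if not is_closed(S, P):
        return False
    # Because S is closed, a computation that starts in S only uses
    # transitions whose source is in S, so inspecting those is enough.
    return not any((s, t) in SPEC for (s, t) in P if s in S)
```

With P = {(0, 1), (1, 0), (1, 2), (2, 2)} the set {0, 1} is not closed because of the transition from 1 to 2, while {2} is closed.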
Preliminaries (cont'd)
Faults: a set F of transitions
Fault-span: a state predicate T such that
1. $S \Rightarrow T$
2. T is closed in $P \lor F$
[Figure: finite state space containing the invariant inside the fault-span, with program transitions p and fault transitions f]
Fault-tolerance: a program P is F-tolerant to SPEC from S if
1. P satisfies SPEC from S
2. There exists T such that
T is an F-span of P
$P \lor F$ satisfies SPEC from T (safety)
after faults stop occurring, every computation of P reaches S (liveness)
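On an explicit finite state space, the smallest fault-span is the set of states reachable from the invariant using program and fault transitions together. The Python sketch below uses that fact to check the safety and liveness parts of F-tolerance separately. It is an illustration under assumptions: S is a set of states, P, F, and SPEC are sets of (source, target) pairs, the function names are hypothetical, and only the smallest fault-span is tried.

```python
def fault_span(S, P, F):
    """Smallest T that contains S and is closed in P together with F."""
    moves = P | F
    T, stack = set(), list(S)
    while stack:
        s = stack.pop()
        if s in T:
            continue
        T.add(s)
        stack.extend(b for (a, b) in moves if a == s)
    return T

def is_safe_under_faults(S, P, F, SPEC):
    """Safety part: no bad transition of P or F starts inside the fault-span."""
    T = fault_span(S, P, F)
    return not any((a, b) in SPEC for (a, b) in P | F if a in T)

def recovers(S, P, F):
    """Liveness part: once faults stop, every computation of P reaches S."""
    T = fault_span(S, P, F)
    good = set(S)
    changed = True
    while changed:
        changed = False
        for s in T - good:
            nxt = {b for (a, b) in P if a == s}
            # s is good when it has a successor and all successors are good.
            if nxt and nxt <= good:
                good.add(s)
                changed = True
    return T <= good
```

A state from which P can loop forever outside S, or in which P has no transition at all, makes `recovers` return False.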
Levels of Fault-Tolerance [3]
In the presence of faults:
A failsafe program satisfies the safety specification.
A nonmasking program satisfies the liveness specification.
A masking program satisfies both safety and liveness specifications.
In the absence of faults, the program must continue to satisfy its entire specification.
Question: Are these levels able to capture the requirements for modeling fault-tolerant real-time programs?
[3] A. Arora, S. Kulkarni. Designing masking fault-tolerance via nonmasking fault-tolerance. TSE 1998.
Fault-Tolerant Real-Time Programs
Railroad Crossing
Safety: If the train is passing the crossing, the gate must be closed.
Bounded response: Once the gate is closed, it should reopen within 4 min.
Fault-tolerance: The gate must be closed while the train is passing the intersection, even in the presence of faults.
Boiler Controller
Bounded response: Once the pressure gauge reads 30 pounds per square inch, the controller must issue a command to open a valve within 20 s.
Fault-tolerance: Faults should not affect the functionality of the controller.
New Levels of Fault-Tolerance for RT Programs
Soft Fault-Tolerance
A program is soft fault-tolerant if it is NOT required to satisfy its timing constraints in the presence of faults. However, it should continue to meet its timing constraints in the absence of faults.
Hard Fault-Tolerance
A program is hard fault-tolerant if it is required to satisfy its timing constraints even in the presence of faults.
Levels of Fault-Tolerance (cont'd)
[Table: the levels Soft-failsafe, Hard-failsafe, Nonmasking, Soft-masking, and Hard-masking compared by Safety, Liveness, Timing constraints, and Bounded-time recovery]
Problem Statement
[Figure: the synthesis algorithm takes a program P, a safety specification, a set of faults f, and the desired level of fault-tolerance (Soft/Hard-Failsafe, Nonmasking, Soft/Hard-Masking) and produces a fault-tolerant program P']
Given a program P, an invariant S, a set of faults F, and a safety specification SPEC, identify a program P' with invariant S' such that $S' \Rightarrow S$, $P' \Rightarrow P$, and P' is F-tolerant to SPEC from S'.
Current Results
[Table: for each level (Soft-failsafe, Hard-failsafe, Nonmasking, Soft-masking, Hard-masking), the columns Safety, Liveness, Timing constraints, Bounded recovery, and Complexity. Complexity entries: "Polynomial space, sound and complete algorithm" and "Open problem"]
[2] S. Kulkarni, A. Arora. Automating the Addition of Fault-Tolerance. FTRTFT 2000.
Example: Altitude Switch
[Figure: timed automaton of the altitude switch with modes Init, Await, and Standby; events Actuator Power on, LowAlt, Reset, and SensorFail; clocks x, y, z, and t; the fault-span is marked]
The Altitude Switch is required to:
1) HFS: Tolerate the delay fault while in Standby mode:
$\Box((Standby \land CorruptSensor) \Rightarrow \Diamond_{\le 2}(Init))$
2) HFS: Not power on the actuators while it fails to read the altitude sensors
3) SMK: Recover within 1 s after the occurrence of faults
Ongoing Research
Implementation of the algorithms
Coping with the state explosion problem: synthesis using zone automata; parallelizing state space
Coping with NP-hardness results: heuristics; decision procedures (Yices)
Fault-tolerance in hybrid systems
[8] http://www.cse.msu.edu/~ebnenasi/research/tools/ftsyn.htm
Conclusion
We study the problem of automated revision of programs inside their state space:
Adding properties to programs
Adding fault-tolerance
Questions

Adding a Single Leads-to Property (R leads-to T)
Case 1: Deadlock states. Remove each transition $(s_1, s_2)$ if $s_2 \in R$ and $T$ is not reachable from $s_2$.
[Figure: state space $S_p$ containing the invariant $I_p$ and the state predicates $R$ and $T$, with states s0, s1, s2]
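The deadlock case can be illustrated on an explicit transition graph. The following is a minimal sketch, assuming the program is given as a set of (source, target) pairs; the function names and the example graph are hypothetical and not taken from the talk.

```python
from collections import deque

def can_reach(transitions, targets):
    """Return every state from which some state in `targets` is reachable."""
    preds = {}
    for a, b in transitions:
        preds.setdefault(b, set()).add(a)
    seen = set(targets)
    queue = deque(targets)
    while queue:  # backward breadth-first search from the targets
        s = queue.popleft()
        for p in preds.get(s, ()):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen

def remove_deadlock_transitions(transitions, R, T):
    """Drop each transition (s1, s2) with s2 in R when T is unreachable from s2."""
    live = can_reach(transitions, T)
    return {(a, b) for a, b in transitions if not (b in R and b not in live)}

# Hypothetical example: s2 is in R but has no path to T = {"t"}.
example = {("s0", "s1"), ("s1", "s2"), ("s0", "s3"), ("s3", "t")}
revised = remove_deadlock_transitions(example, R={"s2"}, T={"t"})
```

In this example only the transition into s2 is removed, so s2 becomes unreachable while the path through s3 to T is preserved.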

Adding a Single Leads-to Property (R leads-to T)
Case 2: Cycles. Break cycles that are reachable from $R$ and do not reach $T$.
[Figure: state space $S_p$ containing the invariant $I_p$ and the state predicates $R$ and $T$, with states s0, s1, s2, s3, s4]
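One common way to break such cycles is to rank each state by the length of its shortest path to T and keep only rank-decreasing transitions. The sketch below assumes that strategy and assumes the deadlock case has already been handled; it is an illustration with hypothetical names, not necessarily the exact procedure used in the talk.

```python
from collections import deque

def ranks_to(transitions, T):
    """Backward BFS: rank[s] is the length of a shortest path from s to T."""
    preds = {}
    for a, b in transitions:
        preds.setdefault(b, set()).add(a)
    rank = {t: 0 for t in T}
    queue = deque(T)
    while queue:
        s = queue.popleft()
        for p in preds.get(s, ()):
            if p not in rank:
                rank[p] = rank[s] + 1
                queue.append(p)
    return rank

def reachable_from(transitions, sources, stop):
    """States reachable from `sources`; states in `stop` are not expanded."""
    succs = {}
    for a, b in transitions:
        succs.setdefault(a, set()).add(b)
    seen = set(sources)
    queue = deque(s for s in sources if s not in stop)
    while queue:
        s = queue.popleft()
        for n in succs.get(s, ()):
            if n not in seen:
                seen.add(n)
                if n not in stop:
                    queue.append(n)
    return seen

def break_cycles(transitions, R, T):
    """Between R and T, keep only transitions that strictly decrease the rank."""
    rank = ranks_to(transitions, T)
    region = reachable_from(transitions, R, T) - set(T)
    kept = set()
    for a, b in transitions:
        if a in region and not (a in rank and b in rank and rank[b] < rank[a]):
            continue  # this edge could lie on a cycle that avoids T
        kept.add((a, b))
    return kept

# Hypothetical example: cycle s1 -> s2 -> s3 -> s1, with s4 in T.
cyclic = {("s0", "s1"), ("s1", "s2"), ("s2", "s3"), ("s3", "s1"), ("s2", "s4")}
acyclic = break_cycles(cyclic, R={"s1"}, T={"s4"})
```

Here the edge (s2, s3) is removed, so every computation that starts in R must move toward T.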

Soundness and Completeness
Theorem (1): The algorithm for adding multiple safety properties along with a leads-to property is sound and complete.
Fixability
Theorem (2): The algorithm for adding multiple safety properties along with a leads-to property runs in polynomial time in the size of the state space.

Adding Two Leads-to Properties
Adding two leads-to properties one after another
[Figure: state space $S_p$ containing the invariant $I_p$ and the state predicates $R$, $T$, $P$, and $Q$, with states s0 to s9]

Other Results
The problem of simultaneously adding two leads-to properties is NP-complete.
The problem of adding one leads-to property while maintaining maximum non-determinism is NP-complete.

Another Problem
Adding two eventually properties:
1. true leads-to Q
2. true leads-to T
This problem is also NP-complete.
[Figure: state space with the state predicates $Q$ and $T$]
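
Example: Real-Time Resource Allocation
Two processes $j \in \{1, 2\}$. Each has two tasks to complete (each takes 1 time unit):
Submitting a request
Performing an I/O operation
$RQ_j :: req_j \wedge (x = 1) \xrightarrow{\{x\}} io_j, req_j := true, false$
$IO_j :: io_j \wedge (x = 1) \xrightarrow{\{x\}} req_j, io_j := true, false$
$WT :: 0 \le x \le 1 \rightarrow \text{wait}$
Bounded response: $L \equiv (io_1 \mapsto_{\le 2} req_1)$, that is, $req_1$ must hold within 2 time units after $io_1$ becomes true.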

[Figure: region graph of the resource allocation example over clocks x and t. Each region is labeled with the mode of process 1 (req1 or io1), the mode of process 2 (req2 or io2), and the clock constraints. Legend: initial region; regions where io1 becomes true; edges participating in a shortest path; edges in additional shortest paths; edges removed during addition; regions made unreachable.]
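
Example (cont'd)
The revised program adds a clock $t$, reset when $io_1$ becomes true, and strengthens the guards of process 2:
$RQ_1 :: req_1 \wedge (x = 1) \xrightarrow{\{x, t\}} io_1, req_1 := true, false$
$IO_1 :: io_1 \wedge (x = 1) \xrightarrow{\{x\}} req_1, io_1 := true, false$
$RQ_2 :: req_2 \wedge (x = 1) \wedge (io_1 \Rightarrow t \le 1) \xrightarrow{\{x\}} io_2, req_2 := true, false$
$IO_2 :: io_2 \wedge (x = 1) \wedge (io_1 \Rightarrow t \le 1) \xrightarrow{\{x\}} req_2, io_2 := true, false$
$WT :: 0 \le x \le 1 \rightarrow \text{wait}$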
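The effect of the added guard can be checked with a small discrete-time exploration. This is only a sketch under stated assumptions: every action fires exactly when x = 1, so only integer instants are modeled; clock t is reset when io1 becomes true; and process 2 is assumed to be blocked while io1 holds and t > 1. The function name and the cap on t are hypothetical.

```python
def worst_response(guarded, cap=5):
    """Largest value of t at which IO1 can fire, over all interleavings.

    A state is (mode of process 1, mode of process 2, t), sampled at the
    instants where x = 1. Clock t is capped at `cap` to keep the space finite.
    """
    start = ("req", "req", 1)  # first instant with x = 1
    seen, stack, worst = {start}, [start], 0
    while stack:
        p1, p2, t = stack.pop()
        next_t = min(t + 1, cap)
        succs = []
        if p1 == "req":
            # RQ1 fires: t is reset, then one time unit passes.
            succs.append(("io", p2, 1))
        else:
            # IO1 fires: the response arrives t time units after io1 began.
            worst = max(worst, t)
            succs.append(("req", p2, next_t))
        # Process 2 fires unless the assumed guard (io1 implies t <= 1) blocks it.
        if not guarded or p1 != "io" or t <= 1:
            succs.append((p1, "io" if p2 == "req" else "req", next_t))
        for s in succs:
            if s not in seen:
                seen.add(s)
                stack.append(s)
    return worst
```

Without the guard, process 2 can postpone IO1 indefinitely, so the result equals the cap. With the guard, IO1 fires by t = 2 at the latest, which matches the bounded-response requirement.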

Part II
Adding Fault-Tolerance to Existing Programs

The Byzantine Agreement Problem
General: decision $d_g \in \{0, 1\}$
Non-generals $j$, $k$, $l$: decisions $d_j, d_k, d_l \in \{0, 1, \bot\}$ and final flags $f_j, f_k, f_l \in \{false, true\}$
Program (for each non-general $j$):
$(d_j = \bot) \wedge (f_j = false) \rightarrow d_j := d_g$
$(d_j \ne \bot) \wedge (f_j = false) \rightarrow f_j := true$

The Byzantine Agreement Problem
Byzantine flags: $b_g \in \{false, true\}$ for the general and $b_j, b_k, b_l \in \{false, true\}$ for the non-generals
Faults:
$(b_j = b_k = b_l = b_g = false) \rightarrow b_j := true$
$(b_j = true) \rightarrow d_j := 0 | 1$

The Byzantine Agreement Problem
Safety specification:
Agreement: The final decisions of any two non-Byzantine processes cannot be different.
Validity: If the general is non-Byzantine, then the final decision of a non-Byzantine process must be the same as that of the general.
Fact: The program does not meet the safety specification in the presence of faults.
[Figure: counterexample computation labeled $d_g = 0$, $b_g = true$, $d_j = 0$, $f_j = false$, $f_j = true$, $d_k = 1$, $f_k = true$]
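The violation can be replayed step by step. The sketch below assumes the two program actions of a non-general (copy the general's decision while undecided, then finalize) and one plausible ordering of the fault steps; the dictionary layout and function names are hypothetical.

```python
BOT = None  # stands for the undecided value

def copy_decision(state, j):
    """Program action: (d_j undecided and f_j false) -> d_j := d_g."""
    if state["d"][j] is BOT and not state["f"][j]:
        state["d"][j] = state["d"]["g"]

def finalize(state, j):
    """Program action: (d_j decided and f_j false) -> f_j := true."""
    if state["d"][j] is not BOT and not state["f"][j]:
        state["f"][j] = True

def agreement_holds(state):
    """Agreement: finalized non-Byzantine non-generals share one decision."""
    finals = {state["d"][j] for j in ("j", "k", "l")
              if state["f"][j] and not state["b"][j]}
    return len(finals) <= 1

def run_counterexample():
    state = {
        "d": {"g": 0, "j": BOT, "k": BOT, "l": BOT},
        "f": {"j": False, "k": False, "l": False},
        "b": {"g": False, "j": False, "k": False, "l": False},
    }
    copy_decision(state, "j")   # d_j = 0
    state["b"]["g"] = True      # fault: the general becomes Byzantine
    state["d"]["g"] = 1         # fault: the Byzantine general changes d_g
    copy_decision(state, "k")   # d_k = 1
    finalize(state, "j")        # f_j = true
    finalize(state, "k")        # f_k = true
    return state

final_state = run_counterexample()
violated = not agreement_holds(final_state)
```

Two non-Byzantine processes finalize different values, so agreement is violated even though only the general is faulty.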

The Byzantine Agreement Problem
Question: Is it possible to revise a distributed program in order to add fault-tolerance to the original program with respect to a safety specification and a class of faults?
Answer: Yes, but the problem is NP-complete, which is itself a problem.
Solution: A set of polynomial-time heuristics has been developed to add fault-tolerance to a large class of distributed programs.

Some Terminology
Boolean variables: $V = \{v_1, \ldots, v_n\}$
State: $s = \bigwedge_{1 \le j \le n} l(v_j)$, where $l(v_j)$ is a literal
State predicate: A finite set of states, $S = \bigvee_{0 \le i \le m} s_i$
Transition: Let $V' = \{v' \mid v \in V\}$. A transition is $(s, s') = s \wedge s'$
Transition predicate: A set of transitions, $P = \bigvee_{0 \le i \le m} (s_i \wedge s'_i)$
Program: A program is defined by a transition predicate $P$
Closure: A state predicate $S$ is closed in program $P$ iff $((s \models S) \wedge ((s, s') \models P)) \Rightarrow (s' \models S)$
Computation of $P$: A sequence of states $\langle s_0, s_1, \ldots \rangle$ where $(s_j, s_{j+1}) \models P$ for all $j \ge 0$
Safety specification: A set of bad transitions that should not occur in any program computation, defined by transition predicate $SPEC$
Satisfaction: A program $P$ satisfies $SPEC$ from $S$ iff
1. $S$ is closed in $P$
2. For all computations $\langle s_0, s_1, \ldots \rangle$ where $s_0 \models S$, $(s_j, s_{j+1}) \not\models SPEC$
Invariant: If $P$ satisfies $SPEC$ from $S$ and $S \ne false$, then $S$ is an invariant of $P$
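These definitions translate directly to explicit sets. The sketch below represents a state predicate as a Python set of states and a transition predicate as a set of (s, s') pairs; this explicit-state encoding is an assumption made for illustration, since the definitions themselves are symbolic.

```python
def is_closed(S, P):
    """S is closed in P: every P-transition that starts in S ends in S."""
    return all(t in S for (s, t) in P if s in S)

def satisfies(P, SPEC, S):
    """P satisfies SPEC from S: S is closed in P and no computation that
    starts in S ever takes a bad transition (an element of SPEC)."""
    if not is_closed(S, P):
        return False
    # Closure keeps computations inside S, so it is enough to inspect
    # the transitions of P that start in S.
    return not any((s, t) in SPEC for (s, t) in P if s in S)

def is_invariant(P, SPEC, S):
    """S is an invariant of P if S is nonempty and P satisfies SPEC from S."""
    return bool(S) and satisfies(P, SPEC, S)

# Hypothetical four-state example; the only bad transition is (2, 3).
P = {(0, 1), (1, 0), (2, 3)}
SPEC = {(2, 3)}
```

With these sets, {0, 1} is an invariant, while {0, 1, 2} is not even closed because the transition (2, 3) leaves it.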

Preliminaries (cont'd)
Faults: A set $F$ of transitions.
Fault-span: A state predicate $T$ such that
1. $S \subseteq T$
2. $T$ is closed in $P \cup F$
[Figure: finite state space containing the fault-span, which in turn contains the invariant; program transitions are labeled p and fault transitions are labeled f]
Fault-tolerance: A program $P$ is $F$-tolerant to $SPEC$ from $S$ if
1. $P$ satisfies $SPEC$ from $S$
2. There exists $T$ such that
$T$ is an $F$-span of $P$
$P \cup F$ satisfies $SPEC$ from $T$ (safety)
After faults stop occurring, every computation of $P$ reaches $S$ (liveness)
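On an explicit state space, the smallest fault-span is the set of states reachable from the invariant under program and fault transitions together. The sketch below computes it and checks the safety and recovery conditions; the set-based encoding and all names are assumptions made for illustration.

```python
def fault_span(S, P, F):
    """Smallest T that contains S and is closed in P together with F."""
    succs = {}
    for a, b in set(P) | set(F):
        succs.setdefault(a, set()).add(b)
    T, stack = set(S), list(S)
    while stack:
        s = stack.pop()
        for n in succs.get(s, ()):
            if n not in T:
                T.add(n)
                stack.append(n)
    return T

def safe_in_span(T, P, F, SPEC):
    """Safety: no program or fault transition starting in T is bad."""
    return not any(tr in SPEC for tr in set(P) | set(F) if tr[0] in T)

def recovers(T, S, P):
    """Liveness: once faults stop, every P-computation from T reaches S
    (no deadlock and no cycle among the states of T outside S)."""
    succs = {}
    for a, b in P:
        succs.setdefault(a, set()).add(b)
    good = set(S)
    changed = True
    while changed:
        changed = False
        for s in T - good:
            nxt = succs.get(s, set())
            if nxt and nxt <= good:  # all successors already recover
                good.add(s)
                changed = True
    return T <= good

# Hypothetical example: a fault moves the program from 0 to 2.
S0, P0, F0, SPEC0 = {0}, {(0, 0), (1, 0), (2, 1)}, {(0, 2)}, {(2, 3)}
T0 = fault_span(S0, P0, F0)
```

In this example the span is {0, 1, 2}, no bad transition is enabled inside it, and the program returns to the invariant in at most two steps.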

Levels of Fault-Tolerance [3]
In the presence of faults:
A failsafe program satisfies the safety specification.
A nonmasking program satisfies the liveness specification.
A masking program satisfies both safety and liveness specifications.
In the absence of faults, the program must continue to satisfy its entire specification.
Question: Are these levels able to capture the requirements for modeling fault-tolerant real-time programs?
[3] A. Arora, S. Kulkarni. Designing masking fault-tolerance via nonmasking fault-tolerance. TSE, 1998.

Fault-Tolerant Real-Time Programs
Railroad Crossing
Safety: If the train is passing the crossing, the gate must be closed.
Bounded response: Once the gate is closed, it should reopen within 4 min.
Fault-tolerance: The gate must be closed while the train is passing the intersection, even in the presence of faults.
Boiler Controller
Bounded response: Once the pressure gauge reads 30 pounds per square inch, the controller must issue a command to open a valve within 20 s.
Fault-tolerance: Faults should not affect the functionality of the controller.

New Levels of Fault-Tolerance for RT Programs
Soft Fault-Tolerance
A program is soft fault-tolerant if it is NOT required to satisfy its timing constraints in the presence of faults. However, it should continue to meet its timing constraints in the absence of faults.
Hard Fault-Tolerance
A program is hard fault-tolerant if it is required to satisfy its timing constraints even in the presence of faults.

Levels of Fault-Tolerance (cont'd)
[Table: the levels Soft-failsafe, Hard-failsafe, Nonmasking, Soft-masking, and Hard-masking (rows) against the requirements Safety, Liveness, Timing constraints, and Bounded-time recovery (columns)]

Problem Statement
[Figure: a synthesis algorithm takes as input a program P, a safety specification, a set of faults f, and the desired level of fault-tolerance (soft/hard-failsafe, nonmasking, or soft/hard-masking), and outputs a fault-tolerant program P']

Problem Statement
Given a program $P$, an invariant $S$, a set of faults $F$, and a safety specification $SPEC$, identify a program $P'$ with invariant $S'$ such that $S' \subseteq S$, $P' \subseteq P$ within $S'$, and $P'$ is $F$-tolerant to $SPEC$ from $S'$.
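For the simplest instance of this problem (failsafe tolerance, no timing constraints, explicit states), a revision can be sketched in the spirit of the construction in [2]. The code below is an assumption-laden illustration, not the real-time algorithm of the talk: it assumes every state of the original invariant has an outgoing program transition, and all names are hypothetical.

```python
def add_failsafe(P, F, S, SPEC):
    """Return (P1, S1): a revised program and invariant, or an empty S1
    if no failsafe revision is found by this procedure."""
    P, F, S, SPEC = set(P), set(F), set(S), set(SPEC)
    # ms: states from which faults alone can cause a safety violation.
    ms = {a for (a, b) in F if (a, b) in SPEC}
    changed = True
    while changed:
        changed = False
        for a, b in F:
            if b in ms and a not in ms:
                ms.add(a)
                changed = True
    # The program must not take bad transitions or enter a state in ms.
    P1 = {(a, b) for (a, b) in P if (a, b) not in SPEC and b not in ms}
    # Shrink the invariant until it is deadlock-free under P1.
    S1 = S - ms
    while True:
        inside = {(a, b) for (a, b) in P1 if a in S1 and b in S1}
        dead = {s for s in S1 if not any(a == s for (a, _) in inside)}
        if not dead:
            break
        S1 -= dead
    # Enforce closure of S1; transitions that start outside S1 are kept.
    P1 = {(a, b) for (a, b) in P1 if a not in S1 or b in S1}
    return P1, S1

# Hypothetical example: a fault from state 2 directly violates safety.
P_new, S_new = add_failsafe(
    P={(0, 1), (1, 0), (1, 2), (2, 0)}, F={(2, 3)},
    S={0, 1, 2}, SPEC={(2, 3)},
)
```

The revised program only removes transitions inside the new invariant and shrinks the invariant, which is the shape required by the conditions of the problem statement.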

Current Results
[Table: the levels Soft-failsafe, Hard-failsafe, Nonmasking, Soft-masking, and Hard-masking (rows) against the requirements Safety, Liveness, Timing constraints, and Bounded-recovery, plus a Complexity column. Entries in the Complexity column: "Polynomial space, sound and complete algorithm" and "Open problem".]
[2] S. Kulkarni, A. Arora. Automating the Addition of Fault-Tolerance. FTRTFT, 2000.

Example: Altitude Switch
[Figure: timed automaton of the altitude switch with locations Init, Standby, and Await; events LowAlt, Reset, and SensorFail; the output "Actuator Power on"; clocks x, y, z, and t; and two regions labeled Fault-span]
The Altitude Switch is required to:
1) HFS (hard-failsafe): Tolerate the delay fault while in Standby mode: $(Standby \wedge CorruptSensor) \Rightarrow \Diamond_{\le 2}\, Init$
2) HFS (hard-failsafe): Not power on the actuators while it fails to read the altitude sensors
3) SMK (soft-masking): Recover within 1 s after occurrence of faults

Ongoing Research
Implementation of the algorithms
Coping with the state explosion problem: synthesis using zone automata; parallelizing state space
Coping with NP-hardness results: heuristics; decision procedures (Yices)
Fault-tolerance in hybrid systems
[8] http://www.cse.msu.edu/~ebnenasi/research/tools/ftsyn.htm

Conclusion
We study the problem of automated revision of programs inside their state space:
Adding properties to programs
Adding fault-tolerance

Questions?
Automated Revision of Existing Programs 15
Adding Two Leads-to Properties Adding two leads-to properties one after another
Sp
Ip R T
s5
s6
s3s4
s7
P Qs9
s0
s6
s1
s2
s8
Automated Revision of Existing Programs 16
Other Results
The problems of simultaneous addition of two leads-to properties is NP-complete
The problem of adding one leads-to property while maintaining maximum non-determinism is NP-complete
Automated Revision of Existing Programs 17
Another Problem
Adding two eventually properties
1 true leads-to Q
2 true leads-to T
This problem is also NP-complete
Q
T
Automated Revision of Existing Programs 18
Example Real-Time Resource Allocation
Two processes j 1 2 Each has two tasks to complete (each takes 1 time unit)
Submitting a request Performing IO operation
RQj reqj (x = 1) ioj reqj = true false
IOj ioj (x = 1) reqj ioj = true false WT 0 x 1 1048576 wait
Bounded response
L (io1 2 req1)
x
x
Automated Revision of Existing Programs 19
req10 lt x t lt 1
req2
req1x t = 0req2
req1x t = 1req2
io1x t = 0 req2
req1x t = 0 1
io2
req10 lt x lt 11 lt t lt 2
io2
req1x t = 1 2
io2
io10 lt x t lt 1
req2
io1x t = 0
io2
io10 lt x t lt 1
io2
io1x = 0 t gt 2
req2
io1x = 1 t gt 2
req2
io1x = 0 t gt 2
io2
req10 lt x lt 1
t gt 2req2
req1x = 0 t gt 2
req2
req1x = 1 t gt 2
req2
io10 lt x lt 1
t gt 2io2
req1x = 0 t gt 2
io2
req10 lt x lt 1
t gt 2 io2
req1x = 1 t gt 2
io2
io1x t = 0 2
io2
req1x t = 0 2
req2
req1x t = 0 1
io2
req10 lt x lt 11 lt t lt 2
io2
req1x t = 1 2
io2
req1x t = 0 2
io2
req1x t = 0 1
req2
req10 lt x lt 11 lt t lt 2
req2
req1x t = 1 2
req2
Regions whereio1 becomes true
Edges removed during addition
Regions made unreachable
Legend
Initial region
Edges participatingin a shortest path
io1x t = 0 1
req2
io1x t = 1
io2
io10 lt x lt 11 lt t lt 2
req2
io1x t = 1 2
req2
Edges in additionalshortest paths
io1x t = 1req2
io1x t = 01
io2
io10 lt x lt 11 lt t lt 2
io2
io1x t = 1 2
io2
io1x t = 0 2
req2
io1x = 1 t gt 2
io2
io10 lt x lt 1
t gt 2req2
Automated Revision of Existing Programs 20
Example (contrsquod)
RQ1 req1 (x = 1) io1 req1 = true false
IO1 io1 (x = 1) req1 io1 = true false
RQ2 req2 (x = 1) (io1 t 1) io2 req2 = true false
IO1 io2 (x = 1) (io1 t 1) req2 io2 = true false
WT 0 x 1 1048576 wait
x t
x
x
x
Automated Revision of Existing Programs 21
Part II
Adding Fault-ToleranceFault-Tolerance to Existing Programs
Automated Revision of Existing Programs 22
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
Decision dg 0 1
(dj = ) ( fj = false) dj = dg
(dj ) ( fj = false) fj = true
dj
dk 0 1
dl
Decision
fj
fk false true
fl
Final
GENERAL
NON-GENERALS
Program
The Byzantine Agreement Problem
Byzantine (general): $b_g \in \{false, true\}$
Byzantine (non-generals): $b_j, b_k, b_l \in \{false, true\}$
Faults
$(b_j, b_k, b_l, b_g = false) \rightarrow b_j := true$
$(b_j = true) \rightarrow d_j := 0 | 1$
The Byzantine Agreement Problem
Safety specification
Agreement: The final decisions of any two non-Byzantine processes cannot be different.
Validity: If the general is non-Byzantine, then the final decision of a non-Byzantine process must be the same as that of the general.
Fact: The program does not meet the safety specification in the presence of faults.
Example: with a Byzantine general ($b_g = true$, $d_g = 0$), non-general j has $d_j = 0$ and moves from $f_j = false$ to $f_j = true$, while non-general k has $d_k = 1$ and $f_k = true$.
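The fact above can be reproduced by brute force. The following is a minimal sketch, assuming three non-generals, that the two fault actions shown for j apply to every process including the general, and that at most one process turns Byzantine; the state encoding and function names are illustrative and not taken from the slides.

```python
from collections import deque

BOT = None  # the undecided value (bottom)


def set_at(tup, i, value):
    """Return a copy of tuple `tup` with position i replaced by `value`."""
    return tup[:i] + (value,) + tup[i + 1:]


def program_steps(state):
    dg, d, f, bg, b = state
    for i in range(3):
        if d[i] is BOT and not f[i]:
            yield (dg, set_at(d, i, dg), f, bg, b)    # copy the general's decision
        if d[i] is not BOT and not f[i]:
            yield (dg, d, set_at(f, i, True), bg, b)  # finalize the decision


def fault_steps(state):
    dg, d, f, bg, b = state
    if not bg and not any(b):                 # nobody is Byzantine yet
        yield (dg, d, f, True, b)
        for i in range(3):
            yield (dg, d, f, bg, set_at(b, i, True))
    if bg:
        for v in (0, 1):
            yield (v, d, f, bg, b)            # Byzantine general rewrites d_g
    for i in range(3):
        if b[i]:
            for v in (0, 1):
                yield (dg, set_at(d, i, v), f, bg, b)


def breaks_safety(state):
    dg, d, f, bg, b = state
    final = [d[i] for i in range(3) if f[i] and not b[i]]
    agreement = len(set(final)) <= 1
    validity = bg or all(v == dg for v in final)
    return not (agreement and validity)


def find_violation(with_faults):
    """Breadth-first search for a reachable state that breaks agreement or validity."""
    starts = [(dg, (BOT,) * 3, (False,) * 3, False, (False,) * 3) for dg in (0, 1)]
    seen, queue = set(starts), deque(starts)
    while queue:
        state = queue.popleft()
        if breaks_safety(state):
            return state
        steps = list(program_steps(state))
        if with_faults:
            steps += list(fault_steps(state))
        for nxt in steps:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


if __name__ == "__main__":
    print("fault-free violation:", find_violation(with_faults=False))
    print("violation with faults:", find_violation(with_faults=True))
```

In this model every violation involves a Byzantine general, since the general's decision can only change after it becomes Byzantine.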
The Byzantine Agreement Problem
Question: Is it possible to revise a distributed program in order to add fault-tolerance to the original program with respect to a safety specification and a class of faults?
Answer: Yes, but the problem is NP-complete, which is itself a problem.
Solution: A set of polynomial-time heuristics has been developed to add fault-tolerance to a large class of distributed programs.
Some Terminology
Boolean variables: $V = \{v_1, \ldots, v_n\}$
State: $s = \bigwedge_{j=0}^{n} l(v_j)$, where $l(v_j)$ is a literal
State predicate: a finite set of states, $S = \bigvee_{i=0}^{m} s_i$
Transition: let $V' = \{v' \mid v \in V\}$; a transition is $(s, s') = s \land s'$
Transition predicate: a set of transitions, $P = \bigvee_{i=0}^{m} (s_i, s'_i)$
Program: a program is defined by a transition predicate $P$
Closure: a state predicate $S$ is closed in program $P$ iff $((s \models S) \land ((s, s') \models P)) \Rightarrow (s' \models S)$
Computation of $P$: a sequence of states $\langle s_0, s_1, \ldots \rangle$ where $(s_j, s_{j+1}) \models P$
Safety specification: a set of bad transitions that should not occur in any program computation, defined by a transition predicate $SPEC$
Satisfaction: a program $P$ satisfies $SPEC$ from $S$ iff
1. $S$ is closed in $P$
2. For all computations $\langle s_0, s_1, \ldots \rangle$ where $s_0 \models S$, $(s_j, s_{j+1}) \not\models SPEC$
Invariant: if $P$ satisfies $SPEC$ from $S$ and $S \ne \{\}$, then $S$ is an invariant of $P$
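For a finite state space, the closure and satisfaction definitions above translate almost directly into set operations. This is a minimal sketch, assuming states are hashable values, a state predicate is a Python set of states, and a transition predicate is a set of (source, target) pairs; the function names are illustrative.

```python
def is_closed(S, P):
    """S is closed in P iff every P-transition that starts in S also ends in S."""
    return all(target in S for (source, target) in P if source in S)


def satisfies(P, SPEC, S):
    """P satisfies SPEC from S iff S is closed in P and no computation that
    starts in S takes a bad transition (a member of SPEC)."""
    if not is_closed(S, P):
        return False
    # Closure keeps every computation inside S, and every state of S can start
    # a computation, so it is enough to inspect the transitions leaving S.
    return all((source, target) not in SPEC
               for (source, target) in P if source in S)


if __name__ == "__main__":
    P = {(0, 1), (1, 0), (1, 2)}
    SPEC = {(1, 2)}                           # the only bad transition
    print(is_closed({0, 1}, P))               # S = {0, 1} is left by (1, 2)
    print(satisfies(P, SPEC, {0, 1, 2}))      # closed, but takes a bad transition
    print(satisfies({(0, 1), (1, 0)}, SPEC, {0, 1}))
```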
Preliminaries (cont'd)
Faults: a set $F$ of transitions
Fault-span: a state predicate $T$ such that
1. $S \subseteq T$
2. $T$ is closed in $P \cup F$
[Figure: the invariant nested inside the fault-span within the finite state space; program transitions are labeled p and fault transitions are labeled f.]
Fault-tolerance: a program $P$ is $F$-tolerant to $SPEC$ from $S$ if
1. $P$ satisfies $SPEC$ from $S$
2. There exists $T$ such that
$T$ is an $F$-span of $P$
$P \cup F$ satisfies $SPEC$ from $T$ (safety)
after faults stop occurring, every computation of $P$ reaches $S$ (liveness)
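On an explicit finite state space these conditions can be checked mechanically. The sketch below is an illustration under assumptions rather than the tool's implementation: predicates are Python sets of states, P, F and SPEC are sets of (source, target) pairs, and the fault-span is taken to be the set of states reachable from S under P ∪ F, which is the smallest candidate for T. All names are hypothetical.

```python
def fault_span(S, P, F):
    """Smallest T with S inside T that is closed in P ∪ F (reachability from S)."""
    T, frontier = set(S), list(S)
    while frontier:
        state = frontier.pop()
        for (source, target) in P | F:
            if source == state and target not in T:
                T.add(target)
                frontier.append(target)
    return T


def recovers(T, S, P):
    """True if every fault-free computation from T reaches S, i.e. P has no
    deadlock and no cycle among the states of T that lie outside S."""
    pending = set(T) - set(S)
    changed = True
    while changed:
        changed = False
        for state in list(pending):
            succ = [t for (s, t) in P if s == state]
            # A state is resolved once all its successors are known to reach S.
            if succ and all(t not in pending for t in succ):
                pending.discard(state)
                changed = True
    return not pending


def is_masking_tolerant(P, F, SPEC, S):
    """Check conditions 1 and 2 of the definition, using the smallest fault-span."""
    T = fault_span(S, P, F)
    closed_in_S = all(t in S for (s, t) in P if s in S)
    safe_from_S = closed_in_S and all((s, t) not in SPEC for (s, t) in P if s in S)
    safe_from_T = all((s, t) not in SPEC for (s, t) in P | F if s in T)
    return safe_from_S and safe_from_T and recovers(T, S, P)


if __name__ == "__main__":
    S, F = {0}, {(0, 1)}                                   # a fault perturbs 0 to 1
    print(is_masking_tolerant({(0, 0), (1, 0)}, F, set(), S))  # 1 recovers to 0
    print(is_masking_tolerant({(0, 0), (1, 1)}, F, set(), S))  # 1 loops forever
```

Using the smallest fault-span is sufficient here because both the safety and the recovery condition continue to hold on any subset of a fault-span that satisfies them.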
Levels of Fault-Tolerance [3]
In the presence of faults:
A failsafe program satisfies the safety specification
A nonmasking program satisfies the liveness specification
A masking program satisfies both safety and liveness specifications
In the absence of faults, the program must continue to satisfy its entire specification
Question: Are these levels able to capture the requirements for modeling fault-tolerant real-time programs?
[3] A. Arora, S. Kulkarni. Designing masking fault-tolerance via nonmasking fault-tolerance. TSE, 1998.
Fault-Tolerant Real-Time Programs
Railroad Crossing
Safety: If the train is passing the crossing, the gate must be closed
Bounded response: Once the gate is closed, it should reopen within 4 min
Fault-tolerance: The gate must be closed while the train is passing the intersection, even in the presence of faults
Boiler Controller
Bounded response: Once the pressure gauge reads 30 pounds per square inch, the controller must issue a command to open a valve within 20 s
Fault-tolerance: Faults should not affect the functionality of the controller
New Levels of Fault-Tolerance for RT Programs
Soft Fault-Tolerance
A program is soft fault-tolerant if it is NOT required to satisfy its timing constraints in the presence of faults. However, it should continue to meet its timing constraints in the absence of faults.
Hard Fault-Tolerance
A program is hard fault-tolerant if it is required to satisfy its timing constraints even in the presence of faults.
Levels of Fault-Tolerance (cont'd)
[Table: rows Soft-failsafe, Hard-failsafe, Nonmasking, Soft-masking, Hard-masking; columns Safety, Liveness, Timing constraints, Bounded-time recovery. The cell marks are not present in this transcript.]
Problem Statement
[Diagram: a synthesis algorithm takes a program P, a safety specification, a set of faults f, and the desired level of fault-tolerance (Soft/Hard-Failsafe, Nonmasking, Soft/Hard-Masking) as input and produces a fault-tolerant program P'.]
Problem Statement
Given a program $P$, an invariant $S$, a set of faults $F$, and a safety specification $SPEC$, identify a program $P'$ with invariant $S'$ such that
$S' \subseteq S$, $P' \subseteq P$, and $P'$ is $F$-tolerant to $SPEC$ from $S'$
Current Results
[Table: rows Soft-failsafe, Hard-failsafe, Nonmasking, Soft-masking, Hard-masking; columns Safety, Liveness, Timing constraints, Bounded-recovery, Complexity. Surviving Complexity entries: "Polynomial space, sound and complete algorithm" and "Open problem"; the rows they belong to are not present in this transcript.]
[2] S. Kulkarni, A. Arora. Automating the Addition of Fault-Tolerance. FTRTFT, 2000.
Example: Altitude Switch
[Figure: timed automaton of the altitude switch with modes Init, Standby, and Await; events Reset, LowAlt, SensorFail, and Actuator Power on; conditions over Power and CorruptSensor; clocks x, y, z, and t; the fault-span is marked.]
The Altitude Switch is required to
1) HFS Tolerate the delay fault while in Standby mode
$((Standby \land CorruptSensor) \Rightarrow \Diamond_{\le 2}\,(Init))$
2) HFS Not power on the actuators while it fails to read the altitude sensors
3) SMK Recover within 1s after occurrence of faults
Ongoing Research
Implementation of the algorithms
Coping with the state explosion problem: synthesis using zone automata, parallelizing state space
Coping with NP-hardness results: heuristics, decision procedures (Yices)
Fault-tolerance in hybrid systems
[8] http://www.cse.msu.edu/~ebnenasi/research/tools/ftsyn.htm
Conclusion
We study the problem of automated revision of programs inside their state space:
Adding properties to programs
Adding fault-tolerance
Questions
Two processes j 1 2 Each has two tasks to complete (each takes 1 time unit)
Submitting a request Performing IO operation
RQj reqj (x = 1) ioj reqj = true false
IOj ioj (x = 1) reqj ioj = true false WT 0 x 1 1048576 wait
Bounded response
L (io1 2 req1)
x
x
Automated Revision of Existing Programs 19
req10 lt x t lt 1
req2
req1x t = 0req2
req1x t = 1req2
io1x t = 0 req2
req1x t = 0 1
io2
req10 lt x lt 11 lt t lt 2
io2
req1x t = 1 2
io2
io10 lt x t lt 1
req2
io1x t = 0
io2
io10 lt x t lt 1
io2
io1x = 0 t gt 2
req2
io1x = 1 t gt 2
req2
io1x = 0 t gt 2
io2
req10 lt x lt 1
t gt 2req2
req1x = 0 t gt 2
req2
req1x = 1 t gt 2
req2
io10 lt x lt 1
t gt 2io2
req1x = 0 t gt 2
io2
req10 lt x lt 1
t gt 2 io2
req1x = 1 t gt 2
io2
io1x t = 0 2
io2
req1x t = 0 2
req2
req1x t = 0 1
io2
req10 lt x lt 11 lt t lt 2
io2
req1x t = 1 2
io2
req1x t = 0 2
io2
req1x t = 0 1
req2
req10 lt x lt 11 lt t lt 2
req2
req1x t = 1 2
req2
Regions whereio1 becomes true
Edges removed during addition
Regions made unreachable
Legend
Initial region
Edges participatingin a shortest path
io1x t = 0 1
req2
io1x t = 1
io2
io10 lt x lt 11 lt t lt 2
req2
io1x t = 1 2
req2
Edges in additionalshortest paths
io1x t = 1req2
io1x t = 01
io2
io10 lt x lt 11 lt t lt 2
io2
io1x t = 1 2
io2
io1x t = 0 2
req2
io1x = 1 t gt 2
io2
io10 lt x lt 1
t gt 2req2
Automated Revision of Existing Programs 20
Example (contrsquod)
RQ1 req1 (x = 1) io1 req1 = true false
IO1 io1 (x = 1) req1 io1 = true false
RQ2 req2 (x = 1) (io1 t 1) io2 req2 = true false
IO1 io2 (x = 1) (io1 t 1) req2 io2 = true false
WT 0 x 1 1048576 wait
x t
x
x
x
Automated Revision of Existing Programs 21
Part II
Adding Fault-ToleranceFault-Tolerance to Existing Programs
Automated Revision of Existing Programs 22
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
Decision dg 0 1
(dj = ) ( fj = false) dj = dg
(dj ) ( fj = false) fj = true
dj
dk 0 1
dl
Decision
fj
fk false true
fl
Final
GENERAL
NON-GENERALS
Program
Automated Revision of Existing Programs 23
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
Byzantine
bg false true
bj
bk false true
bl
Byzantine
(bj bk bl bg = false) bj = true
(bj = true) dj = 0|1Faults
Automated Revision of Existing Programs 24
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
Safety specification Agreement The final decision of any two non-Byzantine
process cannot be different
Validity If the general is non-Byzantine then the final decision of a non-Byzantine process must be the same as that of the general
Fact The program does not meet the safety specification in the presence of faults
dg = 0 bg = true dj = 0 fj = false fj = true dk = 1 fk = true
Automated Revision of Existing Programs 25
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
QuestionIs it possible to revise a distributed program in order to add fault-tolerance to the original program with respect to a safety specification and a class of faults
AnswerYes but the problem is NP-complete which itself is a problem
SolutionA set of polynomial-time heuristics has been developed to add fault-tolerance to a large class of distributed programs
Automated Revision of Existing Programs 26
Some Terminalogy
Boolean Variables V = v1 vn State where l(vj) is a literal State predicate A finite set of states Transition Let V = v | v V A transition (s s) = s s Transition predicate is a set of transitions Program A program is defined by a transition predicate P Closure A state predicate S is closed in program P iff
(s ||S) (s ||S)
Computation of P A sequence of states s0 s1 where (sj sj+1) || P Safety specification A set of bad transitions that should not occur in any
program computation defined by transition predicate SPEC Satisfaction A program P satisfies SPEC from S iff
1 S is closed in P
2 For all s0 s1 where s0 || S (sj sj+1) || SPEC Invariant If P satisfies SPEC from S and S then S is an invariant of P
imi sS 0
)(0 jnj vls
)(0 iimi ssP
Pss ||)(
Automated Revision of Existing Programs 27
Preliminaries (contrsquod)
Faults A set F of transitions
Fault-span A state predicate T such that
1 S T
2 T is closed in P F
Invariant
f
f
ff
f
Fault-Span
Finite state space
p
pp
p
Fault-tolerance A program P is F-tolerant to SPEC from S if1 P satisfies SPEC from S2 There exists T such that
T is an F-span of P P F satisfies SPEC from T (safety) after faults stop occurring every computation of P reaches S (liveness)
Automated Revision of Existing Programs 28
Levels of Fault-Tolerance [3]
In the presence of faults
A failsafe program satisfies the safety specification A nonmasking program satisfies the liveness specification A masking program satisfies both safety and liveness specifications
In the absence of faults the program must continue to satisfy its entire specification
Question Are these levels able to capture the requirements for modeling fault-tolerant real-time programs
[3] A Arora S Kulkarni [3] A Arora S Kulkarni Designing masking fault-tolerance via nonmasking fault-toleranceDesigning masking fault-tolerance via nonmasking fault-tolerance TSE 1998TSE 1998
Automated Revision of Existing Programs 29
Fault-Tolerant Real-Time Programs
Railroad Crossing Safety If the train is passing the crossing the gate must be closed Bounded response Once the gate is closed it should reopen within
4min Fault-tolerance The gate must be closed while the train is passing the
intersection even in the presence of faults
Boiler Controller Bounded response Once the pressure gauge reads 30 pounds per
square inch the controller must issue a command to open a valve within 20s
Fault-tolerance Faults should not affect the functionality of the controller
Automated Revision of Existing Programs 30
New Levels of Fault-Tolerance for RT Programs
Soft Fault-Tolerance
A program is soft fault-tolerant if it is NOT required to satisfy its timing constraints in the presence of faults However it should continue to meet its timing constraints in the absence of faults
Hard Fault-Tolerance
A program is hard fault-tolerant if it is required to satisfy its timing constraints even in the presence of faults
Automated Revision of Existing Programs 31
Levels of Fault-Tolerance (contrsquod)
Safety LivenessTiming
constraintsBounded-time
recovery
Soft-failsafe
Hard-failsafe
Nonmasking
Soft-masking
Hard-masking
Automated Revision of Existing Programs 32
Problem Statement
SynthesisAlgorithm
Program PSafety spec
Fault-tolerant
program PSet of faults f
Desired level of fault-tolerance SoftHard-Failsafe
NonmaskingSoftHard-Masking
Automated Revision of Existing Programs 33
Problem Statement
Given program P Invariant S a set of faults F and safety specification SPEC identify a program P with invariant S such that
S S P P and P is F-tolerant to SPEC from S
Automated Revision of Existing Programs 34
Current Results
Safety LivenessTiming
constraintsBounded-recovery
Complexity
Soft-failsafe
Polynomial space sound and complete
algorithm
Hard-failsafe
Nonmasking
Soft-masking
Hard-masking
Open problem
[2] S Kulkarni A Arora [2] S Kulkarni A Arora Automating the Addition of Fault-ToleranceAutomating the Addition of Fault-Tolerance FTRTFT 2000 FTRTFT 2000
Automated Revision of Existing Programs 35
Example Altitude Switch
Initx 06
Standby
Await (Power z 2 )
gtgt Actuator Power on ltlt
ltlt LowAlt gtgt[z = 0]
ltlt Reset gtgt(CorruptSensor y 2)
(x 06)
ltlt Reset gtgt
[x y z = 0]
(Power z 2 )
(z gt 2 )[t = 0]
(x gt 06)[t = 0]
ltlt SensorFail gtgt[y = 0]
(y gt 2)[t = 0]
y 1y 2
z 2
(Power CorruptSensor)( Power)
Fault-span
Fault-span
(t 1)
The Altitude Switch is required to
1) HFS Tolerate the delay fault while in Standby mode
((Standby CorruptSensor) lozloz22 (Init))
2) HFS Not power on the actuators while it fails to read the altitude sensors
3) SMK Recover within 1s after occurrence of faults
The Altitude Switch is required to
1) HFS Tolerate the delay fault while in Standby mode
((Standby CorruptSensor) lozloz22 (Init))
2) HFS Not power on the actuators while it fails to read the altitude sensors
3) SMK Recover within 1s after occurrence of faults
Automated Revision of Existing Programs 37
Ongoing Research
Implementation of the algorithms
Coping with the state explosion problem Synthesis using zone automata Parallelizing state space
Coping with NP-hardness results Heuristics Decision procedures (Yices)
Fault-tolerance in hybrid systems
[8] httpwwwcsemsueduebnenasiresearchtoolsftsynhtm
Automated Revision of Existing Programs 38
Conclusion
We study the problem of automated revision of programs inside their state space
Adding properties to programs Adding fault-tolerance
Automated Revision of Existing Programs 39
Questions
Automated Revision of Existing Programs 17
Another Problem
Adding two eventually properties
1 true leads-to Q
2 true leads-to T
This problem is also NP-complete
Q
T
Automated Revision of Existing Programs 18
Example Real-Time Resource Allocation
Two processes j 1 2 Each has two tasks to complete (each takes 1 time unit)
Submitting a request Performing IO operation
RQj reqj (x = 1) ioj reqj = true false
IOj ioj (x = 1) reqj ioj = true false WT 0 x 1 1048576 wait
Bounded response
L (io1 2 req1)
x
x
Automated Revision of Existing Programs 19
req10 lt x t lt 1
req2
req1x t = 0req2
req1x t = 1req2
io1x t = 0 req2
req1x t = 0 1
io2
req10 lt x lt 11 lt t lt 2
io2
req1x t = 1 2
io2
io10 lt x t lt 1
req2
io1x t = 0
io2
io10 lt x t lt 1
io2
io1x = 0 t gt 2
req2
io1x = 1 t gt 2
req2
io1x = 0 t gt 2
io2
req10 lt x lt 1
t gt 2req2
req1x = 0 t gt 2
req2
req1x = 1 t gt 2
req2
io10 lt x lt 1
t gt 2io2
req1x = 0 t gt 2
io2
req10 lt x lt 1
t gt 2 io2
req1x = 1 t gt 2
io2
io1x t = 0 2
io2
req1x t = 0 2
req2
req1x t = 0 1
io2
req10 lt x lt 11 lt t lt 2
io2
req1x t = 1 2
io2
req1x t = 0 2
io2
req1x t = 0 1
req2
req10 lt x lt 11 lt t lt 2
req2
req1x t = 1 2
req2
Regions whereio1 becomes true
Edges removed during addition
Regions made unreachable
Legend
Initial region
Edges participatingin a shortest path
io1x t = 0 1
req2
io1x t = 1
io2
io10 lt x lt 11 lt t lt 2
req2
io1x t = 1 2
req2
Edges in additionalshortest paths
io1x t = 1req2
io1x t = 01
io2
io10 lt x lt 11 lt t lt 2
io2
io1x t = 1 2
io2
io1x t = 0 2
req2
io1x = 1 t gt 2
io2
io10 lt x lt 1
t gt 2req2
Automated Revision of Existing Programs 20
Example (contrsquod)
RQ1 req1 (x = 1) io1 req1 = true false
IO1 io1 (x = 1) req1 io1 = true false
RQ2 req2 (x = 1) (io1 t 1) io2 req2 = true false
IO1 io2 (x = 1) (io1 t 1) req2 io2 = true false
WT 0 x 1 1048576 wait
x t
x
x
x
Automated Revision of Existing Programs 21
Part II
Adding Fault-ToleranceFault-Tolerance to Existing Programs
Automated Revision of Existing Programs 22
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
Decision dg 0 1
(dj = ) ( fj = false) dj = dg
(dj ) ( fj = false) fj = true
dj
dk 0 1
dl
Decision
fj
fk false true
fl
Final
GENERAL
NON-GENERALS
Program
Automated Revision of Existing Programs 23
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
Byzantine
bg false true
bj
bk false true
bl
Byzantine
(bj bk bl bg = false) bj = true
(bj = true) dj = 0|1Faults
Automated Revision of Existing Programs 24
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
Safety specification Agreement The final decision of any two non-Byzantine
process cannot be different
Validity If the general is non-Byzantine then the final decision of a non-Byzantine process must be the same as that of the general
Fact The program does not meet the safety specification in the presence of faults
dg = 0 bg = true dj = 0 fj = false fj = true dk = 1 fk = true
Automated Revision of Existing Programs 25
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
QuestionIs it possible to revise a distributed program in order to add fault-tolerance to the original program with respect to a safety specification and a class of faults
AnswerYes but the problem is NP-complete which itself is a problem
SolutionA set of polynomial-time heuristics has been developed to add fault-tolerance to a large class of distributed programs
Automated Revision of Existing Programs 26
Some Terminalogy
Boolean Variables V = v1 vn State where l(vj) is a literal State predicate A finite set of states Transition Let V = v | v V A transition (s s) = s s Transition predicate is a set of transitions Program A program is defined by a transition predicate P Closure A state predicate S is closed in program P iff
(s ||S) (s ||S)
Computation of P A sequence of states s0 s1 where (sj sj+1) || P Safety specification A set of bad transitions that should not occur in any
program computation defined by transition predicate SPEC Satisfaction A program P satisfies SPEC from S iff
1 S is closed in P
2 For all s0 s1 where s0 || S (sj sj+1) || SPEC Invariant If P satisfies SPEC from S and S then S is an invariant of P
imi sS 0
)(0 jnj vls
)(0 iimi ssP
Pss ||)(
Automated Revision of Existing Programs 27
Preliminaries (contrsquod)
Faults A set F of transitions
Fault-span A state predicate T such that
1 S T
2 T is closed in P F
Invariant
f
f
ff
f
Fault-Span
Finite state space
p
pp
p
Fault-tolerance A program P is F-tolerant to SPEC from S if1 P satisfies SPEC from S2 There exists T such that
T is an F-span of P P F satisfies SPEC from T (safety) after faults stop occurring every computation of P reaches S (liveness)
Automated Revision of Existing Programs 28
Levels of Fault-Tolerance [3]
In the presence of faults
A failsafe program satisfies the safety specification A nonmasking program satisfies the liveness specification A masking program satisfies both safety and liveness specifications
In the absence of faults the program must continue to satisfy its entire specification
Question Are these levels able to capture the requirements for modeling fault-tolerant real-time programs
[3] A Arora S Kulkarni [3] A Arora S Kulkarni Designing masking fault-tolerance via nonmasking fault-toleranceDesigning masking fault-tolerance via nonmasking fault-tolerance TSE 1998TSE 1998
Automated Revision of Existing Programs 29
Fault-Tolerant Real-Time Programs
Railroad Crossing Safety If the train is passing the crossing the gate must be closed Bounded response Once the gate is closed it should reopen within
4min Fault-tolerance The gate must be closed while the train is passing the
intersection even in the presence of faults
Boiler Controller Bounded response Once the pressure gauge reads 30 pounds per
square inch the controller must issue a command to open a valve within 20s
Fault-tolerance Faults should not affect the functionality of the controller
Automated Revision of Existing Programs 30
New Levels of Fault-Tolerance for RT Programs
Soft Fault-Tolerance
A program is soft fault-tolerant if it is NOT required to satisfy its timing constraints in the presence of faults However it should continue to meet its timing constraints in the absence of faults
Hard Fault-Tolerance
A program is hard fault-tolerant if it is required to satisfy its timing constraints even in the presence of faults
Automated Revision of Existing Programs 31
Levels of Fault-Tolerance (contrsquod)
Safety LivenessTiming
constraintsBounded-time
recovery
Soft-failsafe
Hard-failsafe
Nonmasking
Soft-masking
Hard-masking
Automated Revision of Existing Programs 32
Problem Statement
SynthesisAlgorithm
Program PSafety spec
Fault-tolerant
program PSet of faults f
Desired level of fault-tolerance SoftHard-Failsafe
NonmaskingSoftHard-Masking
Automated Revision of Existing Programs 33
Problem Statement
Given program P Invariant S a set of faults F and safety specification SPEC identify a program P with invariant S such that
S S P P and P is F-tolerant to SPEC from S
Automated Revision of Existing Programs 34
Current Results
Safety LivenessTiming
constraintsBounded-recovery
Complexity
Soft-failsafe
Polynomial space sound and complete
algorithm
Hard-failsafe
Nonmasking
Soft-masking
Hard-masking
Open problem
[2] S Kulkarni A Arora [2] S Kulkarni A Arora Automating the Addition of Fault-ToleranceAutomating the Addition of Fault-Tolerance FTRTFT 2000 FTRTFT 2000
Automated Revision of Existing Programs 35
Example Altitude Switch
Initx 06
Standby
Await (Power z 2 )
gtgt Actuator Power on ltlt
ltlt LowAlt gtgt[z = 0]
ltlt Reset gtgt(CorruptSensor y 2)
(x 06)
ltlt Reset gtgt
[x y z = 0]
(Power z 2 )
(z gt 2 )[t = 0]
(x gt 06)[t = 0]
ltlt SensorFail gtgt[y = 0]
(y gt 2)[t = 0]
y 1y 2
z 2
(Power CorruptSensor)( Power)
Fault-span
Fault-span
(t 1)
The Altitude Switch is required to
1) HFS Tolerate the delay fault while in Standby mode
((Standby CorruptSensor) lozloz22 (Init))
2) HFS Not power on the actuators while it fails to read the altitude sensors
3) SMK Recover within 1s after occurrence of faults
The Altitude Switch is required to
1) HFS Tolerate the delay fault while in Standby mode
((Standby CorruptSensor) lozloz22 (Init))
2) HFS Not power on the actuators while it fails to read the altitude sensors
3) SMK Recover within 1s after occurrence of faults
Automated Revision of Existing Programs 37
Ongoing Research
Implementation of the algorithms
Coping with the state explosion problem Synthesis using zone automata Parallelizing state space
Coping with NP-hardness results Heuristics Decision procedures (Yices)
Fault-tolerance in hybrid systems
[8] httpwwwcsemsueduebnenasiresearchtoolsftsynhtm
Automated Revision of Existing Programs 38
Conclusion
We study the problem of automated revision of programs inside their state space
Adding properties to programs Adding fault-tolerance
Automated Revision of Existing Programs 39
Questions
Automated Revision of Existing Programs 18
Example Real-Time Resource Allocation
Two processes j 1 2 Each has two tasks to complete (each takes 1 time unit)
Submitting a request Performing IO operation
RQj reqj (x = 1) ioj reqj = true false
IOj ioj (x = 1) reqj ioj = true false WT 0 x 1 1048576 wait
Bounded response
L (io1 2 req1)
x
x
Automated Revision of Existing Programs 19
req10 lt x t lt 1
req2
req1x t = 0req2
req1x t = 1req2
io1x t = 0 req2
req1x t = 0 1
io2
req10 lt x lt 11 lt t lt 2
io2
req1x t = 1 2
io2
io10 lt x t lt 1
req2
io1x t = 0
io2
io10 lt x t lt 1
io2
io1x = 0 t gt 2
req2
io1x = 1 t gt 2
req2
io1x = 0 t gt 2
io2
req10 lt x lt 1
t gt 2req2
req1x = 0 t gt 2
req2
req1x = 1 t gt 2
req2
io10 lt x lt 1
t gt 2io2
req1x = 0 t gt 2
io2
req10 lt x lt 1
t gt 2 io2
req1x = 1 t gt 2
io2
io1x t = 0 2
io2
req1x t = 0 2
req2
req1x t = 0 1
io2
req10 lt x lt 11 lt t lt 2
io2
req1x t = 1 2
io2
req1x t = 0 2
io2
req1x t = 0 1
req2
req10 lt x lt 11 lt t lt 2
req2
req1x t = 1 2
req2
Regions whereio1 becomes true
Edges removed during addition
Regions made unreachable
Legend
Initial region
Edges participatingin a shortest path
io1x t = 0 1
req2
io1x t = 1
io2
io10 lt x lt 11 lt t lt 2
req2
io1x t = 1 2
req2
Edges in additionalshortest paths
io1x t = 1req2
io1x t = 01
io2
io10 lt x lt 11 lt t lt 2
io2
io1x t = 1 2
io2
io1x t = 0 2
req2
io1x = 1 t gt 2
io2
io10 lt x lt 1
t gt 2req2
Automated Revision of Existing Programs 20
Example (contrsquod)
RQ1 req1 (x = 1) io1 req1 = true false
IO1 io1 (x = 1) req1 io1 = true false
RQ2 req2 (x = 1) (io1 t 1) io2 req2 = true false
IO1 io2 (x = 1) (io1 t 1) req2 io2 = true false
WT 0 x 1 1048576 wait
x t
x
x
x
Automated Revision of Existing Programs 21
Part II
Adding Fault-ToleranceFault-Tolerance to Existing Programs
Automated Revision of Existing Programs 22
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
Decision dg 0 1
(dj = ) ( fj = false) dj = dg
(dj ) ( fj = false) fj = true
dj
dk 0 1
dl
Decision
fj
fk false true
fl
Final
GENERAL
NON-GENERALS
Program
Automated Revision of Existing Programs 23
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
Byzantine
bg false true
bj
bk false true
bl
Byzantine
(bj bk bl bg = false) bj = true
(bj = true) dj = 0|1Faults
Automated Revision of Existing Programs 24
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
Safety specification Agreement The final decision of any two non-Byzantine
process cannot be different
Validity If the general is non-Byzantine then the final decision of a non-Byzantine process must be the same as that of the general
Fact The program does not meet the safety specification in the presence of faults
dg = 0 bg = true dj = 0 fj = false fj = true dk = 1 fk = true
Automated Revision of Existing Programs 25
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
QuestionIs it possible to revise a distributed program in order to add fault-tolerance to the original program with respect to a safety specification and a class of faults
AnswerYes but the problem is NP-complete which itself is a problem
SolutionA set of polynomial-time heuristics has been developed to add fault-tolerance to a large class of distributed programs
Automated Revision of Existing Programs 26
Some Terminalogy
Boolean Variables V = v1 vn State where l(vj) is a literal State predicate A finite set of states Transition Let V = v | v V A transition (s s) = s s Transition predicate is a set of transitions Program A program is defined by a transition predicate P Closure A state predicate S is closed in program P iff
(s ||S) (s ||S)
Computation of P A sequence of states s0 s1 where (sj sj+1) || P Safety specification A set of bad transitions that should not occur in any
program computation defined by transition predicate SPEC Satisfaction A program P satisfies SPEC from S iff
1 S is closed in P
2 For all s0 s1 where s0 || S (sj sj+1) || SPEC Invariant If P satisfies SPEC from S and S then S is an invariant of P
imi sS 0
)(0 jnj vls
)(0 iimi ssP
Pss ||)(
Automated Revision of Existing Programs 27
Preliminaries (contrsquod)
Faults A set F of transitions
Fault-span A state predicate T such that
1 S T
2 T is closed in P F
Invariant
f
f
ff
f
Fault-Span
Finite state space
p
pp
p
Fault-tolerance A program P is F-tolerant to SPEC from S if1 P satisfies SPEC from S2 There exists T such that
T is an F-span of P P F satisfies SPEC from T (safety) after faults stop occurring every computation of P reaches S (liveness)
Automated Revision of Existing Programs 28
Levels of Fault-Tolerance [3]
In the presence of faults
A failsafe program satisfies the safety specification A nonmasking program satisfies the liveness specification A masking program satisfies both safety and liveness specifications
In the absence of faults the program must continue to satisfy its entire specification
Question Are these levels able to capture the requirements for modeling fault-tolerant real-time programs
[3] A Arora S Kulkarni [3] A Arora S Kulkarni Designing masking fault-tolerance via nonmasking fault-toleranceDesigning masking fault-tolerance via nonmasking fault-tolerance TSE 1998TSE 1998
Automated Revision of Existing Programs 29
Fault-Tolerant Real-Time Programs
Railroad Crossing Safety If the train is passing the crossing the gate must be closed Bounded response Once the gate is closed it should reopen within
4min Fault-tolerance The gate must be closed while the train is passing the
intersection even in the presence of faults
Boiler Controller Bounded response Once the pressure gauge reads 30 pounds per
square inch the controller must issue a command to open a valve within 20s
Fault-tolerance Faults should not affect the functionality of the controller
Automated Revision of Existing Programs 30
New Levels of Fault-Tolerance for RT Programs
Soft Fault-Tolerance
A program is soft fault-tolerant if it is NOT required to satisfy its timing constraints in the presence of faults However it should continue to meet its timing constraints in the absence of faults
Hard Fault-Tolerance
A program is hard fault-tolerant if it is required to satisfy its timing constraints even in the presence of faults
Automated Revision of Existing Programs 31
Levels of Fault-Tolerance (contrsquod)
Safety LivenessTiming
constraintsBounded-time
recovery
Soft-failsafe
Hard-failsafe
Nonmasking
Soft-masking
Hard-masking
Automated Revision of Existing Programs 32
Problem Statement
SynthesisAlgorithm
Program PSafety spec
Fault-tolerant
program PSet of faults f
Desired level of fault-tolerance SoftHard-Failsafe
NonmaskingSoftHard-Masking
Automated Revision of Existing Programs 33
Problem Statement
Given program P Invariant S a set of faults F and safety specification SPEC identify a program P with invariant S such that
S S P P and P is F-tolerant to SPEC from S
Automated Revision of Existing Programs 34
Current Results
Safety LivenessTiming
constraintsBounded-recovery
Complexity
Soft-failsafe
Polynomial space sound and complete
algorithm
Hard-failsafe
Nonmasking
Soft-masking
Hard-masking
Open problem
[2] S Kulkarni A Arora [2] S Kulkarni A Arora Automating the Addition of Fault-ToleranceAutomating the Addition of Fault-Tolerance FTRTFT 2000 FTRTFT 2000
Automated Revision of Existing Programs 35
Example Altitude Switch
[Figure: timed automaton of the altitude switch. Locations: Init, Standby, Await. Actions: Reset, LowAlt, Actuator Power on. Fault action: SensorFail. Variables: Power, CorruptSensor. Clocks: x, y, z, t. The fault-span is marked.]
The Altitude Switch is required to:
1) HFS: Tolerate the delay fault while in Standby mode
((Standby ∧ CorruptSensor) ⇒ ◊≤2 (Init))
2) HFS: Not power on the actuators while it fails to read the altitude sensors
3) SMK: Recover within 1 s after the occurrence of faults
Automated Revision of Existing Programs 37
Ongoing Research
Implementation of the algorithms
Coping with the state explosion problem: synthesis using zone automata; parallelizing the state space
Coping with NP-hardness results: heuristics; decision procedures (Yices)
Fault-tolerance in hybrid systems
[8] http://www.cse.msu.edu/~ebnenasi/research/tools/ftsyn.htm
Automated Revision of Existing Programs 38
Conclusion
We study the problem of automated revision of programs inside their state space
Adding properties to programs Adding fault-tolerance
Automated Revision of Existing Programs 39
Questions
Automated Revision of Existing Programs 19
[Figure: region graph of the example. Each region is labeled with the values of req1/io1 and req2/io2 and with the constraints on clocks x and t. Legend: initial region; regions where io1 becomes true; regions made unreachable; edges removed during addition; edges participating in a shortest path; edges in additional shortest paths.]
Automated Revision of Existing Programs 20
Example (cont'd)
RQ1 req1 (x = 1) io1 req1 = true false
IO1 io1 (x = 1) req1 io1 = true false
RQ2 req2 (x = 1) (io1 t 1) io2 req2 = true false
IO1 io2 (x = 1) (io1 t 1) req2 io2 = true false
WT 0 x 1 1048576 wait
x t
x
x
x
Automated Revision of Existing Programs 21
Part II
Adding Fault-Tolerance to Existing Programs
Automated Revision of Existing Programs 22
The Byzantine Agreement Problem
GENERAL
Decision: dg ∈ {0, 1}
NON-GENERALS
Decision: dj, dk, dl ∈ {0, 1, ⊥}
Final: fj, fk, fl ∈ {false, true}
Program
(dj = ⊥) ∧ (fj = false) → dj := dg
(dj ≠ ⊥) ∧ (fj = false) → fj := true
Automated Revision of Existing Programs 23
The Byzantine Agreement Problem
GENERAL
Byzantine: bg ∈ {false, true}
NON-GENERALS
Byzantine: bj, bk, bl ∈ {false, true}
Faults
(bj = bk = bl = bg = false) → bj := true
(bj = true) → dj := 0|1
Automated Revision of Existing Programs 24
The Byzantine Agreement Problem
Safety specification
Agreement: The final decisions of any two non-Byzantine processes cannot be different.
Validity: If the general is non-Byzantine, then the final decision of a non-Byzantine process must be the same as that of the general.
Fact: The program does not meet the safety specification in the presence of faults.
dg = 0, bg = true; dj = 0, fj = false, fj = true; dk = 1, fk = true
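The fact above can be checked mechanically on a small explicit-state model. The following is a minimal sketch, not the tool used in the talk: the state encoding, the helper names (`with_proc`, `program_steps`, `fault_steps`, `find_violation`), and two modeling choices are assumptions made for illustration. The assumptions are that a Byzantine process moves only through fault actions and that the general can also become Byzantine and then change dg arbitrarily, as the trace above suggests.

```python
from collections import deque

# State: (dg, bg, procs). procs holds one (d, f, b) triple per non-general
# (j, k, l); d is 0, 1, or None, where None stands for "undecided".

def with_proc(state, i, triple):
    """Return a copy of state with non-general i replaced by triple."""
    dg, bg, procs = state
    return (dg, bg, procs[:i] + (triple,) + procs[i + 1:])

def program_steps(state):
    """Successors under the two program actions of each non-general."""
    dg, bg, procs = state
    for i, (d, f, b) in enumerate(procs):
        if b:
            continue  # assumption: Byzantine processes move only via faults
        if d is None and not f:
            yield with_proc(state, i, (dg, f, b))    # dj := dg
        elif d is not None and not f:
            yield with_proc(state, i, (d, True, b))  # fj := true

def fault_steps(state):
    """Successors under fault actions (at most one Byzantine process)."""
    dg, bg, procs = state
    if not bg and not any(b for _, _, b in procs):
        yield (dg, True, procs)                      # assumed: bg := true
        for i, (d, f, b) in enumerate(procs):
            yield with_proc(state, i, (d, f, True))  # bj := true
    if bg:
        for v in (0, 1):
            yield (v, bg, procs)                     # assumed: dg := 0|1
    for i, (d, f, b) in enumerate(procs):
        if b:
            for v in (0, 1):
                yield with_proc(state, i, (v, f, b)) # dj := 0|1

def violates_agreement(state):
    """Two finalized non-Byzantine non-generals hold different decisions."""
    _, _, procs = state
    return len({d for d, f, b in procs if f and not b}) > 1

def find_violation(with_faults):
    """Breadth-first search for a reachable state that breaks agreement."""
    init = [(dg, False, ((None, False, False),) * 3) for dg in (0, 1)]
    seen, queue = set(init), deque(init)
    while queue:
        state = queue.popleft()
        if violates_agreement(state):
            return state
        steps = list(program_steps(state))
        if with_faults:
            steps += list(fault_steps(state))
        for nxt in steps:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None
```

Calling `find_violation(True)` returns a state in which the general is Byzantine and two non-generals have finalized different values, while `find_violation(False)` returns `None` because every non-general copies the same dg.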
Automated Revision of Existing Programs 25
The Byzantine Agreement Problem
Question: Is it possible to revise a distributed program in order to add fault-tolerance to the original program with respect to a safety specification and a class of faults?
Answer: Yes, but the problem is NP-complete, which itself is a problem.
Solution: A set of polynomial-time heuristics has been developed to add fault-tolerance to a large class of distributed programs.
Automated Revision of Existing Programs 26
Some Terminology
Boolean variables: V = {v1, ..., vn}
State: s = ⋀ (1 ≤ j ≤ n) l(vj), where l(vj) is a literal
State predicate: A finite set of states, S = ⋁ (0 ≤ i ≤ m) si
Transition: Let V' = {v' | v ∈ V}. A transition is (s, s') = s ∧ s'
Transition predicate: A set of transitions, P = ⋁ (0 ≤ i ≤ m) (si ∧ s'i)
Program: A program is defined by a transition predicate P
Closure: A state predicate S is closed in program P iff
((s ⊨ S) ∧ ((s, s') ⊨ P)) ⇒ (s' ⊨ S)
Computation of P: A sequence of states s0, s1, ... where (sj, sj+1) ⊨ P for all j ≥ 0
Safety specification: A set of bad transitions that should not occur in any program computation, defined by a transition predicate SPEC
Satisfaction: A program P satisfies SPEC from S iff
1. S is closed in P
2. For all computations s0, s1, ... where s0 ⊨ S, (sj, sj+1) ⊭ SPEC for all j ≥ 0
Invariant: If P satisfies SPEC from S and S ≠ ∅, then S is an invariant of P
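These definitions translate directly into set operations when states and transitions are enumerated explicitly. The sketch below is illustrative only: it assumes states are hashable values and that P and SPEC are Python sets of (s, s') pairs, and the function names `is_closed`, `satisfies`, and `is_invariant` are hypothetical rather than taken from the talk.

```python
def is_closed(S, P):
    """S is closed in P iff every transition of P that starts in S ends in S."""
    return all(t in S for (s, t) in P if s in S)

def satisfies(P, SPEC, S):
    """P satisfies SPEC from S iff S is closed in P and no computation that
    starts in S takes a bad transition. Because S is closed, every such
    computation stays in S, so checking transitions that start in S suffices."""
    return is_closed(S, P) and all(
        (s, t) not in SPEC for (s, t) in P if s in S
    )

def is_invariant(P, SPEC, S):
    """S is an invariant of P iff S is nonempty and P satisfies SPEC from S."""
    return bool(S) and satisfies(P, SPEC, S)

# Toy program over states 0..3; the transition (2, 3) is the only bad one.
P = {(0, 1), (1, 0), (2, 3), (3, 0)}
SPEC = {(2, 3)}
```

With this toy program, `{0, 1}` is an invariant, `{0, 1, 2}` is not closed, and `{0, 1, 2, 3}` is closed but contains the bad transition.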
Automated Revision of Existing Programs 27
Preliminaries (cont'd)
Faults: A set F of transitions
Fault-span: A state predicate T such that
1. S ⊆ T
2. T is closed in P ∪ F
[Figure: finite state space with the invariant nested inside the fault-span; program transitions are labeled p and fault transitions are labeled f.]
Fault-tolerance: A program P is F-tolerant to SPEC from S if
1. P satisfies SPEC from S
2. There exists T such that
T is an F-span of P
P ∪ F satisfies SPEC from T (safety)
After faults stop occurring, every computation of P reaches S (liveness)
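One natural candidate for T is the set of states reachable from S under program and fault transitions together, which is the smallest predicate meeting both fault-span conditions. The following is a minimal sketch under the assumption that S is a set of states and P and F are sets of (s, s') pairs; the names `fault_span` and `is_closed` are hypothetical.

```python
def is_closed(T, steps):
    """T is closed in steps iff no transition leaves T."""
    return all(b in T for (a, b) in steps if a in T)

def fault_span(S, P, F):
    """Smallest T with S ⊆ T that is closed in P ∪ F, computed as the set of
    states reachable from S by program and fault transitions."""
    steps = P | F
    T, frontier = set(S), list(S)
    while frontier:
        s = frontier.pop()
        for (a, b) in steps:
            if a == s and b not in T:
                T.add(b)
                frontier.append(b)
    return T

# Toy example: the fault (1, 2) pushes the program out of S = {0, 1},
# and the program transition (2, 0) brings it back.
P = {(0, 1), (1, 0), (2, 0)}
F = {(1, 2)}
S = {0, 1}
T = fault_span(S, P, F)
```

Here T evaluates to `{0, 1, 2}`; S alone is closed in P but not in P ∪ F, which is why the larger predicate is needed.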
Automated Revision of Existing Programs 28
Levels of Fault-Tolerance [3]
In the presence of faults:
A failsafe program satisfies the safety specification.
A nonmasking program satisfies the liveness specification.
A masking program satisfies both the safety and liveness specifications.
In the absence of faults, the program must continue to satisfy its entire specification.
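For a finite, explicitly enumerated program, the three levels can be told apart by two checks on the reachable fault-span. The sketch below is one possible reading, not the algorithm from the talk. It assumes that P already satisfies SPEC from S in the absence of faults, that "safety" means no bad transition of P ∪ F starts in the fault-span, and that "liveness" means every computation of P from the fault-span inevitably reaches S. All function names are hypothetical.

```python
def reachable(start, steps):
    """States reachable from start using the given transitions."""
    seen, frontier = set(start), list(start)
    while frontier:
        s = frontier.pop()
        for a, b in steps:
            if a == s and b not in seen:
                seen.add(b)
                frontier.append(b)
    return seen

def is_safe(T, steps, SPEC):
    """No bad transition starts inside T."""
    return all((a, b) not in SPEC for a, b in steps if a in T)

def recovers(T, P, S):
    """Every computation of P that starts in T reaches S. A state is added
    once it has at least one successor and all of its successors are
    already known to reach S."""
    good = set(S)
    changed = True
    while changed:
        changed = False
        for s in T - good:
            succ = {b for a, b in P if a == s}
            if succ and succ <= good:
                good.add(s)
                changed = True
    return T <= good

def classify(P, F, SPEC, S):
    """Return the level of fault-tolerance of P for faults F."""
    T = reachable(S, P | F)
    safety = is_safe(T, P | F, SPEC)
    liveness = recovers(T, P, S)
    if safety and liveness:
        return "masking"
    if safety:
        return "failsafe"
    if liveness:
        return "nonmasking"
    return "intolerant"
```

For example, with S = {0, 1} and fault (1, 2), a program that returns from state 2 by a safe transition is classified as masking, one that stops in state 2 as failsafe, and one whose only way back is a bad transition as nonmasking.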
Question Are these levels able to capture the requirements for modeling fault-tolerant real-time programs
[3] A. Arora, S. Kulkarni. Designing masking fault-tolerance via nonmasking fault-tolerance. TSE 1998.
Automated Revision of Existing Programs 23
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
Byzantine
bg false true
bj
bk false true
bl
Byzantine
(bj bk bl bg = false) bj = true
(bj = true) dj = 0|1Faults
Automated Revision of Existing Programs 24
The Byzantine Agreement ProblemThe Byzantine Agreement Problem
Safety specification Agreement The final decision of any two non-Byzantine
process cannot be different
Validity If the general is non-Byzantine then the final decision of a non-Byzantine process must be the same as that of the general
Fact The program does not meet the safety specification in the presence of faults
dg = 0 bg = true dj = 0 fj = false fj = true dk = 1 fk = true
Automated Revision of Existing Programs 25
The Byzantine Agreement Problem
Question: Is it possible to revise a distributed program in order to add fault-tolerance to the original program with respect to a safety specification and a class of faults?
Answer: Yes, but the problem is NP-complete, which is itself a problem.
Solution: A set of polynomial-time heuristics has been developed to add fault-tolerance to a large class of distributed programs.
Automated Revision of Existing Programs 26
Some Terminology
Boolean variables: $V = \{v_1, \ldots, v_n\}$
State: $s = \bigwedge_{0 \le j \le n} l(v_j)$, where $l(v_j)$ is a literal
State predicate: a finite set of states, $S = \bigvee_{0 \le i \le m} s_i$
Transition: let $V' = \{v' \mid v \in V\}$; a transition is $(s, s') = s \wedge s'$
Transition predicate: a set of transitions, $P = \bigvee_{0 \le i \le m} (s_i \wedge s'_i)$
Program: a program is defined by a transition predicate $P$
Closure: a state predicate $S$ is closed in program $P$ iff $((s \models S) \wedge ((s, s') \models P)) \Rightarrow (s' \models S)$
Computation of $P$: a sequence of states $s_0, s_1, \ldots$ where $(s_j, s_{j+1}) \models P$
Safety specification: a set of bad transitions that should not occur in any program computation, defined by transition predicate $SPEC$
Satisfaction: a program $P$ satisfies $SPEC$ from $S$ iff
  1. $S$ is closed in $P$
  2. For all $s_0, s_1, \ldots$ where $s_0 \models S$: $(s_j, s_{j+1}) \not\models SPEC$
Invariant: if $P$ satisfies $SPEC$ from $S$ and $S \neq \emptyset$, then $S$ is an invariant of $P$
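The closure and satisfaction definitions above can be checked directly on a small explicit-state model. The following is a minimal sketch, assuming states are encoded as frozen sets of the variables that are true and predicates as Python sets; the slides use a symbolic (Boolean formula) encoding, and the function names here are hypothetical.

```python
from typing import FrozenSet, Set, Tuple

State = FrozenSet[str]             # a state: the set of Boolean variables that are true
Transition = Tuple[State, State]   # (s, s')


def is_closed(S: Set[State], P: Set[Transition]) -> bool:
    """S is closed in P iff every transition of P that starts in S also ends in S."""
    return all(t in S for (s, t) in P if s in S)


def satisfies(P: Set[Transition], spec_bad: Set[Transition], S: Set[State]) -> bool:
    """P satisfies SPEC from S iff S is closed in P and no computation from S
    executes a bad transition. Because S is closed, every computation that
    starts in S stays in S, so checking the transitions leaving S-states suffices."""
    if not is_closed(S, P):
        return False
    return not any(s in S and (s, t) in spec_bad for (s, t) in P)


# Three states over V = {v1, v2}
a: State = frozenset()
b: State = frozenset({"v1"})
c: State = frozenset({"v1", "v2"})

P = {(a, b), (b, a), (b, c)}
SPEC = {(b, c)}                    # the single bad transition
```

Here `{a, b}` is not closed in `P` because of the transition `(b, c)`, while `{a, b, c}` is closed but does not satisfy `SPEC`. Dropping `(b, c)` from the program makes `{a, b}` an invariant.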
Automated Revision of Existing Programs 27
Preliminaries (cont'd)
Faults: a set $F$ of transitions
Fault-span: a state predicate $T$ such that
  1. $S \subseteq T$
  2. $T$ is closed in $P \cup F$
[Figure: the invariant nested inside the fault-span within a finite state space; program transitions are labeled p, fault transitions f]
Fault-tolerance: a program $P$ is $F$-tolerant to $SPEC$ from $S$ if
  1. $P$ satisfies $SPEC$ from $S$
  2. There exists $T$ such that
       $T$ is an $F$-span of $P$
       $P \cup F$ satisfies $SPEC$ from $T$ (safety)
       after faults stop occurring, every computation of $P$ reaches $S$ (liveness)
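The fault-span and the two tolerance conditions can be computed on an explicit finite model. This is a minimal sketch, assuming states are hashable values and predicates are Python sets. The smallest fault-span is taken as the set of states reachable from S under P ∪ F, and the liveness condition is checked with a standard "all paths eventually reach S" fixed point. The function names are hypothetical.

```python
from typing import Hashable, Set, Tuple

State = Hashable
Transition = Tuple[State, State]


def fault_span(S: Set[State], P: Set[Transition], F: Set[Transition]) -> Set[State]:
    """Smallest T with S ⊆ T and T closed in P ∪ F (forward reachability)."""
    T = set(S)
    frontier = list(S)
    steps = P | F
    while frontier:
        s = frontier.pop()
        for (u, v) in steps:
            if u == s and v not in T:
                T.add(v)
                frontier.append(v)
    return T


def safe_in_span(S, P, F, spec_bad) -> bool:
    """Safety: no transition of P ∪ F that starts in the fault-span is bad."""
    T = fault_span(S, P, F)
    return not any(u in T and (u, v) in spec_bad for (u, v) in P | F)


def recovers(S, P, F) -> bool:
    """Liveness: once faults stop, every computation of P from the fault-span reaches S.
    R grows from S by adding states that have at least one successor in P and whose
    P-successors all lie in R already."""
    T = fault_span(S, P, F)
    R = set(S)
    changed = True
    while changed:
        changed = False
        for u in T - R:
            succ = {v for (x, v) in P if x == u}
            if succ and succ <= R:
                R.add(u)
                changed = True
    return T <= R


S = {0}
F = {(0, 1), (0, 2)}                 # faults perturb state 0 to 1 or 2
P = {(0, 0), (1, 0), (2, 1)}         # program steps back toward 0
SPEC = {(1, 3)}                      # bad transition
```

With these sets the fault-span is `{0, 1, 2}`, no bad transition is executed, and every state recovers to S. Replacing `(1, 0)` with the self-loop `(1, 1)` keeps safety but breaks recovery.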
Automated Revision of Existing Programs 28
Levels of Fault-Tolerance [3]
In the presence of faults:
  A failsafe program satisfies the safety specification
  A nonmasking program satisfies the liveness specification
  A masking program satisfies both safety and liveness specifications
In the absence of faults, the program must continue to satisfy its entire specification
Question: Are these levels able to capture the requirements for modeling fault-tolerant real-time programs?
[3] A. Arora, S. Kulkarni. Designing masking fault-tolerance via nonmasking fault-tolerance. TSE, 1998.
Automated Revision of Existing Programs 29
Fault-Tolerant Real-Time Programs
Railroad Crossing
  Safety: If the train is passing the crossing, the gate must be closed
  Bounded response: Once the gate is closed, it should reopen within 4 min
  Fault-tolerance: The gate must be closed while the train is passing the intersection, even in the presence of faults
Boiler Controller
  Bounded response: Once the pressure gauge reads 30 pounds per square inch, the controller must issue a command to open a valve within 20 s
  Fault-tolerance: Faults should not affect the functionality of the controller
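A bounded-response requirement such as the boiler example can be checked on a recorded, timestamped trace. The sketch below is an illustration only, assuming a trace is a time-ordered list of (timestamp, event) pairs; the event names and the function `bounded_response` are hypothetical and do not come from the slides.

```python
from typing import List, Tuple

Event = Tuple[float, str]   # (timestamp in seconds, event name)


def bounded_response(trace: List[Event], trigger: str, response: str, bound: float) -> bool:
    """True iff every occurrence of `trigger` is followed by a `response`
    no later than `bound` seconds afterwards. Assumes `trace` is sorted by time."""
    for i, (t0, name) in enumerate(trace):
        if name != trigger:
            continue
        answered = any(
            n == response and t0 <= t <= t0 + bound
            for (t, n) in trace[i + 1:]
        )
        if not answered:
            return False
    return True


# Boiler controller: open the valve within 20 s of reading 30 psi.
on_time = [(0.0, "pressure_30psi"), (12.5, "open_valve")]
too_late = [(0.0, "pressure_30psi"), (25.0, "open_valve")]
```

`bounded_response(on_time, "pressure_30psi", "open_valve", 20.0)` holds, while the same check on `too_late` fails because the valve command arrives after 25 s.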
Automated Revision of Existing Programs 30
New Levels of Fault-Tolerance for RT Programs
Soft Fault-Tolerance
A program is soft fault-tolerant if it is NOT required to satisfy its timing constraints in the presence of faults. However, it should continue to meet its timing constraints in the absence of faults.
Hard Fault-Tolerance
A program is hard fault-tolerant if it is required to satisfy its timing constraints even in the presence of faults
Automated Revision of Existing Programs 31
Levels of Fault-Tolerance (cont'd)
Properties (table columns): Safety | Liveness | Timing constraints | Bounded-time recovery
Levels (table rows): Soft-failsafe | Hard-failsafe | Nonmasking | Soft-masking | Hard-masking
Automated Revision of Existing Programs 32
Problem Statement
[Figure: Synthesis Algorithm]
Inputs: program P; safety spec; set of faults f; desired level of fault-tolerance (Soft/Hard-Failsafe, Nonmasking, Soft/Hard-Masking)
Output: fault-tolerant program P'
Automated Revision of Existing Programs 33
Problem Statement
Given program $P$, invariant $S$, a set of faults $F$, and safety specification $SPEC$, identify a program $P'$ with invariant $S'$ such that
  $S' \subseteq S$, $P' \subseteq P$, and $P'$ is $F$-tolerant to $SPEC$ from $S'$
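For the untimed failsafe case, one way to approach the problem statement above on an explicit finite model is sketched below. It follows the general idea of removing states from which faults alone can violate safety, then shrinking the invariant until it is deadlock-free. This is an illustrative sketch only, not the algorithm from the slides: it ignores timing constraints and distribution, assumes predicates are Python sets, and the name `add_failsafe` is hypothetical.

```python
from typing import Hashable, Optional, Set, Tuple

State = Hashable
Transition = Tuple[State, State]


def add_failsafe(
    S: Set[State], P: Set[Transition], F: Set[Transition], spec_bad: Set[Transition]
) -> Optional[Tuple[Set[State], Set[Transition]]]:
    """Return (S', P') with S' ⊆ S and P' ⊆ P, or None if the invariant becomes empty."""
    # ms: states from which fault transitions alone can execute a bad transition.
    ms = {u for (u, v) in F if (u, v) in spec_bad}
    changed = True
    while changed:
        changed = False
        for (u, v) in F:
            if v in ms and u not in ms:
                ms.add(u)
                changed = True

    # Drop program transitions that are bad themselves or that lead into ms.
    safe_P = {(u, v) for (u, v) in P if v not in ms and (u, v) not in spec_bad}

    # Shrink the invariant: remove ms, then repeatedly remove states that
    # have no remaining successor inside the invariant (deadlocks).
    S2 = set(S) - ms
    changed = True
    while changed:
        changed = False
        for s in list(S2):
            if not any(u == s and v in S2 for (u, v) in safe_P):
                S2.discard(s)
                changed = True
    if not S2:
        return None

    # Keep the new invariant closed: drop transitions that leave it.
    P2 = {(u, v) for (u, v) in safe_P if not (u in S2 and v not in S2)}
    return S2, P2


S = {0, 1}
P = {(0, 1), (1, 0), (2, 3), (2, 0)}
F = {(1, 2)}                       # a fault moves the program from 1 to 2
SPEC = {(2, 3)}                    # bad transition
```

On this input the sketch keeps the invariant `{0, 1}` and removes only the bad program transition `(2, 3)`, so after the fault the program can still take `(2, 0)` but never violates safety. When a fault transition is itself bad, as in `F = {(1, 3)}` with `SPEC = {(1, 3)}` and `P = {(0, 1), (1, 0)}`, the invariant shrinks to the empty set and the function returns `None`.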
Automated Revision of Existing Programs 34
Current Results
Properties (table columns): Safety | Liveness | Timing constraints | Bounded-recovery | Complexity
Levels (table rows): Soft-failsafe | Hard-failsafe | Nonmasking | Soft-masking | Hard-masking
Complexity entries: Polynomial space, sound and complete algorithm; Open problem
[2] S. Kulkarni, A. Arora. Automating the Addition of Fault-Tolerance. FTRTFT 2000.
Automated Revision of Existing Programs 35
Example Altitude Switch
[Figure: timed automaton for the Altitude Switch. Locations: Init, Standby, Await. Labels: LowAlt, Reset, SensorFail, Actuator Power on. Propositions: Power, CorruptSensor. Clocks: x, y, z, t. Fault-span regions are marked.]
The Altitude Switch is required to
1) HFS (hard-failsafe): Tolerate the delay fault while in Standby mode
   $(\mathit{Standby} \wedge \mathit{CorruptSensor}) \Rightarrow \Diamond_{\le 2}\, \mathit{Init}$
2) HFS (hard-failsafe): Not power on the actuators while it fails to read the altitude sensors
3) SMK (soft-masking): Recover within 1 s after the occurrence of faults
Automated Revision of Existing Programs 37
Ongoing Research
Implementation of the algorithms
Coping with the state explosion problem
  Synthesis using zone automata
  Parallelizing state space
Coping with NP-hardness results
  Heuristics
  Decision procedures (Yices)
Fault-tolerance in hybrid systems
[8] http://www.cse.msu.edu/~ebnenasi/research/tools/ftsyn.htm
Automated Revision of Existing Programs 38
Conclusion
We study the problem of automated revision of programs inside their state space
  Adding properties to programs
  Adding fault-tolerance
Automated Revision of Existing Programs 39
Questions
Automated Revision of Existing Programs 28
Levels of Fault-Tolerance [3]
In the presence of faults
A failsafe program satisfies the safety specification A nonmasking program satisfies the liveness specification A masking program satisfies both safety and liveness specifications
In the absence of faults the program must continue to satisfy its entire specification
Question Are these levels able to capture the requirements for modeling fault-tolerant real-time programs
[3] A Arora S Kulkarni [3] A Arora S Kulkarni Designing masking fault-tolerance via nonmasking fault-toleranceDesigning masking fault-tolerance via nonmasking fault-tolerance TSE 1998TSE 1998
Automated Revision of Existing Programs 29
Fault-Tolerant Real-Time Programs
Railroad Crossing Safety If the train is passing the crossing the gate must be closed Bounded response Once the gate is closed it should reopen within
4min Fault-tolerance The gate must be closed while the train is passing the
intersection even in the presence of faults
Boiler Controller Bounded response Once the pressure gauge reads 30 pounds per
square inch the controller must issue a command to open a valve within 20s
Fault-tolerance Faults should not affect the functionality of the controller
Automated Revision of Existing Programs 30
New Levels of Fault-Tolerance for RT Programs
Soft Fault-Tolerance
A program is soft fault-tolerant if it is NOT required to satisfy its timing constraints in the presence of faults However it should continue to meet its timing constraints in the absence of faults
Hard Fault-Tolerance
A program is hard fault-tolerant if it is required to satisfy its timing constraints even in the presence of faults
Automated Revision of Existing Programs 31
Levels of Fault-Tolerance (contrsquod)
Safety LivenessTiming
constraintsBounded-time
recovery
Soft-failsafe
Hard-failsafe
Nonmasking
Soft-masking
Hard-masking
Automated Revision of Existing Programs 32
Problem Statement
SynthesisAlgorithm
Program PSafety spec
Fault-tolerant
program PSet of faults f
Desired level of fault-tolerance SoftHard-Failsafe
NonmaskingSoftHard-Masking
Automated Revision of Existing Programs 33
Problem Statement
Given program P Invariant S a set of faults F and safety specification SPEC identify a program P with invariant S such that
S S P P and P is F-tolerant to SPEC from S
Automated Revision of Existing Programs 34
Current Results
Safety LivenessTiming
constraintsBounded-recovery
Complexity
Soft-failsafe
Polynomial space sound and complete
algorithm
Hard-failsafe
Nonmasking
Soft-masking
Hard-masking
Open problem
[2] S Kulkarni A Arora [2] S Kulkarni A Arora Automating the Addition of Fault-ToleranceAutomating the Addition of Fault-Tolerance FTRTFT 2000 FTRTFT 2000
Automated Revision of Existing Programs 35
Example Altitude Switch
Initx 06
Standby
Await (Power z 2 )
gtgt Actuator Power on ltlt
ltlt LowAlt gtgt[z = 0]
ltlt Reset gtgt(CorruptSensor y 2)
(x 06)
ltlt Reset gtgt
[x y z = 0]
(Power z 2 )
(z gt 2 )[t = 0]
(x gt 06)[t = 0]
ltlt SensorFail gtgt[y = 0]
(y gt 2)[t = 0]
y 1y 2
z 2
(Power CorruptSensor)( Power)
Fault-span
Fault-span
(t 1)
The Altitude Switch is required to
1) HFS Tolerate the delay fault while in Standby mode
((Standby CorruptSensor) lozloz22 (Init))
2) HFS Not power on the actuators while it fails to read the altitude sensors
3) SMK Recover within 1s after occurrence of faults
The Altitude Switch is required to
1) HFS Tolerate the delay fault while in Standby mode
((Standby CorruptSensor) lozloz22 (Init))
2) HFS Not power on the actuators while it fails to read the altitude sensors
3) SMK Recover within 1s after occurrence of faults
Automated Revision of Existing Programs 37
Ongoing Research
Implementation of the algorithms
Coping with the state explosion problem: synthesis using zone automata, parallelizing state space
Coping with NP-hardness results: heuristics, decision procedures (Yices)
Fault-tolerance in hybrid systems
[8] http://www.cse.msu.edu/~ebnenasi/research/tools/ftsyn.htm
Automated Revision of Existing Programs 38
Conclusion
We study the problem of automated revision of programs inside their state space:
Adding properties to programs
Adding fault-tolerance
Automated Revision of Existing Programs 39
Questions
Automated Revision of Existing Programs 29
Fault-Tolerant Real-Time Programs
Railroad Crossing
Safety: If the train is passing the crossing, the gate must be closed.
Bounded response: Once the gate is closed, it should reopen within 4 min.
Fault-tolerance: The gate must be closed while the train is passing the intersection, even in the presence of faults.
Boiler Controller
Bounded response: Once the pressure gauge reads 30 pounds per square inch, the controller must issue a command to open a valve within 20 s.
Fault-tolerance: Faults should not affect the functionality of the controller.
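A bounded response property of the kind listed above can be checked on a single recorded execution. The sketch below is illustrative only: it assumes a trace is a time-sorted list of (timestamp in seconds, event name) pairs, and the event names `gate_closed`, `gate_open`, `pressure_30psi`, and `open_valve` are hypothetical labels, not taken from the slides.

```python
def bounded_response(trace, trigger, response, bound):
    """Return True if every `trigger` event in the trace is followed
    by a `response` event no more than `bound` seconds later.

    `trace` is a list of (timestamp, event) pairs sorted by timestamp.
    """
    for i, (t0, event) in enumerate(trace):
        if event != trigger:
            continue
        answered = any(e == response and t - t0 <= bound
                       for t, e in trace[i + 1:])
        if not answered:
            return False
    return True

# Railroad crossing: the gate should reopen within 4 min (240 s).
on_time = [(0, "gate_closed"), (180, "gate_open")]
too_late = [(0, "gate_closed"), (300, "gate_open")]
print(bounded_response(on_time, "gate_closed", "gate_open", 240))
print(bounded_response(too_late, "gate_closed", "gate_open", 240))

# Boiler controller: the valve command should follow within 20 s.
boiler = [(5, "pressure_30psi"), (20, "open_valve")]
print(bounded_response(boiler, "pressure_30psi", "open_valve", 20))
```

Checking one trace says nothing about all executions of a program; the synthesis approach in the talk reasons over the whole state space instead.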
Automated Revision of Existing Programs 30
New Levels of Fault-Tolerance for RT Programs
Soft Fault-Tolerance
A program is soft fault-tolerant if it is NOT required to satisfy its timing constraints in the presence of faults. However, it should continue to meet its timing constraints in the absence of faults.
Hard Fault-Tolerance
A program is hard fault-tolerant if it is required to satisfy its timing constraints even in the presence of faults.
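The difference between the two definitions can be written as a small decision rule over a single execution. This is a sketch under assumptions: the names `Level` and `timing_acceptable` are hypothetical, and an execution is summarized by just two booleans (whether faults occurred and whether the timing constraints were met).

```python
from enum import Enum

class Level(Enum):
    SOFT = "soft"
    HARD = "hard"

def timing_acceptable(level, faults_occurred, timing_met):
    """Return True if an execution is acceptable with respect to timing.

    Hard: timing constraints must hold with or without faults.
    Soft: timing constraints must hold only when no fault occurred.
    """
    if level is Level.HARD or not faults_occurred:
        return timing_met
    return True  # soft level, faults occurred: timing is not required

# A run that misses its deadline after a fault is acceptable only
# under the soft level.
print(timing_acceptable(Level.SOFT, faults_occurred=True, timing_met=False))
print(timing_acceptable(Level.HARD, faults_occurred=True, timing_met=False))
```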
Automated Revision of Existing Programs 31
Levels of Fault-Tolerance (cont'd)
Safety | Liveness | Timing constraints | Bounded-time recovery
Soft-failsafe
Hard-failsafe
Nonmasking
Soft-masking
Hard-masking