CTL Model Checking in Database Cloud



German Shegalov

Oracle Corp.
500 Oracle Parkway, Redwood Shores, CA 94065, USA

[email protected]

Abstract: Modern software systems such as OS, RDBMS, JVM, etc., have reached enormous complexity by any metric, ranging from the number of lines of code to the program state explosion due to concurrency. Standard quality assurance methods do not yield strong correctness guarantees because even 100% code coverage, while a desirable metric, is not equivalent to state/execution path coverage. Whereas model checking provides rigorous correctness proofs, its computational complexity is often prohibitive for real-world systems. With advances in distributed computing frameworks such as MapReduce, and the affordability of large computer clusters (e.g., offered as an on-demand cloud service), steadily larger systems can be verified using model checking. In this paper, we envision database vendors competing to achieve the highest possible degree of verification using massive scalability features. To this end, we show a way of implementing a CTL model checker as an SQL application that a database system will “tune” for the cloud.

I. INTRODUCTION

In this paper, we demonstrate a relatively simple way to turn a relational database system into a powerful verification tool. We show only a very basic technique of implementing a model checker inside a database system. The idea is to encourage database vendors to compete not only on performance-oriented benchmarks but also on the merits of objective software quality. An interesting side effect is that the database system will be able to verify its own concurrency control and recovery protocols.

Several steps are required on the way towards software verification. First, the source code has to be converted into some abstract state transition model using, e.g., ETL functionality. This is already a complex problem because finite abstractions for things like recursion and heap allocations need to be found. In this paper, we assume that a technology such as the one used in the Spin model checker serves this purpose. Then each component architect formulates safety and liveness properties in temporal logics that can be verified by the system as the final step. These tasks by themselves should already constitute a substantial stress test of the database system itself. Often, the source code is generated from state diagrams for protocols, or from grammars, and these specifications can be used directly instead.

In this paper, we focus on implementing a Model Checking Engine as an SQL application. The idea here is to use SQL as a scalability vehicle for massively parallel model checking. Dialects of SQL, the lingua franca of most relational databases, are already very powerful languages that have found interesting uses beyond the traditional OLTP and OLAP scopes, e.g., to solve puzzles [7]. And when SQL's expressiveness is not sufficient, we can resort to an efficient database-system-integrated virtual machine that will close the gap for any Turing-computable problem. The task of a model checker is to verify whether the system under test satisfies a property posed as a formula in a temporal logic such as CTL.

II. CTL BACKGROUND

Model checking is a formal method of software/hardware verification, an automated way of providing mathematical proofs [1]. In this paper, we deal with Computation Tree Logic (CTL) [3] model checking.

Along with the traditional Boolean operators, CTL defines the existential path quantifier E and the universal path quantifier A for the paths originating in some state s. Temporal aspects are expressed using the unary modalities neXt (refers to the successor state), Globally (all states along the path satisfy the formula), Finally (some state along the path satisfies the formula), and the binary modality Until (the left-hand formula is valid at least until a state is reached where the right-hand formula holds). The unary modalities are usually the most relevant in practice.

The set of CTL formulae over a finite set of atomic propositions P, denoted as CTL(P), is formally defined as follows using structural induction:

p ∈ P implies p ∈ CTL(P)
{p, q} ⊆ CTL(P) implies {¬p, p ∧ q, EX p, E (p U q), A (p U q)} ⊆ CTL(P)

Given the basic formulae defined above, the following shorthand syntax is provided as equivalent to formulae in the basic set:

p ∨ q ≡ ¬(¬p ∧ ¬q)
AX p ≡ ¬EX ¬p
AF p ≡ A (true U p)
EF p ≡ E (true U p)
AG p ≡ ¬E (true U ¬p)
EG p ≡ ¬A (true U ¬p)
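The shorthand equivalences above can be applied mechanically before any checking takes place. The following Python sketch rewrites a formula into the basic set; the nested-tuple encoding of formulae and the function name `desugar` are illustrative assumptions and do not come from the paper.

```python
# Hypothetical encoding (not from the paper): a formula is the constant True,
# an atomic proposition name (str), or a tuple whose first element is one of
# 'not', 'and', 'or', 'EX', 'AX', 'EU', 'AU', 'EF', 'AF', 'EG', 'AG'.

def desugar(f):
    """Rewrite shorthand operators into the basic set {not, and, EX, EU, AU}."""
    if f is True or isinstance(f, str):
        return f
    op, *args = f
    args = [desugar(a) for a in args]
    if op in ('not', 'EX'):
        return (op, args[0])
    if op in ('and', 'EU', 'AU'):
        return (op, args[0], args[1])
    if op == 'or':   # p ∨ q ≡ ¬(¬p ∧ ¬q)
        return ('not', ('and', ('not', args[0]), ('not', args[1])))
    if op == 'AX':   # AX p ≡ ¬EX ¬p
        return ('not', ('EX', ('not', args[0])))
    if op == 'AF':   # AF p ≡ A (true U p)
        return ('AU', True, args[0])
    if op == 'EF':   # EF p ≡ E (true U p)
        return ('EU', True, args[0])
    if op == 'AG':   # AG p ≡ ¬E (true U ¬p)
        return ('not', ('EU', True, ('not', args[0])))
    if op == 'EG':   # EG p ≡ ¬A (true U ¬p)
        return ('not', ('AU', True, ('not', args[0])))
    raise ValueError(f"unknown operator: {op!r}")


print(desugar(('AG', ('or', 'P0', 'P1'))))
```

With such a rewriting step, a checker only needs to implement the five basic operators.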

CTL presumes that a computing system is represented as a Kripke structure K = (S, R, L), where S is the finite set of states, R ⊆ S × S is the state transition relation with (s, t) ∈ R if t is an immediate successor of s, and L: S × P → {true, false} is the labeling function that tells whether an atomic proposition holds in a state. A path is a potentially infinite sequence of successive states. In our toy example of Figure 1, AX P0, EG P0, EF P1, and AG (P0 ∨ P1) are true in S0.

Fig 1 A sample state-transition diagram with initial state S0 and two atomic formulae P0 and P1. State labels in the diagram: S0: P0; S1: P0, P1; S2: P1.


III. KRIPKE SCHEMA

In this section we present a way of translating the Kripke structure into database relations. The state transition diagram of Fig 1 translates to the instance of the Kripke schema outlined in Fig 2. As we incorporated the ids into the names in this example, we focus solely on the non-trivial relations valuation and transition. A negation not P is implied when the atomic proposition P is not shown in the state and, analogously, when there is no corresponding entry in the valuation relation. The tuple (null, 0) in the transition relation specifies that the state with s_id = 0 is the initial state of this state transition system.

create table state (
  s_id number primary key,
  s_nm varchar2(10));
insert into state values(0, 'S_0');
insert into state values(1, 'S_1');
insert into state values(2, 'S_2');

create table atomic (
  a_id number primary key,
  a_nm varchar2(10));
insert into atomic values(0, 'P_0');
insert into atomic values(1, 'P_1');

create table valuation (
  s_id number references state(s_id),
  a_id number references atomic(a_id));
insert into valuation values(0, 0);
insert into valuation values(1, 0);
insert into valuation values(1, 1);
insert into valuation values(2, 1);

create table transition (
  src_id number references state(s_id),
  tgt_id number references state(s_id));
-- the tuple (null, 0) marks state 0 as the initial state
insert into transition values(null, 0);
insert into transition values(0, 1);
insert into transition values(1, 0);
insert into transition values(1, 2);

valuation
s_id   a_id
0      0
1      0
1      1
2      1

transition
src_id   tgt_id
null     0
0        1
1        0
1        2

Fig 2 A sample relational representation of Kripke structure of Fig 1.
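To experiment with the schema without an Oracle installation, the instance of Fig 2 can be loaded into an in-memory SQLite database. This is only a sketch under stated assumptions: SQLite via Python's `sqlite3` module stands in for Oracle, the column types are adapted to `integer` and `text`, and the helper name `load_toy_kripke` is hypothetical.

```python
import sqlite3


def load_toy_kripke():
    """Create the Kripke schema of Fig 2 in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table state (s_id integer primary key, s_nm text);
        create table atomic (a_id integer primary key, a_nm text);
        create table valuation (
            s_id integer references state(s_id),
            a_id integer references atomic(a_id));
        create table transition (
            src_id integer references state(s_id),
            tgt_id integer references state(s_id));
        insert into state values (0, 'S_0'), (1, 'S_1'), (2, 'S_2');
        insert into atomic values (0, 'P_0'), (1, 'P_1');
        insert into valuation values (0, 0), (1, 0), (1, 1), (2, 1);
        insert into transition values (null, 0), (0, 1), (1, 0), (1, 2);
    """)
    return conn


conn = load_toy_kripke()

# The initial state is the target of the tuple whose source is null.
initial = [row[0] for row in conn.execute(
    "select tgt_id from transition where src_id is null")]

# One (state name, proposition name) row per entry of the valuation relation.
labels = conn.execute("""
    select s.s_nm, a.a_nm
    from valuation v
    join state s on s.s_id = v.s_id
    join atomic a on a.a_id = v.a_id
    order by s.s_id, a.a_id""").fetchall()

print(initial, labels)
```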

IV. MODEL CHECKER AS AN SQL APPLICATION

In this section we translate basic CTL formulae into executable SQL queries as an implementation of the basic explicit model checking algorithm [1]. For p ∈ CTL(P) and a state s ∈ S with id state_id, let sql(state_id, p) denote an SQL statement that returns 'TRUE' for the attribute 'RESULT' if s satisfies p, or 'FALSE' otherwise. The SQL statements given below are written in Oracle 11gR2's SQL [6], as close as possible to ANSI SQL, and are just meant to give the reader a flavor of the idea; we claim neither particular elegance nor efficiency.

The algorithm for constructing the SQL representation sql(state_id, p) of a CTL formula is given by structural induction over CTL(P).

A. sql(state_id, 'FALSE')

This formula cannot be satisfied when the state with state_id exists.

select case count(*) when 0 then NULL else 'FALSE' end as result
from state
where state.s_id = state_id;

B. sql(state_id, 'TRUE')

This formula is always satisfied when the state with state_id exists.

select case count(*) when 0 then NULL else 'TRUE' end as result
from state
where state.s_id = state_id;

C. sql(state_id, atomic_id)

An atomic propositional formula is satisfied when there is a tuple (state_id, atomic_id) in the relation valuation.

select case count(*) when 0 then 'FALSE' else 'TRUE' end as result
from valuation
where valuation.a_id = atomic_id
  and valuation.s_id = state_id;

D. sql(state_id, ¬p)

The negation of p is satisfied when p is false.

with subq as (
  <sql(state_id, p)>
)
select case subq.result when 'TRUE' then 'FALSE' else 'TRUE' end as result
from subq;

E. sql(state_id, p ∧ q)

The conjunction is satisfied when both p and q are satisfied.

with subq_p as (
  <sql(state_id, p)>
), subq_q as (
  <sql(state_id, q)>
)
select case count(*) when 0 then 'FALSE' else 'TRUE' end as result
from subq_p natural join subq_q
where result = 'TRUE';

F. sql(state_id, EX p)

This formula is satisfied when the state state_id is in the set of predecessors of states satisfying p.

select case count(*) when 0 then 'FALSE' else 'TRUE' end as result
from transition t
where t.src_id = state_id
  and 'TRUE' = (<sql(t.tgt_id, p)>);
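The inductive construction of cases C through F can be sketched as a small query generator. The sketch below makes several assumptions that are not in the paper: it targets SQLite through Python's `sqlite3` module instead of Oracle, it emits nested scalar subqueries rather than `with` clauses, and the tuple encoding of formulae and the names `sql` and `check` are hypothetical.

```python
import itertools
import sqlite3

_alias = itertools.count()  # source of fresh aliases for nested EX subqueries


def sql(sid, f):
    """Return an SQL expression that yields 'TRUE' or 'FALSE' for f at state sid.

    sid is an SQL expression (a literal or a qualified column); f is True,
    an atomic proposition id (int), ('not', p), ('and', p, q) or ('EX', p).
    """
    if f is True:
        return "'TRUE'"
    if isinstance(f, int):  # atomic proposition: look it up in valuation
        return ("(select case count(*) when 0 then 'FALSE' else 'TRUE' end"
                f" from valuation v where v.a_id = {f} and v.s_id = {sid})")
    op = f[0]
    if op == 'not':
        return f"(case {sql(sid, f[1])} when 'TRUE' then 'FALSE' else 'TRUE' end)"
    if op == 'and':
        return (f"(case when {sql(sid, f[1])} = 'TRUE' and {sql(sid, f[2])} = 'TRUE'"
                " then 'TRUE' else 'FALSE' end)")
    if op == 'EX':  # some successor of sid satisfies the subformula
        t = f"t{next(_alias)}"
        inner = sql(f"{t}.tgt_id", f[1])
        return ("(select case count(*) when 0 then 'FALSE' else 'TRUE' end"
                f" from transition {t} where {t}.src_id = {sid}"
                f" and {inner} = 'TRUE')")
    raise ValueError(f"unsupported formula: {f!r}")


# Only the two non-trivial relations of Fig 2 are needed here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table valuation (s_id integer, a_id integer);
    create table transition (src_id integer, tgt_id integer);
    insert into valuation values (0, 0), (1, 0), (1, 1), (2, 1);
    insert into transition values (null, 0), (0, 1), (1, 0), (1, 2);
""")


def check(state_id, f):
    """Evaluate formula f at the given state id."""
    return conn.execute("select " + sql(str(int(state_id)), f)).fetchone()[0]


# AX P_0 in S_0, written as ¬EX ¬P_0
print(check(0, ('not', ('EX', ('not', 0)))))
```

Each recursive call contributes one subquery, so the size of the generated statement grows with the size of the formula.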

G. sql(state_id, E (p U q))

This formula is satisfied when state_id is in the set of states satisfying q, or when state_id is reachable through recursive reverse traversal from the set of states already known to satisfy the formula. In each recursive step we add only predecessor states that satisfy p. Since the state transition diagram may be cyclic, we use the cycle detection clause.

with subq_EpUq (rs) as (
  select s_id as rs
  from state
  where 'TRUE' = (<sql(s_id, q)>)
  union all
  select t.src_id as rs
  from subq_EpUq
  join transition t on (
    subq_EpUq.rs = t.tgt_id
    and 'TRUE' = (<sql(t.src_id, p)>)
  )
)
cycle rs set is_cycle to 'y' default 'n'
select case count(*) when 0 then 'FALSE' else 'TRUE' end as result
from subq_EpUq
where rs = state_id;
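The reverse traversal can be tried out on the toy example with a recursive query in SQLite. This sketch departs from the Oracle statement above in ways that should be read as assumptions: it uses Python's `sqlite3` module, it restricts p and q to atomic propositions passed as the named parameters `:p` and `:q`, and it relies on `union` (which discards duplicate rows) instead of the `cycle` clause to terminate on cyclic graphs. The function name `states_satisfying_eu` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table valuation (s_id integer, a_id integer);
    create table transition (src_id integer, tgt_id integer);
    insert into valuation values (0, 0), (1, 0), (1, 1), (2, 1);
    insert into transition values (null, 0), (0, 1), (1, 0), (1, 2);
""")

E_P_UNTIL_Q = """
with recursive epuq(rs) as (
    -- base case: states satisfying q
    select v.s_id from valuation v where v.a_id = :q
    union
    -- recursive step: predecessors of known states that satisfy p
    select t.src_id
    from epuq join transition t on epuq.rs = t.tgt_id
    where t.src_id in (select v.s_id from valuation v where v.a_id = :p)
)
select rs from epuq order by rs
"""


def states_satisfying_eu(p, q):
    """Return the ids of all states satisfying E (p U q) for atomic p and q."""
    return [row[0] for row in conn.execute(E_P_UNTIL_Q, {"p": p, "q": q})]


print(states_satisfying_eu(0, 1))  # E (P_0 U P_1)
print(states_satisfying_eu(1, 0))  # E (P_1 U P_0)
```

Computing the whole satisfying set once and then testing membership of state_id mirrors the final `where rs = state_id` filter of the statement above.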

H. sql(state_id, A (p U q))

This formula is computed similarly to the existentially quantified formula above, with the difference that in every recursive step we make sure not to add states that have at least one successor that is not in the result set of the previous step. Hence, each step needs more than one reference to the result set computed in the previous recursion step: one for the predecessor computation and one for the check whether all successors of the predecessor are already in the previous set. Therefore, this formula cannot be computed with plain recursive SQL as above. Instead we develop a PL/SQL stored function and use a temporary table to achieve the desired behavior.

drop table temp;
create global temporary table temp (
  rs number primary key
) on commit preserve rows;

create or replace function ApUq(state_id in number)
return varchar2
as
  pragma autonomous_transaction;
  counter number;
  newstates number;
begin
  -- seed the result set with all states satisfying q
  insert into temp (rs)
    select s.s_id from state s
    where 'TRUE' = (<sql(s.s_id, q)>);
  commit; -- autonomous transaction
  select count(*) into counter from temp where temp.rs = state_id;
  if counter <> 0 then
    return 'TRUE';
  end if;
  loop
    newstates := 0;
    -- candidates: predecessors of the current result set that satisfy p
    for r1 in (
      select t1.src_id
      from temp
      join transition t1 on (
        temp.rs = t1.tgt_id
        and 'TRUE' = (<sql(t1.src_id, p)>)
      )
    ) loop
      -- count the candidate's successors that are not yet in the result set
      select count(*) into counter
      from transition t2
      where t2.src_id = r1.src_id
        and t2.tgt_id not in (select * from temp tt3);
      if counter = 0 then
        if r1.src_id = state_id then
          return 'TRUE';
        else
          begin
            insert into temp values (r1.src_id);
            commit; -- autonomous transaction
            newstates := newstates + 1;
          exception
            when dup_val_on_index then
              dbms_output.put_line('ignored duplicate');
          end;
        end if;
      end if;
    end loop;
    if newstates = 0 then
      return 'FALSE';
    end if;
  end loop;
end;

select ApUq(state_id) as result from dual;
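The iteration performed by the stored function can be illustrated outside the database as a fixpoint computation over sets. The following Python sketch is not the PL/SQL implementation: the function name `states_satisfying_au` and the set-based inputs are assumptions, and it returns the full satisfying set instead of stopping early once state_id is found.

```python
def states_satisfying_au(transitions, sat_p, sat_q):
    """Compute the states satisfying A (p U q) by iterating to a fixpoint.

    transitions: iterable of (src, tgt) pairs; sat_p and sat_q: sets of the
    states already known to satisfy p and q, respectively.
    """
    succ = {}
    for src, tgt in transitions:
        if src is not None:  # skip the (null, initial state) marker tuple
            succ.setdefault(src, set()).add(tgt)

    result = set(sat_q)      # plays the role of the temp table
    changed = True
    while changed:
        changed = False
        for s, targets in succ.items():
            # add s only if it satisfies p and all its successors are in result
            if s not in result and s in sat_p and targets <= result:
                result.add(s)
                changed = True
    return result


# Toy example of Fig 2: P_0 holds in {0, 1}, P_1 holds in {1, 2}.
T = [(None, 0), (0, 1), (1, 0), (1, 2)]
print(states_satisfying_au(T, {0, 1}, {1, 2}))   # A (P_0 U P_1)
print(states_satisfying_au(T, {0, 1, 2}, {2}))   # AF ¬P_0, i.e. A (true U ¬P_0)
```

In the second call the result contains only state 2, so S_0 does not satisfy AF ¬P_0, which agrees with EG P_0 being true in S_0.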

This sample implementation can be further optimized at different levels. From the model checking perspective, the basic explicit algorithm is known to be outperformed by symbolic model checking [4] using OBDD-encoded Boolean functions [5]. From the database perspective, we would start by looking at horizontal scalability features such as Parallel Pipelined Table Functions (PTF) in the case of Oracle [6], or similar techniques such as MapReduce [2], depending on the vendor's functionality. As shown in this section, the queries implementing a composite CTL formula might consist of many subqueries that can be run in parallel. Many existential queries will benefit from the ability to stream the query hits early, before the whole result set is formed, as can be done with PTF.

V. BENCHMARK PROPOSALS

In terms of self-verification it might be difficult to devise a vendor-independent metric for the model checking benchmark. One such metric could be the percentage of the source code verified given a set of CTL propositions that apply to all products.

Fortunately, it is much easier to design an apples-to-apples benchmark if the verified system is a third-party product. We suggest that a substantial open-source project at the scale of Linux or MySQL be used as the system under verification.

As an example of properties we want to verify, consider two-phase locking (2PL), where there are distinct lock acquisition and release phases for a transaction. With the events of lock acquisition and release by a transaction t encoded as t_acq and t_rel, respectively, we can state:

AG(¬t_rel ∨ AX(AG ¬t_acq))
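To make the meaning of this property concrete, the following Python sketch checks it on two small hand-made models by plain reachability: whenever t_rel holds in a reachable state, no state reachable from any of its successors may be labeled t_acq. It is specialized to this one formula and is not the SQL-based checker of Section IV; the function names and both example models are hypothetical.

```python
def reachable(succ, start):
    """Return all states reachable from start, including start itself."""
    seen, stack = set(), [start]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(succ.get(s, ()))
    return seen


def satisfies_2pl(succ, labels, initial):
    """Check AG(¬t_rel ∨ AX(AG ¬t_acq)) at the initial state."""
    for s in reachable(succ, initial):
        if 't_rel' in labels.get(s, set()):
            # AX(AG ¬t_acq): no acquisition anywhere after any successor
            for nxt in succ.get(s, ()):
                if any('t_acq' in labels.get(u, set())
                       for u in reachable(succ, nxt)):
                    return False
    return True


# Hypothetical models: successor lists plus the lock events of each state.
two_phase = ({0: [1], 1: [2], 2: [3]},
             {0: {'t_acq'}, 1: {'t_acq'}, 2: {'t_rel'}, 3: {'t_rel'}})
not_two_phase = ({0: [1], 1: [2]},
                 {0: {'t_acq'}, 1: {'t_rel'}, 2: {'t_acq'}})

print(satisfies_2pl(*two_phase, 0), satisfies_2pl(*not_two_phase, 0))
```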

We envision the following benchmark metrics:
• The fraction of the source code verified
• The fraction of the formulae verified
• The monetary cost of the setup needed for verification
• The amount of energy spent per verification per source code line

VI. CONCLUSION

This paper advocates spending recent scalability gains in modern computing on finding rare and corner-case bugs in database systems to improve their quality by means of fully automated model checking. The fact that model checking can be implemented using the database system itself also presents an interesting test case in terms of traditional software testing.

Further, we show a sample implementation of the basic explicit model checking algorithm using a combination of Oracle 11.2 SQL and PL/SQL. Then we point out a couple of optimization areas where vendors can work on excelling in this benchmark. Last but not least, we suggest several benchmark metrics.

REFERENCES

[1] Clarke, E., Schlingloff, B.: Model Checking, in Handbook of Automated Reasoning, Volume 2, Elsevier and MIT, 1635-1790 (2001)

[2] Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters, Symposium on Operating System Design and Implementation (OSDI), San Francisco, CA (2004)

[3] Emerson, E.: Temporal and Modal Logic, in Handbook of Theoretical Computer Science, Volume B: Formal Models and Semantics, Elsevier and MIT, 995-1072 (1990)

[4] McMillan, K.: Symbolic Model Checking, Kluwer, Norwell, MA (1993)

[5] Meinel, C., Theobald, T.: Algorithms and Data Structures in VLSI Design: OBDD Foundations and Applications, Springer, Heidelberg (1998)

[6] Oracle Corp.: Oracle Database SQL Language Reference 11g Release 2 (11.2), http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/toc.htm

[7] Sheffer, A.: Oracle RDBMS 11gR2 – Solving a Sudoku using Recursive Subquery Factoring, http://technology.amis.nl/blog/6404/oracle-rdbms-11gr2-solving-a-sudoku-using-recursive-subquery-factoring