2010 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), Leganes, Madrid, Spain, 21-25 September 2010. 978-0-7695-4206-5/10 $26.00 © 2010 IEEE. DOI 10.1109/VLHCC.2010.30

Introducing Second-Order Spider Diagrams for Defining Regular Languages

Peter Chapman and Gem Stapleton

Visual Modelling Group, University of Brighton, UK

{p.b.chapman, g.e.stapleton}@brighton.ac.uk

Abstract

There has been significant research effort focussed on the study of regular languages, since they play a vital role in our understanding of computation. This existing research draws a large number of connections with other areas, such as algebra and symbolic logic. Recently, research has begun into how diagrammatic logics can define regular languages, providing another mechanism through which we can understand regular languages. However, the formalised diagrammatic logics are first-order, so they cannot define non-star-free regular languages. The primary contributions of this paper are: (a) to develop and formalise a second-order diagrammatic logic, extending spider diagrams of order, and (b) to establish a class of regular languages that this logic can define. This lays the essential foundations for providing an exact classification of the regular languages that are definable using this new second-order logic.

1. Introduction

There has been significant study of regular languages in computer science and much is known about how they relate to finite automata, symbolic logic, and algebraic formalisms. Each of these relationships has led to an increased understanding of regular languages, justifying the view that the study of formal language theory from different perspectives can be both insightful and highly significant. For instance, work by Büchi [1], amongst others, provides a logical characterization of regular languages in terms of symbolic logic; star-free regular languages are first-order definable and those which are not star-free are second-order definable. We note that connections between regular languages and symbolic logic have been well-studied [8, 20].

Recent work has established how one can define regular languages using the spider diagram logic [4], which we elucidate in the next section. Here, we note that spider diagrams can only define so-called commutative star-free regular languages (a commutative language is closed under permutation). This led to the development of spider diagrams of order [5] which augment spider diagrams with a product operator, ◁, allowing ordering information to be specified. However, since both of these logics are first-order, they can only define star-free regular languages. All regular languages are definable in monadic second-order logic [1, 20].

Whilst the range of diagrammatic logics available is increasing, those which are formalised are typically limited in expressiveness to at most that of first-order logic. Further examples include existential graphs [3], Euler diagrams [16], Venn-II [14], Euler/Venn diagrams [18], and constraint diagrams [11]. The limitation of these logics to being first-order precludes the formalisation of many commonly occurring concepts in both mathematics and computer science, such as defining the property of being finite.

We propose second-order spider diagrams, extending spider diagrams of order by incorporating arrows (from constraint diagrams [11]) and the ability to quantify over sets. Thus, the development of this logic pushes the boundaries of what can be expressed by formal diagrammatic notations in visually intuitive ways. Until this paper, there has been no attempt to develop a diagrammatic logic that is capable of defining non-star-free regular languages. In §2 we describe approaches to defining regular languages using monadic first-order logic, spider diagrams, and finite state machines. The limitations of using spider diagrams are discussed, which motivates §3, where we introduce a notation for second-order spider diagrams. In §4 we define the language of a diagram and in §5 we establish that any regular language with at most star-height 1 can be defined by a second-order spider diagram.

2. Defining Regular Languages Using Logic

Recall that a (not necessarily regular) language over an alphabet Σ is a subset of Σ∗ (all the words formed from letters in Σ). A language is regular if it can be defined by a regular expression [12]. Regular expressions are formed using letters in Σ, λ to denote the empty word, and ∅ to denote the empty language. They also make use of disjunction, denoted |, concatenation, denoted ·, and the Kleene star, ∗, to form complex expressions. In addition, generalised regular expressions make use of an overline, ¯, to denote complements. A language is star-free if it can be defined by a generalised regular expression not involving ∗.

We now summarise key results by Thomas [19], which establish a strong relationship between regular languages and Monadic First-Order Logic of Order (MFOL[<]), in which the only binary predicate is <. To illustrate some of the key ideas of Thomas’ work and their extension to the diagrammatic notations of relevance to this paper, we take an example alphabet to be Σ = {a, b, c, d} and we assign the predicate symbol Q1 to the set {a, b} and the predicate symbol Q2 to the set {b, c}. The sentence ∀x Q1(x) therefore defines the language consisting of all words that contain only the letters a or b (such as abbba and bbb) and ∀x(Q1(x) ∧ ¬Q2(x)) defines the language consisting of all words containing only the letter a.

More precisely, to define the words in the language of a sentence, S, we are required to consider the notion of satisfaction in a structure, (U, Ψ, <), where U is the universal set, Ψ interprets the predicate symbols as subsets of U, and < is a strict total order on U. To illustrate, w = ab is satisfied by U = {1, 2}, Ψ(Q1) = {1, 2}, Ψ(Q2) = {2} with < the natural order on U. However, w = ab is not satisfied by the structure with U = {1, 2, 3}, Ψ(Q1) = {1, 2}, Ψ(Q2) = {2} and with < interpreted as the natural order on U, since the length of w is 2 but |U| = 3: the number of letters in w must equal the cardinality of U. The words in the language of S are precisely those which are satisfied by the models of S [19]. We return to this notion in section 4.
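The satisfaction check just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`satisfies`, `structure_of`, `forall_q1`, `forall_q1_and_not_q2`) are hypothetical, the universe is taken to be a set of integers under the natural order, and the predicate sets are those of the running example.

```python
# Predicate sets from the running example: Q1 = {a, b}, Q2 = {b, c}.
PREDICATES = {"Q1": {"a", "b"}, "Q2": {"b", "c"}}


def satisfies(word, universe, psi):
    """Return True if `word` is satisfied by (U, Psi, <), with < the natural order on U."""
    # The number of letters must equal the cardinality of U.
    if len(word) != len(universe):
        return False
    # The i-th element of U must lie in exactly the predicates that hold of the i-th letter.
    for element, letter in zip(sorted(universe), word):
        for name, letters in PREDICATES.items():
            if (element in psi[name]) != (letter in letters):
                return False
    return True


def structure_of(word):
    """Build the structure induced by a word: positions 1..n and the predicate sets."""
    universe = set(range(1, len(word) + 1))
    psi = {name: {i for i in universe if word[i - 1] in letters}
           for name, letters in PREDICATES.items()}
    return universe, psi


def forall_q1(word):
    """Evaluate the sentence 'for all x, Q1(x)' in the structure of `word`."""
    universe, psi = structure_of(word)
    return all(x in psi["Q1"] for x in universe)


def forall_q1_and_not_q2(word):
    """Evaluate the sentence 'for all x, Q1(x) and not Q2(x)' in the structure of `word`."""
    universe, psi = structure_of(word)
    return all(x in psi["Q1"] and x not in psi["Q2"] for x in universe)
```

Calling `satisfies("ab", {1, 2}, {"Q1": {1, 2}, "Q2": {2}})` reproduces the positive example from the text, while the three-element universe is rejected because of the cardinality mismatch.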

[Figure 1: the spider diagram d1, with contours Q1 and Q2, and a finite state machine accepting the same language.]

Figure 1. Defining star-free languages.

For spider diagrams [10], there are directly analogous concepts of structures (called interpretations) and of satisfaction. Thus, Thomas’ definition of a language definable by a sentence immediately extends to the notion of a language definable by a spider diagram. In figure 1, d1 is a spider diagram that tells us that there are at least two elements in Q1 − Q2 and at least one element in Q1 ∩ Q2. The lowercase letters indicate the mapping defined above, assigning sets of letters to Q1 and Q2. In terms of languages, this spider diagram defines the language containing precisely the words that contain at least two as and at least one b. A finite state machine that accepts the same language is shown on the right. Since this language is star-free, it may also be defined by a star-free generalised regular expression such as

$$(\overline{\emptyset}a\overline{\emptyset}a\overline{\emptyset}b\overline{\emptyset}) \mid (\overline{\emptyset}a\overline{\emptyset}b\overline{\emptyset}a\overline{\emptyset}) \mid (\overline{\emptyset}b\overline{\emptyset}a\overline{\emptyset}a\overline{\emptyset})$$

or a monadic first-order logic sentence, such as

∃x∃y∃z(Q1(x) ∧ ¬Q2(x) ∧ Q1(y) ∧ ¬Q2(y) ∧ Q1(z) ∧ Q2(z) ∧ x ≠ y).

In the generalised regular expression just given, $\overline{\emptyset}$ denotes the complement of the empty language, in other words Σ∗. Thus, the first disjunct, namely $(\overline{\emptyset}a\overline{\emptyset}a\overline{\emptyset}b\overline{\emptyset})$, defines the language containing all words where aab is a subword.

Spider diagrams are unable to provide any ordering information on the letters in words, being equivalent in expressive power to monadic first-order logic with equality [17]. Consequently, the language defined by a spider diagram is closed under permutation. This observation motivated the development of spider diagrams of order, which come equipped with syntax for specifying ordering information [5]. To illustrate, figure 2 shows a spider diagram of order that asserts that an a must occur before a b. This ordering information is achieved by use of the product operator, ◁; this language may also be defined by the star-free generalised regular expression $\overline{\emptyset}a\overline{\emptyset}b\overline{\emptyset}$ or a monadic first-order logic sentence, such as

∃x∃y(Q1(x) ∧ ¬Q2(x) ∧ Q1(y) ∧ Q2(y) ∧ x < y).

Spider diagrams of order are capable of defining precisely the star-free regular languages [7].
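As a sanity check on the two example languages, the following Python sketch compares each star-free expression with a direct description of the language on all words of length at most 5 over Σ = {a, b, c, d}. It rests on assumptions: Python's `re` module has no complement operator, so the complement of the empty language is written as `[abcd]*` (that is, Σ∗, as the text states), and all names (`ANY`, `D1_REGEX`, `A_BEFORE_B`, and the helper functions) are hypothetical.

```python
import re
from itertools import product

SIGMA = "abcd"
ANY = f"[{SIGMA}]*"  # stands in for the complement of the empty language, i.e. Sigma*

# The three disjuncts given for d1: aab, aba or baa occurs as a scattered subword.
D1_REGEX = re.compile("|".join(ANY + ANY.join(order) + ANY
                               for order in ("aab", "aba", "baa")))
# The expression given for the diagram of order: an a occurs before a b.
A_BEFORE_B = re.compile(f"{ANY}a{ANY}b{ANY}")


def in_d1_language(word):
    """At least two as and at least one b, in any order."""
    return word.count("a") >= 2 and word.count("b") >= 1


def sentence_a_before_b(word):
    """Evaluate: exists x, y with Q1(x), not Q2(x), Q1(y), Q2(y) and x < y."""
    q1, q2 = {"a", "b"}, {"b", "c"}
    n = len(word)
    return any(word[x] in q1 and word[x] not in q2
               and word[y] in q1 and word[y] in q2
               for x in range(n) for y in range(x + 1, n))


def words_up_to(length):
    """Yield every word over SIGMA of length 0..length."""
    for k in range(length + 1):
        for letters in product(SIGMA, repeat=k):
            yield "".join(letters)


D1_AGREES = all((D1_REGEX.fullmatch(w) is not None) == in_d1_language(w)
                for w in words_up_to(5))
ORDER_AGREES = all((A_BEFORE_B.fullmatch(w) is not None) == sentence_a_before_b(w)
                   for w in words_up_to(5))
```

The check is exhaustive only up to the chosen length bound, so it illustrates the correspondence rather than proving it.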

[Figure 2: the spider diagram of order d2, with contours Q1 and Q2.]

Figure 2. A spider diagram of order.

The different characteristics of the syntax of these four approaches to defining regular languages (diagrams, finite automata, MFOL[<], and regular expressions) imply that the study of each can provide unique insight into properties of the others. This has already been seen with Thomas’ work, where he proves that the level at which a star-free language L first appears in the dot-depth hierarchy [2] is the same as the minimum number of blocks of alternating quantifiers in an MFOL[<] sentence, in prenex normal form, which defines L. It is currently unknown how to determine either the minimum number of quantifier blocks or the first level in the hierarchy, for an arbitrary language L. In spider diagrams of order, there is no direct analogy to prenex normal form. The types of normal forms that arise in diagrammatic logics are also not analogous to those that arise in symbolic logics and, therefore, diagrams provide a new way of considering the level of a language in the dot-depth hierarchy. Indeed, a characterisation of the commutative (closed under permutation) star-free regular languages has recently been derived by considering how spider diagrams define such languages [6].


3. Second-Order Spider Diagrams

Second-order spider diagrams extend spider diagrams of order [5], by adding various pieces of syntax. Here, we briefly outline the usability considerations that were given when designing second-order spider diagrams. Similar considerations for syntax design have been discussed for constraint diagrams [15].

First, we aim to provide the ability to quantify over subsets of U, the universal set. In spider diagrams of order, particular subsets of U are represented by labelled curves, with the label identifying the represented set; this aspect of the notation is inherited from the underlying Euler diagram. Thus, it seems sensible to use the same type of syntax for representing arbitrary (not particular) subsets of U. Hence, we use unlabelled curves to represent the existence of a subset of U. For example, in figure 3, the unlabelled curve asserts the existence of a subset of A. The placement of an unlabelled curve yields constraints on the set whose existence is asserted.

There is good reason for using curves (labelled or unlabelled) to represent sets. They give rise to free-rides [13], which means that the diagram conveys information that would need to be inferred in the symbolic case. In addition, the use of closed curves in this manner can be perceived as being well-matched to their intended semantics [9]: the enclosure of one curve, c1, by another, c2, corresponds to the assertion that the set represented by c1 is a subset of the set represented by c2.

Figure 3. A second-order unitary diagram.

Further free-rides occur through the use of spiders, which represent the existence of elements, placed within curves. For example, in figure 4, the placement of the spider inside A and the disjointness of A from both B and the unlabelled curve, c, tells us, for free, that the element represented by the spider is not in B or the set whose existence is asserted by c. We also include two special spiders, called min and max, to denote the minimal and maximal elements of U, respectively. This is merely to give easy access to these special elements and is consistent with Thomas’ work outlined in the previous section.

Since we want to place constraints on the ordering of elements, it seems natural to make use of an arrow for this purpose; arrows were used to denote properties of binary relations in the case of constraint diagrams [11]. An arrow represents a directed relationship, which is exactly what we have with an order relation. We take an arrow to assert that the successors of the elements of the set represented by the arrow’s source form the set of elements represented by the arrow’s target; if the source or target is a spider, then that spider is treated as a singleton set. In figure 4, the spider, s1, inside A is the source of an arrow, a1, targeting a spider, s2, not in A, B, or the unlabelled curve c. In turn, s2 is also the source of an arrow, a2, whose target spider is not in A, B, or c. For free, we see that the successor of the successor of the element represented by s1 is not in A, B, or c, by ‘navigating’ along the arrows. The arrow sourced on B asserts that every element which is a successor of some element in B is in neither A nor B, since the target c is disjoint from both A and B.

Figure 4. Free-rides arising from spiders.

Returning to figure 3, the diagram includes an unlabelled curve to denote the existence of a set, E, in this case a subset of A − B, and the arrow asserts that the image of the successor function with its domain restricted to E is the element, e, denoted by the spider in B − A. Although there are no spiders explicitly represented in E, we know it must, therefore, represent a non-empty set and moreover be a singleton set: the elements of this set have, between them, a single successor. The sets represented by the source and target of a successor arrow have the same cardinality.

To summarize, our additional syntax includes unlabelled curves to represent existential quantification over subsets, the use of arrows to represent properties of a successor function, and constants min and max to represent the first and last elements of the order relation <. We now proceed to formally define second-order spider diagrams, extending [5].

We will start by formalising the abstract syntax of unitary diagrams, which are combined in various ways to create compound diagrams. We have a finite set of fixed contours, FC, and a countable set of existential contours, EC. Each contour in FC corresponds to a labelled curve in a diagram, whereas the contours in EC correspond to unlabelled curves. A zone is a pair of finite, disjoint sets of contours. Given a diagram, d, with a finite set of contours C ⊂ FC ∪ EC, one defines the zones of d as pairs of sets of contours (in, out), such that in ∪ out = C. Intuitively, the zone thus defined is inside every contour of in, and outside every contour of out. The set of all zones is 𝒵, and the set of zones of a diagram is denoted Z. Zones may be shaded: the set of shaded zones of a diagram is denoted Sh, where Sh ⊆ Z.

An existential spider consists of a natural number and a set of zones. The natural number gives the spider a label, and the set of zones describes the habitat of the spider. For each z in the habitat, the spider has a foot in that zone. The set of all existential spiders is denoted ES. Furthermore, we have a set of constant spiders, denoted CS, consisting of min and max. The sets FC, EC, ES and CS are pairwise disjoint. Properties of the successor function will be denoted by arrows, formally defined as a set of pairs, (source, target). The source and target for each pair are drawn from FC ∪ EC ∪ ES ∪ CS. The following definition builds on that for spider diagrams of order from [5]:

Definition 1. A unitary second-order spider diagram is a tuple:

d = (C, Z, Sh, S, η, SucA)

which satisfies the following:

1. C = C(d) ⊂ FC ∪ EC is a finite set of contours.

2. Z = Z(d) is a finite set of zones, such that for each zone (in, out) ∈ Z(d), in ∪ out = C(d).

3. Sh = Sh(d) ⊆ Z(d) is a set of shaded zones.

4. S = S(d) ⊆ ES ∪ CS is a finite set of spiders.

5. The function ηd : S(d) → P(Z(d)) returns the habitat of every spider.

6. SucA = SucA(d) is a finite set of successor arrows such that for each (s, t) ∈ SucA(d), s and t are each either a contour in C(d) or a spider in S(d).

In figure 3, there are two fixed contours, A and B, and one existential contour, unlabelled in the diagram but given the name E in the abstract syntax. There are 5 zones in this diagram, such as ({A}, {E, B}), and one shaded zone: ({A, B}, {E}). The spider has a single foot and is defined by (1, {({B}, {E, A})}). The set SucA(d) contains one element, namely (E, (1, {({B}, {E, A})})). We extend unitary second-order spider diagrams to compound second-order spider diagrams:

Definition 2. If d1 and d2 are unitary second-order spider diagrams then (d1 ∨ d2), (d1 ∧ d2), (d1 ◁ d2) and ¬d1 are compound second-order spider diagrams.

We have a further concept of a missing zone. Intuitively, a zone is missing if such a zone is possible given the contour set of a diagram, but not present in the zones of that diagram. The diagram in figure 3 has 3 contours, and so 8 possible zones. Thus, there are 3 missing zones: ({E}, {A, B}), ({B, E}, {A}) and ({A, B, E}, ∅).

Definition 3. Given a unitary second-order diagram d, a zone (in, out) is missing from d if it is in the set {(in, C(d) − in) : in ⊆ C(d)} − Z(d), denoted MZ(d).
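The missing zones of figure 3 can be recomputed with a short Python sketch of Definition 3. The function names `all_zones` and `missing_zones` are hypothetical, and the five zones listed in `FIGURE_3_ZONES` are an assumption obtained by removing the three stated missing zones from the eight possible ones.

```python
from itertools import combinations


def all_zones(contours):
    """Every pair (in, C - in) with in a subset of the contour set C."""
    contours = frozenset(contours)
    zones = set()
    for size in range(len(contours) + 1):
        for inside in combinations(sorted(contours), size):
            zones.add((frozenset(inside), contours - frozenset(inside)))
    return zones


def missing_zones(contours, zones):
    """MZ(d): the possible zones that are not zones of the diagram."""
    return all_zones(contours) - set(zones)


def zone(inside, outside):
    """Helper to write a zone as a hashable pair of frozensets."""
    return (frozenset(inside), frozenset(outside))


FIGURE_3_CONTOURS = {"A", "B", "E"}
# Assumed zone set of figure 3 (five zones, consistent with the text).
FIGURE_3_ZONES = [
    zone("", "ABE"),    # outside every contour
    zone("A", "BE"),
    zone("B", "AE"),    # habitat of the spider
    zone("AB", "E"),    # the shaded zone
    zone("AE", "B"),    # E drawn inside A - B
]

FIGURE_3_MISSING = missing_zones(FIGURE_3_CONTOURS, FIGURE_3_ZONES)
```

With three contours there are 2³ = 8 possible zones, so the five listed zones leave exactly the three missing zones named in the text.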

The semantics extend those in [5].

Definition 4. An interpretation is a 4-tuple, (U, Ψ, <, Suc), where U is some finite set, Ψ : FC ∪ CS → PU is a function which maps fixed contours and the constant spiders (treated as singleton sets) to subsets of U, < is a strict total order on U and Suc is a successor function respecting <. The constant spiders with labels min and max are interpreted as the minimal and maximal elements of U under <. When |U| = 1, we define Ψ(min) = Ψ(max) = U and < = ∅.

In the above definition, Suc is a partial function since the maximal element of < has no successor; all other elements have successors. Given an interpretation, we wish to know when it agrees with the intended meaning of the diagram; an ‘agreeing’ interpretation is called a model. As an example, consider again figure 3. A model for this diagram is U = {1, ..., 5}, with < being the natural order on U, Suc being the natural successor function on U, and Ψ(A) = {1, 2} and Ψ(B) = {3}. In order to identify this interpretation as a model, we need to interpret the existentially quantified elements of the diagram, namely the existential contour E and the spider s. To do so, we extend Ψ to Ψ′, mapping these elements in an appropriate manner to subsets of U: Ψ′(E) = {2} and Ψ′(s) = {3}.

Definition 5. Let I = (U, Ψ, <, Suc) be an interpretation, let EC be a set of existential contours, and let S be a set of spiders. An extended interpretation is a tuple J = (U, Ψ′, <, Suc) where Ψ′ : FC ∪ CS ∪ EC ∪ S → PU and:

1. for each c ∈ FC, Ψ′(c) = Ψ(c),

2. for each zone (in, out) ∈ P(FC ∪ EC) × P(FC ∪ EC), we define:

$$\Psi'(in, out) = \bigcap_{c \in in} \Psi'(c) \;\cap \bigcap_{c \in out} (U - \Psi'(c))$$

3. for each set of zones Z, we define:

$$\Psi'(Z) = \bigcup_{z \in Z} \Psi'(z)$$

4. for each spider s ∈ S, we have |Ψ′(s)| = 1.

We say that J is an extension of I.

We can then give conditions as to when an interpretation, I, is a model for a diagram d, by extending I to interpret the spiders and existential contours of d, denoted EC(d).

Definition 6. Let d be a unitary second-order diagram and let I = (U, Ψ, <, Suc) be an interpretation. If there exists an extension J = (U, Ψ′, <, Suc) of I to EC(d) ∪ (S(d) ∩ ES) where the following conditions hold, then I is a model for d:

1. The missing zones condition. The missing zones represent the empty set, i.e. $\bigcup_{z \in MZ(d)} \Psi'(z) = \emptyset$.

2. The spiders’ distinctness condition. ∀s1, s2 ∈ S(d). Ψ′(s1) = Ψ′(s2) ⇒ s1 = s2.

3. The shaded zone condition. All elements in the sets represented by shaded zones are represented by spiders: $\forall z \in Sh(d).\ \Psi'(z) \subseteq \bigcup_{s \in S(d)} \Psi'(s)$.

4. The spiders’ location condition. The elements represented by the spiders are in the sets represented by their habitats: ∀s ∈ S(d). Ψ′(s) ⊆ Ψ′(ηd(s)).

5. The successor condition. Successor arrows indicate a bijection between the sets represented by the source and target, that is ∀(s, t) ∈ SucA(d). Suc|Ψ′(s) is a bijection with image Ψ′(t).

If J makes the above conditions hold then J is a valid extension of I.
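The five conditions can be checked mechanically for the figure 3 model given earlier (U = {1, ..., 5}, Ψ′(A) = {1, 2}, Ψ′(B) = {3}, Ψ′(E) = {2}, Ψ′(s) = {3}). The Python sketch below is illustrative only: the dictionary encoding of the diagram and the names `zone_set` and `is_valid_extension` are hypothetical, and the successor condition is read as requiring every element of the source set to have a successor, which is an assumption about how the restriction of the partial function Suc is intended.

```python
def zone_set(universe, psi, inside, outside):
    """Psi'(in, out): inside every contour of `inside`, outside every contour of `outside`."""
    result = set(universe)
    for contour in inside:
        result &= psi[contour]
    for contour in outside:
        result -= psi[contour]
    return result


def is_valid_extension(universe, suc, psi, diagram):
    """Check the conditions of Definition 6 for an extended assignment `psi`."""
    def region(zones):
        covered = set()
        for inside, outside in zones:
            covered |= zone_set(universe, psi, inside, outside)
        return covered

    spiders = diagram["habitats"]
    singleton_ok = all(len(psi[s]) == 1 for s in spiders)
    missing_ok = not region(diagram["missing"])
    distinct_ok = all(psi[s1] != psi[s2]
                      for s1 in spiders for s2 in spiders if s1 != s2)
    spider_elements = set().union(*(psi[s] for s in spiders))
    shaded_ok = region(diagram["shaded"]) <= spider_elements
    location_ok = all(psi[s] <= region(habitat) for s, habitat in spiders.items())
    # Assumption: each source element has a successor, and the successors
    # are exactly the target set (Suc is injective, so this is a bijection).
    successor_ok = all(all(x in suc for x in psi[source])
                       and {suc[x] for x in psi[source]} == psi[target]
                       for source, target in diagram["arrows"])
    return (singleton_ok and missing_ok and distinct_ok
            and shaded_ok and location_ok and successor_ok)


# Figure 3, encoded from the description in the text.
FIGURE_3 = {
    "missing": [({"E"}, {"A", "B"}), ({"B", "E"}, {"A"}), ({"A", "B", "E"}, set())],
    "shaded": [({"A", "B"}, {"E"})],
    "habitats": {"s": [({"B"}, {"A", "E"})]},
    "arrows": [("E", "s")],
}

U = {1, 2, 3, 4, 5}
SUC = {1: 2, 2: 3, 3: 4, 4: 5}  # natural successor; 5 has no successor
PSI_EXTENDED = {"A": {1, 2}, "B": {3}, "E": {2}, "s": {3}}
PSI_BAD = {"A": {1, 2}, "B": {3}, "E": {1}, "s": {3}}  # successor of 1 is 2, not 3
```

Here `is_valid_extension(U, SUC, PSI_EXTENDED, FIGURE_3)` holds, whereas `PSI_BAD` fails the successor condition because the successor of the single element of E is not the element represented by the spider.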

We now extend the concept of a model to compound diagrams. In the case of the connectives ∧, ∨ and ¬, the extension is obvious. However, the product operator ◁ is more subtle. We extend the following definition from [8]:

Definition 7. Let I1 = (U1, Ψ1, <1, Suc1) and I2 = (U2, Ψ2, <2, Suc2) be interpretations where U1 and U2 are disjoint, and let mini, maxi be the minimum and maximum elements, respectively, of Ui for i = 1, 2. The ordered sum of I1 and I2, denoted I1 + I2, is defined to be the interpretation I3 = (U3, Ψ3, <3, Suc3) such that:

1. U3 = U1 ∪ U2,

2. for each c ∈ FC, Ψ3(c) = Ψ1(c) ∪ Ψ2(c),

3. <3 = <1 ∪ <2 ∪ {(u1, u2) : u1 ∈ U1 ∧ u2 ∈ U2},

Definition 8. Let I = (U, Ψ, <, Suc) be an interpretation and let d be a compound diagram. Then I is a model for d provided:

1. if d = d1 ∨ d2 then I models d whenever I models d1 or I models d2,

2. if d = d1 ∧ d2 then I models d whenever I models d1 and I models d2,

3. if d = ¬d1 then I models d whenever I does not model d1, and

4. if d = d1 ◁ d2 then I models d whenever there exist interpretations I1 and I2 such that I = I1 + I2 and I1 models d1 and I2 models d2.

Second-order spider diagrams are a direct extension of spider diagrams of order as in [7], augmenting them with constant spiders, min and max, arrows to talk about successors, and existential contours. Therefore:

Theorem 1. Second-order spider diagrams are at least as expressive as spider diagrams of order.

Previous work has shown that spider diagrams of order are equivalent in expressive power to MFOL[<] [7]. Thus:

Corollary 2. Second-order spider diagrams are at least as expressive as MFOL[<].

4. The Language of a Diagram

Any language that is definable by a spider diagram of order will be definable by a second-order spider diagram. It is known that the language (aa)+ is not first-order definable. However, it is second-order definable. Consider d1 in figure 5. Given an alphabet Σ = {a, b}, we assign a to the given contour A. Elements in the existential subset containing the constant spider max (call this subset Amax) are successors of elements of the existential subset containing the constant spider min (call this subset Amin). Furthermore, the shading elsewhere in the diagram shows that elements in Amax can only have successors in Amin. Since we have a bijection between the disjoint subsets Amin and Amax, and Ψ(Amin) ∪ Ψ(Amax) = U, we deduce that U has even cardinality. Thus, since both are assigned the letter a, any word in the language defined by d1 must consist of an even number of as; in other words, the language defined is (aa)+. Omitting min and max from the diagram would change the defined language to (aa)∗.
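The parity argument can be confirmed by brute force for small universes. The Python sketch below encodes one reading of the constraints described for d1, which is an assumption since the figure itself is not reproduced here: Amin and Amax are disjoint and cover U, min lies in Amin, max lies in Amax, the successors of the elements of Amin are exactly Amax, and any successor of an element of Amax lies in Amin. The function name `has_d1_model` is hypothetical.

```python
from itertools import combinations


def has_d1_model(n):
    """Search for suitable sets Amin, Amax over U = {1, ..., n} under the natural order."""
    if n == 0:
        return False  # the spiders min and max each need an element of U
    universe = set(range(1, n + 1))
    for size in range(1, n + 1):
        for chosen in combinations(sorted(universe), size):
            a_min = set(chosen)
            a_max = universe - a_min  # disjoint from a_min, and together they cover U
            if 1 not in a_min or n not in a_max:
                continue  # min must lie in Amin and max in Amax
            if {x + 1 for x in a_min} != a_max:
                continue  # successors of Amin must be exactly Amax
            if all(y + 1 in a_min for y in a_max if y < n):
                return True  # successors of Amax stay inside Amin
    return False


SIZES_WITH_MODEL = [n for n in range(0, 9) if has_d1_model(n)]
```

Under these assumptions only even, non-zero sizes admit such a pair of sets, which matches the claim that d1 defines (aa)+ when every element is assigned the letter a.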

Figure 5. The languages (aa)+ and (aba)+.

Similar reasoning gives us that d2 in figure 5 is a diagram of the language (aba)+. Further, the universal set of any model of this diagram must have cardinality which is divisible by 3. Again, omitting min and max would mean that the language defined is (aba)∗.

We now formalise the notion of a language of a diagram. To do so, we assume that the sets FC and Σ are fixed and, moreover, that the function l : FC → PΣ is given; we call l a letter assignment. We extend l to interpret zones, such that for each zone (in, out)

$$l(in, out) = \bigcap_{c \in in \cap FC} l(c) \;\cap \bigcap_{c \in out \cap FC} (\Sigma - l(c)).$$

Using l, we are able to associate letters with zones. First, we define, for each letter a ∈ Σ, the set of fixed contours (fc) that ‘contain’ that letter:

fc(a) = {c ∈ FC : a ∈ l(c)}.

We can think of the zone (fc(a), FC − fc(a)) as containing a. So, fc is a function with domain Σ and codomain PFC. For instance, if we take the mapping l(A) = {a, b} and l(B) = {b, c} over the alphabet Σ = {a, b, c} and with FC = {A, B} then the letter a has fc(a) = {A} and is contained by the zone ({A}, {B}). Thus, a unitary diagram containing the zone ({A}, {B}) with a spider placed in that zone would assert the existence of a letter a in each word of the language it defines. A unitary diagram containing the zone ({A}, ∅) with a spider placed in that zone would assert the existence of a letter a or a letter b in each word of the language it defines. Such a diagram cannot distinguish the letters a and b, unlike a diagram containing both the fixed contours A and B. It is important that we are able to distinguish each letter using some diagram: if two letters, a and b say, have the same image under fc then no diagram can define the language a∗, for example. We require that no two letters are placed in any zone that arises from the function fc.

Definition 9. Given FC, Σ and a letter assignment l, we say that l is valid if the induced function fc is injective.
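The induced function fc and the validity check of Definition 9 are easy to sketch in Python. The names `fc` and `is_valid`, and the dictionary encoding of a letter assignment, are hypothetical; the first assignment is the example from the text and the second is a made-up invalid one.

```python
def fc(letter, assignment):
    """The set of fixed contours whose assigned letters include `letter`."""
    return frozenset(contour for contour, letters in assignment.items()
                     if letter in letters)


def is_valid(sigma, assignment):
    """A letter assignment is valid if fc is injective on the alphabet."""
    images = [fc(letter, assignment) for letter in sigma]
    return len(set(images)) == len(images)


# The example from the text: l(A) = {a, b}, l(B) = {b, c} over {a, b, c}.
EXAMPLE = {"A": {"a", "b"}, "B": {"b", "c"}}
# A made-up invalid assignment: a and b cannot be told apart.
INDISTINGUISHABLE = {"A": {"a", "b"}}
```

For `EXAMPLE` the three letters map to {A}, {A, B} and {B} respectively, so the assignment is valid; for `INDISTINGUISHABLE` both letters map to {A}, so it is not.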

From this point forward, we assume that a valid letter assignment, l, has been specified. We are now able to identify word models, extending and adapting [19]. That is, given an interpretation, we can identify whether it models a word, as described in section 2. This notion can then be extended to languages.

Definition 10. Let I = (U, Ψ, <, Suc) be an interpretation and let w = w1 . . . wn be a word drawn from Σ∗. Then I is a model for w iff there exists a bijection f, from the multi-set {w1, ..., wn} to U, such that for each wi:

1. f(wi) is the ith element of the total ordering induced by <, and

2. f(wi) ∈ Ψ(fc(wi), FC − fc(wi)).

Furthermore, I is a model for language L iff there exists a word, w, in L such that I models w.
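For finite interpretations, the two conditions on word models can be tested directly. The sketch below is illustrative only. It assumes that the universe is supplied as a list already sorted by <, and that Ψ is summarised by a hypothetical dictionary `contours_of` giving, for each element, the set of fixed contours whose sets contain it, so that an element lies in the zone (in, out) exactly when its entry equals in.

```python
def models_word(ordered_universe, contours_of, word, fc):
    """Check whether a finite interpretation is a model for a word.

    ordered_universe: elements of U listed in increasing order under <.
    contours_of: element -> frozenset of fixed contours whose set contains it.
    fc: letter -> frozenset of fixed contours that contain the letter.
    """
    if len(ordered_universe) != len(word):
        return False  # no bijection between letter occurrences and U
    # Condition 1 fixes f: the i-th letter is sent to the i-th element.
    # Condition 2: that element must lie in the zone (fc(w_i), FC - fc(w_i)).
    return all(contours_of[u] == fc[w] for u, w in zip(ordered_universe, word))


def models_language(ordered_universe, contours_of, language, fc):
    """I models L iff I models some word of L (L given as a finite iterable)."""
    return any(models_word(ordered_universe, contours_of, w, fc) for w in language)


# fc for l(A) = {a, b}, l(B) = {b, c}; the universe has three ordered elements.
fc = {"a": frozenset({"A"}), "b": frozenset({"A", "B"}), "c": frozenset({"B"})}
U = [1, 2, 3]
contours_of = {1: frozenset({"A"}), 2: frozenset({"A", "B"}), 3: frozenset({"A"})}
```

With these values the interpretation models the word aba and no other word; since a valid letter assignment makes fc injective, a finite interpretation of this kind models at most one word.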

Definition 11. Let d be a second-order spider diagram and let L be a language. We say that L is the language of d iff the models of d are the same as the models of L. If L is the language of d then we say that d defines L.

Theorem 3. Second-order spider diagrams can define (i) all star-free regular languages, and (ii) regular languages which are not star-free.

Proof. For part (i) it was shown that spider diagrams of or-

der can define precisely the star-free regular languages [7].

By theorem 1, it follows that second-order spider diagrams

can also define all star-free regular languages. For part

(ii), the spider diagram d1 in figure 5 defines the language

(aa)+, which is regular but not star-free.

Corollary 4. Second-order spider diagrams are strictly more expressive than spider diagrams of order and MFOL[<].

5. Defining Languages of Star-Height 1

We will now establish that second-order spider diagrams

can define all star-height 1 languages: a language is of star-

height one if it can be defined by the concatenation or dis-

junction of regular expressions, r1, ..., rn, that each use at

most one star. To begin, we build on the observations made

in earlier examples based on figure 5. These diagrams de-

fined the languages (aa)+ and (aba)+. With this type of

construction, we may define any language of the form w+

where w ∈ Σ∗. To define w∗, we simply omit min and max from the diagram.

We can construct a diagram for w∗ using a process we

now describe. Start by drawing a diagram containing all

contours in FC with no missing zones. To this diagram,

we add existential contours, one for each letter, wi, in

w = w1 . . . wn placed in the zone (fc(wi), FC − fc(wi)), ensuring that any pair of existential contours do not have

a common zone inside them (i.e. they have pairwise dis-

joint interiors). The existential contour, Ci, arising from wi

is then joined by a successor arrow to the contour, Ci+1,

arising from wi+1 for each i < n. Finally, shade all zones

that are not inside some existential contour. The diagram d produced in this manner is said to be constructed for w∗.

Figure 6. Constructing a diagram for (abcae)∗.

To illustrate, given FC = {A, B, C}, Σ = {a, b, c, d, e} with l(A) = {a, b, d}, l(B) = {b, c, d} and l(C) = {d, e},


we can construct d that defines w∗ where w = abcae. The

process can be seen in figure 6. First, we draw a diagram

with the three contours A, B, and C and no missing zones,

shown as d1. Next, for each of the letters in w, we draw an

existential contour inside the appropriate zone of d1, giv-

ing d2. In d3, we have joined the existential contours with

arrows and then added shading to all zones outside of the

existential contours to give the required diagram d4. The

shading forces all elements to be in the sets represented by

existential contours. Moreover, only one such contour, C1

say, is not the target of an arrow. This means that, in any

non-empty model for d, the minimal element must be in

Ψ′(C1). In terms of the language of d, this implies that

each word must start with an a, since C1 is placed in the

zone assigned the letter a. This occurrence of a must be

followed by a b, since the successor arrow sourced on C1

targets an existential contour that is placed in the zone that

is assigned the letter b.

Likewise, the existential contour, C5 say, that is not the

source of an arrow must contain the maximal element, so

the words in the language end with e. It should now be

clear that any word in the language starts with an a, which

is followed by a b, then c, then a and lastly an e: abcae.

Once we reach e, we can again read an a, arising from C1.

This is because, although there is no explicit successor ar-

row sourced on C5, the only existential contour that is not

the target of a successor arrow is C1. It follows that all of

the successors of elements in C5 must be in C1. The pattern

repeats and we see that the words in the language arising

from non-empty models are: abcae, abcaeabcae, and so

forth. Thus, it should be clear that the language of d in-

cludes all words in w+. The only word in w∗ that is not

in w+ is λ, which arises from the empty model. Thus, the

diagram d constructed for w∗ defines w∗.
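The construction for w∗ and the way the resulting diagram is read can be sketched in Python. This is a simplified illustration under stated assumptions rather than a formal rendering of the syntax: only zones of the fixed contours are tracked, a fixed-contour zone that hosts no existential contour is treated as entirely shaded, and the names `construct_for_star` and `accepts` are hypothetical.

```python
import re
from itertools import combinations


def induced_fc(letters, assignment):
    """fc(a) = set of fixed contours whose letter set contains a."""
    return {a: frozenset(c for c, ls in assignment.items() if a in ls)
            for a in letters}


def construct_for_star(word, fc, fixed_contours):
    """Record the data of the diagram constructed for word*."""
    fixed = frozenset(fixed_contours)
    # One existential contour C_i per letter occurrence, placed in the zone of w_i.
    contours = {f"C{i}": (fc[a], fixed - fc[a]) for i, a in enumerate(word, start=1)}
    # A successor arrow from C_i to C_{i+1} for each i < n.
    arrows = [(f"C{i}", f"C{i + 1}") for i in range(1, len(word))]
    # All zones over the fixed contours (no missing zones).
    all_zones = {(frozenset(s), fixed - frozenset(s))
                 for k in range(len(fixed) + 1)
                 for s in combinations(sorted(fixed), k)}
    # Zones hosting no existential contour are shaded.
    shaded = all_zones - set(contours.values())
    return contours, arrows, shaded


def accepts(candidate, word, fc):
    """Read candidate by cycling C_1 -> ... -> C_n -> C_1 as described above."""
    n = len(word)
    if not candidate:
        return True  # the empty model yields the empty word
    if n == 0 or len(candidate) % n != 0:
        return False  # the maximal element must lie in C_n
    return all(fc.get(x) == fc[word[i % n]] for i, x in enumerate(candidate))


l = {"A": set("abd"), "B": set("bcd"), "C": set("de")}
fc = induced_fc("abcde", l)
contours, arrows, shaded = construct_for_star("abcae", fc, l)

# Cross-check the reading against an ordinary regular expression.
for s in ["", "abcae", "abcaeabcae", "abca", "abcaea", "bcaea"]:
    assert accepts(s, "abcae", fc) == bool(re.fullmatch(r"(abcae)*", s))
```

For w = abcae this records five existential contours and four successor arrows; C1 and C4 share the zone assigned to a, so four of the eight fixed-contour zones host a contour and the other four are shaded.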

Definition 12. Let d be a unitary second-order diagram and let C = (C1, ..., Cn) be a sequence of contours of d. Then C is a connected sequence if there is an arrow (Ci, Ci+1) in SucA(d) for each i < n.

Theorem 5 (Connected Sequences). Let d be a diagram with a connected sequence of contours C = (C1, ..., Cn). If each zone, (in, out), in d that is not inside one of the Ci (i.e. no Ci is in the set in) is shaded and does not contain a spider foot then any model I = (U, Ψ, <, Suc) for d ensures

1. ⋃_{i=1,...,n} Ψ′(Ci) = U,

2. Ψ′(Ci) ∩ Ψ′(Cj) = ∅, for each i < j ≤ n,

3. Ψ′(min) ∈ Ψ′(C1), and

4. Ψ′(max) ∈ Ψ′(Cn),

where (U, Ψ′, <, Suc) is a valid extension of I.
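For a finite interpretation, the four conclusions of the theorem can be checked directly. The sketch below is only an illustration: it assumes the valid extension Ψ′ is given as a list of sets, one per contour of the connected sequence, that the universe is listed in increasing order, and that conditions 3 and 4 are read as applying only when the universe is non-empty. The function name is hypothetical.

```python
def check_connected_sequence(ordered_universe, contour_sets):
    """contour_sets: list [Psi'(C_1), ..., Psi'(C_n)] of sets of elements."""
    union = set().union(*contour_sets)
    covers = union == set(ordered_universe)                      # condition 1
    disjoint = sum(len(s) for s in contour_sets) == len(union)   # condition 2
    if not (covers and disjoint):
        return False
    if not ordered_universe:
        return True  # assumption: no min or max to place in the empty model
    min_ok = ordered_universe[0] in contour_sets[0]              # condition 3
    max_ok = ordered_universe[-1] in contour_sets[-1]            # condition 4
    return min_ok and max_ok


# Positions 0..9 of the word abcaeabcae, with position i placed in C_{(i mod 5) + 1}.
universe = list(range(10))
sets_ok = [{0, 5}, {1, 6}, {2, 7}, {3, 8}, {4, 9}]
# Moving the maximal element into C_1 violates condition 4.
sets_bad = [{0, 5, 9}, {1, 6}, {2, 7}, {3, 8}, {4}]
```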

This theorem allows us to prove the following result:

Theorem 6. Let w be a word and let d be the diagram constructed for w∗. Then d defines the language w∗.

In general, regular expressions can be more complex than a single word in Σ∗. We now present a

method for constructing a diagram given r∗ where r is a

star-free (not generalised) regular expression. As a simple

example, consider the regular expression r = (b|d) · (c|e) and suppose we wish to construct a diagram for r∗ using

the same fixed contour set, alphabet and letter assignment

as in the previous example. The process is similar to that

for w∗, but we must now take into account disjunction. A

construction is shown in figure 7. The diagram d2 draws an

existential contour for each letter in r, grouping those for

b and d and those for c and e using further existential con-

tours, EC and EC ′ respectively. Moving on to d3, we join

EC and EC ′ with an arrow, telling us that after we ‘read’ a

b or a d we can read a c or an e. Adding the shading to cre-

ate d4 tells us that c and e are the only successors of b and d.

A simpler diagram, d5, also defines the language r∗ in this

case. However, the process illustrated in figure 7 forms the

basis of a general construction method that we now outline.

Figure 7. Defining ((b|d) · (c|e))∗.

Let r be any regular expression constructed from Σ with-

out using the Kleene star. For simplicity, we describe a

construction when r does not involve λ or ∅. Then, r will

be of the form e1 · e2 · e3 · . . . · en where ei are regular

expressions. We start the construction of the diagram by

searching through r until we reach expressions of the form

(l1|l2| . . . |ln) · (l′1|l′2| . . . |l′m), where li, l′j ∈ Σ. For each

such expression, we create the following fragment of a di-

agram: for l1, . . . , ln, create a set of n existential contours,

call them C1, . . . , Cn, which are pairwise disjoint. We as-

sign the letter li to existential contour Ci by drawing Ci

inside a zone, z, where l(z) = {li}; Ci is the start contour and end contour of li. Create a further contour C such that,

for each i ≤ n, Ci is drawn inside C. Shade the region inside C but outside C1, . . . , Cn. Next, create a set of m pairwise disjoint contours, call them C′1, . . . , C′m, corresponding to the letters l′1, . . . , l′m. Furthermore, these contours should be disjoint from every contour C1, . . . , Cn, C. Create another contour C′ in the same manner as C, but it will contain C′1, . . . , C′m. Shade the equivalent region thus created. Finally, add an arrow from C to C′. This should create

a fragment which looks like d1 in figure 8 (in our example

of figure 7 this gives d2). The contour C is the start contour and C′ is the end contour of (l1|l2| . . . |ln) · (l′1|l′2| . . . |l′m).

Figure 8. The initial construction and the inductive construction.

Given constructions for f1, . . . , fn and g1, . . . , gm, to

create a construction for (f1| . . . |fn) · (g1| . . . |gm) we per-

form the following steps. First, add a contour, C, that con-

tains precisely the start contours of each fi then shade the

region inside C but outside each fi; C is the start contour for (f1| . . . |fn) · (g1| . . . |gm). Similarly, create an end contour, C′, for (f1| . . . |fn) · (g1| . . . |gm), enclosing precisely

the end contours of each gi. Next, add a contour F such that

for every end-contour Ce in f1, . . . , fn, we have Ce is inside F, shading in the same manner. Now add a contour G such that for every start-contour Cs in g1, . . . , gm we have

Cs is inside G, again shading the region inside G but out-

side the other contours. Finally, add an arrow (F, G). We

see this step demonstrated in d2 in figure 8. Once we have a

construction for r, we shade all zones that are not inside any

existential contour and we say that the resulting diagram, d,

is constructed for r∗.

In our construction, we have existential contours that are

not the target of an arrow and, yet, they are identified as

successors of some other elements, since they are inside

a contour that is the target of an arrow. We say a contour C in a diagram is an implicit target (resp. implicit source) of an arrow (s, t) ∈ SucA(d) iff C is drawn inside t (resp. C is drawn inside s). We note that the only contour which is not a source or implicit source is the end-contour of r. Similarly, the only contour which is not a

target or implicit target is the start contour of r. Therefore,

by shading elsewhere, the only place it is possible to go, us-

ing successor, from the end contour of r is the start contour

of r. It is the same situation as in figure 5. As a further

example, the diagram in figure 9 shows a construction for

(l1 · ((l1 · l2)|(l3 · (l4|l5)) · (l1|l2)) · l3)∗, where we have

omitted the underlying fixed contours and merely shown the

letter assignments to each existential contour; these labels

are not part of the diagram.
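The way such a diagram is read (start contours, end contours, successor arrows, and the forced step from the end contour of r back to its start contour) closely parallels the standard position-based (Glushkov) automaton construction for r∗. The Python sketch below uses that standard technique as an analogy; it is not the diagram construction itself. Expressions are assumed to be nested tuples built from the hypothetical tags `"sym"`, `"alt"` and `"cat"`, with no Kleene star, λ or ∅, and each letter occurrence receives its own position, much as each receives its own existential contour.

```python
import re
from itertools import product


def linearise(expr, positions):
    """Return (first, last, follow) position sets for a star-free expression."""
    kind = expr[0]
    if kind == "sym":
        positions.append(expr[1])
        p = len(positions) - 1
        return {p}, {p}, set()
    parts = [linearise(e, positions) for e in expr[1:]]
    follow = set().union(*(f for _, _, f in parts))
    if kind == "alt":
        first = set().union(*(fi for fi, _, _ in parts))
        last = set().union(*(la for _, la, _ in parts))
        return first, last, follow
    # Concatenation: the ends of each part are followed by the starts of the next.
    for (_, la, _), (fi, _, _) in zip(parts, parts[1:]):
        follow |= {(p, q) for p in la for q in fi}
    return parts[0][0], parts[-1][1], follow


def accepts_star(word, expr):
    """Decide membership of word in the language of expr*."""
    positions = []
    first, last, follow = linearise(expr, positions)
    # From an end position the only possible next step is a start position.
    follow = follow | {(p, q) for p in last for q in first}
    if not word:
        return True  # corresponds to the empty model
    current = {p for p in first if positions[p] == word[0]}
    for letter in word[1:]:
        current = {q for (p, q) in follow if p in current and positions[q] == letter}
    return bool(current & last)


# r = (b|d) . (c|e), as in the example of figure 7.
expr = ("cat", ("alt", ("sym", "b"), ("sym", "d")),
               ("alt", ("sym", "c"), ("sym", "e")))

# Cross-check against an ordinary regular expression on all short words.
for n in range(5):
    for letters in product("bcde", repeat=n):
        s = "".join(letters)
        assert accepts_star(s, expr) == bool(re.fullmatch(r"((b|d)(c|e))*", s))
```

In this analogy, the `first` and `last` sets play the roles of the start and end contours of r, and the added pairs from `last` to `first` correspond to the observation that, after shading, the only successors available to elements of the end contour lie in the start contour.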

Figure 9. Illustrating the construction.

We have seen how to create a unitary diagram for a regular expression, r∗, where r contains no star. The disjunction and product of such diagrams can create representations of languages such as (r1∗ | r2∗) · r3 where each ri does not use the Kleene star: let d1 be a diagram representing the language r1∗, let d2 be a diagram representing the language r2∗ and let d3 be a diagram representing the language r3. The compound diagram formed as the product of (d1 ∨ d2) with d3 defines the language (r1∗ | r2∗) · r3. The constructions do not give particularly elegant diagrams and they may contain a lot of redundancy. For instance, d1 in figure 1 defines the language specified by the regular (not generalised) expression

((a|b|c|d)∗ · a · (a|b|c|d)∗ · a · (a|b|c|d)∗ · b · (a|b|c|d)∗) |
((a|b|c|d)∗ · a · (a|b|c|d)∗ · b · (a|b|c|d)∗ · a · (a|b|c|d)∗) |
((a|b|c|d)∗ · b · (a|b|c|d)∗ · a · (a|b|c|d)∗ · a · (a|b|c|d)∗)

which, when following our construction, would give rise to

a compound diagram consisting of 21 unitary parts. How-

ever, the construction allows us to prove:

Theorem 7. Any regular expression of star-height 1 can be defined by a second-order spider diagram.

6. Conclusion

In this paper, we have introduced second-order spider di-

agrams, which include the ability to quantify over subsets

of the universal set, U . The notation has been formalised

and proved to be properly second-order in terms of its ex-

pressiveness. For instance, first-order logics cannot define

(aa)∗, which we can define using second-order spider di-

agrams. To the best of our knowledge, these are the first

formal extensions of Euler diagrams which are proved to be

second-order.

Our primary motivation for developing this logic is to

provide another mechanism through which we can study

regular languages. In order to facilitate such studies, we de-

fined the language defined by a diagram. Further, we have

begun to explore the class of regular languages definable by

second-order spider diagrams. In particular, we established

that all regular languages of star-height one are definable.


We fully expect to be able to extend the construction given

in section 5 to regular expressions of star-height two and

beyond. Indeed, we conjecture that second-order spider di-

agrams can define all regular languages. In addition, spi-

der diagrams could be used to create regular expressions.

At present, we have not provided a mechanism to translate

arbitrary spider diagrams into regular expressions and this

remains the subject of future work. We also plan to use

second-order spider diagrams to study regular languages.

Differences between the manner in which regular expres-

sions and second-order spider diagrams define regular lan-

guages could well mean that these diagrams provide us with

new insight into the properties of regular languages.

Further motivation for this paper was derived from providing more understanding of how we can visually express complex concepts using diagrams. The provision of

second-order spider diagrams has yielded an increase in ex-

pressiveness, taking us beyond the limits of first-order log-

ics. Still, there is much work to be done here. The diagram-

matic logics that exist to-date cannot axiomatise many com-

monly occurring concepts, such as the mathematical prop-

erties of a set having even cardinality or being dense, or

defining the transitive closure of a relation. It is not possi-

ble, yet, to make statements that quantify over arbitrary rela-

tions or functions defined over the universal set. For exam-

ple, statements like ∃P ∀x∀yP (x, y), where P is a second-

order two-place predicate variable, cannot yet be formally

defined using diagrams of this kind.

Extending second-order spider diagrams to allow quan-

tification over binary relations will permit a greater range

of second-order properties to be defined visually, consider-

ably increasing their expressiveness. We are exploring this

extension, further pushing the boundaries of what can be

expressed using diagrammatic logics. Indeed, it would be

a very considerable advance over the current state-of-the-

art if a diagrammatic logic with the full expressiveness of

second-order logic was developed.

Acknowledgement This research is supported by EPSRC

grants for the Defining Regular Languages with Diagrams

project [EP/H012311/1] and the Sketching Euler Diagrams

project [EP/H048480/1]. We thank Aidan Delaney, John

Taylor and the anonymous reviewers for helpful comments

on this paper.

References

[1] J. Büchi. Weak second order arithmetic and finite automata. Zeitschrift für mathematische Logik und Grundlagen der Mathematik, 6:66–92, 1960.

[2] R. Cohen and J. Brzozowski. Dot-depth of star-free events. Journal of Computer and System Sciences, 5:1–16, 1971.

[3] F. Dau and P. Eklund. A diagrammatic reasoning system for the description logic ALC. Journal of Visual Languages and Computing, 19(5):539–573, 2008.

[4] A. Delaney and G. Stapleton. On the descriptional complexity of a diagrammatic notation. In 13th International Conference on Distributed Multimedia Systems, Visual Languages and Computing, pages 195–202, 2007.

[5] A. Delaney and G. Stapleton. Spider diagrams of order. In Visual Languages and Logic, volume 274 of CEUR, pages 27–39, 2007.

[6] A. Delaney, G. Stapleton, J. Taylor, and S. Thompson. A diagrammatic characterisation of commutative star-free regular languages. In preparation.

[7] A. Delaney, J. Taylor, and S. Thompson. Spider diagrams of order and a hierarchy of star-free regular languages. In 5th International Conference on the Theory and Application of Diagrams, LNCS, pages 172–187. Springer, 2008.

[8] H.-D. Ebbinghaus and J. Flum. Finite Model Theory. Springer-Verlag, 1991.

[9] C. Gurr. Aligning syntax and semantics in formalisations of visual languages. In Proceedings of IEEE Symposia on Human-Centric Computing Languages and Environments, pages 60–61. IEEE Computer Society Press, 2001.

[10] J. Howse, G. Stapleton, and J. Taylor. Spider diagrams. LMS Journal of Computation and Mathematics, 8:145–194, 2005.

[11] S. Kent. Constraint diagrams: Visualizing invariants in object oriented modelling. In Proceedings of OOPSLA97, pages 327–341. ACM Press, October 1997.

[12] A. Mateescu and A. Salomaa. Formal languages: an introduction and a synopsis. In Handbook of formal languages, vol. 1: word, language, grammar, pages 1–39. Springer-Verlag New York, Inc., New York, NY, USA, 1997.

[13] A. Shimojima. Inferential and expressive capacities of graphical representations: Survey and some generalizations. In Proceedings of 3rd International Conference on the Theory and Application of Diagrams, volume 2980 of LNAI, pages 18–21, Cambridge, UK, 2004. Springer.

[14] S.-J. Shin. The Logical Status of Diagrams. Cambridge University Press, 1994.

[15] G. Stapleton and A. Delaney. Evaluating and generalizing constraint diagrams. Journal of Visual Languages and Computing, 19(4):499–521, 2008.

[16] G. Stapleton and J. Masthoff. Incorporating negation into visual logics: A case study using Euler diagrams. In Visual Languages and Computing 2007, pages 187–194. Knowledge Systems Institute, 2007.

[17] G. Stapleton, S. Thompson, J. Howse, and J. Taylor. The expressiveness of spider diagrams. Journal of Logic and Computation, 14(6):857–880, December 2004.

[18] N. Swoboda and G. Allwein. Using DAG transformations to verify Euler/Venn homogeneous and Euler/Venn FOL heterogeneous rules of inference. Journal on Software and System Modeling, 3(2):136–149, 2004.

[19] W. Thomas. Classifying regular events in symbolic logic. Journal of Computer and System Sciences, 25:360–376, 1982.

[20] B. Trahtenbrot. Finite automata and the logic of monadic predicates. Sibirskij Mat. Zhurnal, 3:103–131, 1962.
