IEEE 2010 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) - Leganes,...
Introducing Second-Order Spider Diagrams for Defining Regular Languages
Peter Chapman and Gem Stapleton
Visual Modelling Group, University of Brighton, UK
{p.b.chapman, g.e.stapleton}@brighton.ac.uk
Abstract
There has been significant research effort focussed on the study of regular languages, since they play a vital role in our understanding of computation. This existing research draws a large number of connections with other areas, such as algebra and symbolic logic. Recently, research has begun into how diagrammatic logics can define regular languages, providing another mechanism through which we can understand regular languages. However, the formalised diagrammatic logics are first-order, so they cannot define non-star-free regular languages. The primary contributions of this paper are: (a) to develop and formalise a second-order diagrammatic logic, extending spider diagrams of order, and (b) to establish a class of regular languages that this logic can define. This lays the essential foundations for providing an exact classification of the regular languages that are definable using this new second-order logic.
1. Introduction
There has been significant study of regular languages in computer science, and much is known about how they relate to finite automata, symbolic logic, and algebraic formalisms. Each of these relationships has led to an increased understanding of regular languages, showing that studying formal language theory from different perspectives can be both insightful and highly significant. For instance, work by Büchi [1], amongst others, provides a characterization of regular languages in terms of symbolic logic: star-free regular languages are first-order definable, and those which are not star-free are second-order definable. We note that connections between regular languages and symbolic logic have been well studied [8, 20].
Recent work has established how one can define regular languages using the spider diagram logic [4], which we elucidate in the next section. Here, we note that spider diagrams can only define so-called commutative star-free regular languages (a commutative language is closed under permutation of the letters in its words). This led to the development of spider diagrams of order [5], which augment spider diagrams with a product operator, ⊗, allowing ordering information to be specified. However, since both of these logics are first-order, they can only define star-free regular languages, whereas all regular languages are definable in monadic second-order logic [1, 20].
Whilst the range of diagrammatic logics available is increasing, those which are formalised are typically limited in expressiveness to at most that of first-order logic. Further examples include existential graphs [3], Euler diagrams [16], Venn-II [14], Euler/Venn diagrams [18], and constraint diagrams [11]. Being limited to first-order logic precludes the formalisation of many commonly occurring concepts in both mathematics and computer science, such as the property of being finite.
We propose second-order spider diagrams, extending spider diagrams of order by incorporating arrows (from constraint diagrams [11]) and the ability to quantify over sets. The development of this logic thus pushes the boundaries of what can be expressed by formal diagrammatic notations in visually intuitive ways. Until this paper, there had been no attempt to develop a diagrammatic logic capable of defining non-star-free regular languages. In §2 we describe approaches to defining regular languages using monadic first-order logic, spider diagrams, and finite state machines. The limitations of using spider diagrams are discussed, which motivates §3, where we introduce a notation for second-order spider diagrams. In §4 we define the language of a diagram, and in §5 we establish that any regular language of star-height at most 1 can be defined by a second-order spider diagram.
2. Defining Regular Languages Using Logic
Recall that a (not necessarily regular) language over an alphabet Σ is a subset of Σ∗ (all the words formed from letters in Σ). A language is regular if it can be defined by a regular expression [12]. Regular expressions are formed using letters in Σ, λ to denote the empty word, and ∅ to denote the empty language. They also make use of disjunction, denoted |, concatenation, denoted ·, and the Kleene star, ∗, to form complex expressions. In addition, generalised regular expressions use an overbar to denote complement (r̄ denotes the complement of the language of r). A language is star-free if it can be defined by a generalised regular expression not involving ∗.
2010 IEEE Symposium on Visual Languages and Human-Centric Computing
978-0-7695-4206-5/10 $26.00 © 2010 IEEE
DOI 10.1109/VLHCC.2010.30
We now summarise key results by Thomas [19], which establish a strong relationship between regular languages and Monadic First-Order Logic of Order (MFOL[<]), in which the only binary predicate is <. To illustrate some of the key ideas of Thomas' work and their extension to the diagrammatic notations relevant to this paper, we take the example alphabet Σ = {a, b, c, d}, and we assign the predicate symbol Q1 to the set {a, b} and the predicate symbol Q2 to the set {b, c}. The sentence ∀x Q1(x) therefore defines the language consisting of all words that contain only the letters a and b (such as abbba and bbb), and ∀x (Q1(x) ∧ ¬Q2(x)) defines the language consisting of all words containing only the letter a.
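The two example sentences can be checked mechanically. The following is a minimal Python sketch of our own, not taken from Thomas' work: it assumes that the positions of a word play the role of the elements being quantified over, and that Qi(x) holds exactly when the letter at position x belongs to the set assigned to Qi. The names `ASSIGNMENT`, `q`, `forall_q1` and `forall_q1_and_not_q2` are hypothetical.

```python
# Sketch: evaluating the two example MFOL[<] sentences directly on words.
# Assumption: Q_i(x) holds iff the letter at position x lies in the set
# assigned to Q_i (Q1 -> {a, b}, Q2 -> {b, c}).
ASSIGNMENT = {"Q1": {"a", "b"}, "Q2": {"b", "c"}}


def q(pred: str, word: str, x: int) -> bool:
    """Truth value of the atomic formula pred(x) at position x of word."""
    return word[x] in ASSIGNMENT[pred]


def forall_q1(word: str) -> bool:
    """The sentence  ∀x Q1(x)."""
    return all(q("Q1", word, x) for x in range(len(word)))


def forall_q1_and_not_q2(word: str) -> bool:
    """The sentence  ∀x (Q1(x) ∧ ¬Q2(x))."""
    return all(q("Q1", word, x) and not q("Q2", word, x)
               for x in range(len(word)))


if __name__ == "__main__":
    for w in ["abbba", "bbb", "abc", "aaa"]:
        print(w, forall_q1(w), forall_q1_and_not_q2(w))
```

Both sentences hold vacuously on the empty word, so λ belongs to both languages.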
More precisely, to define the words in the language of a sentence, S, we need the notion of satisfaction in a structure (U, Ψ, <), where U is the universal set, Ψ interprets the predicate symbols as subsets of U, and < is a strict total order on U. To illustrate, w = ab is satisfied by the structure with U = {1, 2}, Ψ(Q1) = {1, 2}, Ψ(Q2) = {2} and < the natural order on U: element 1 lies in Q1 but not in Q2, so it corresponds to a, and element 2 lies in both, so it corresponds to b. However, w = ab is not satisfied by the structure with U = {1, 2, 3}, Ψ(Q1) = {1, 2}, Ψ(Q2) = {2} and < the natural order on U, since the length of w is 2 but |U| = 3: the number of letters in w must equal the cardinality of U. The words in the language of S are precisely those which are satisfied by the models of S [19]. We return to this notion in section 4.
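The correspondence between structures and words can be sketched as follows. This is an illustrative Python sketch under stated assumptions: U is supplied as a list already sorted by <, Ψ is a dictionary from predicate names to sets of elements, and the letter assignment is the running one (Q1 to {a, b}, Q2 to {b, c}). The helper names `letter_of`, `word_of` and `satisfies` are hypothetical.

```python
from typing import Dict, List, Set

# Sketch: reading off the word represented by a structure (U, Ψ, <).
# Assumption: the letter assignment of the running example,
# Q1 -> {a, b} and Q2 -> {b, c}, over the alphabet {a, b, c, d}.
ASSIGNMENT: Dict[str, Set[str]] = {"Q1": {"a", "b"}, "Q2": {"b", "c"}}
ALPHABET = "abcd"


def letter_of(u: int, psi: Dict[str, Set[int]]) -> str:
    """The unique letter whose predicate memberships match those of u."""
    preds_of_u = {p for p in ASSIGNMENT if u in psi.get(p, set())}
    candidates = [x for x in ALPHABET
                  if {p for p in ASSIGNMENT if x in ASSIGNMENT[p]} == preds_of_u]
    assert len(candidates) == 1, "the assignment must distinguish all letters"
    return candidates[0]


def word_of(ordered_universe: List[int], psi: Dict[str, Set[int]]) -> str:
    """One letter per element of U, read in < order (so |w| = |U|)."""
    return "".join(letter_of(u, psi) for u in ordered_universe)


def satisfies(word: str, ordered_universe: List[int],
              psi: Dict[str, Set[int]]) -> bool:
    """True iff the structure represents exactly this word."""
    return word_of(ordered_universe, psi) == word
```

With Ψ(Q1) = {1, 2} and Ψ(Q2) = {2}, `satisfies("ab", [1, 2], psi)` holds, while the three-element universe yields the word abd (element 3 lies in neither predicate, so it reads as d) and therefore does not satisfy ab.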
Figure 1. Defining star-free languages: the spider diagram d1 (contours Q1 and Q2; letters a, b, c, d) and an equivalent finite state machine.
For spider diagrams [10], there are directly analogous concepts of structures (called interpretations) and of satisfaction. Thus, Thomas' definition of a language definable by a sentence immediately extends to the notion of a language definable by a spider diagram. In figure 1, d1 is a spider diagram that tells us that there are at least two elements in Q1 − Q2 and at least one element in Q1 ∩ Q2. The lowercase letters indicate the mapping defined above, assigning sets of letters to Q1 and Q2. In terms of languages, this spider diagram defines the language containing precisely the words that contain at least two occurrences of a and at least one occurrence of b. A finite state machine that accepts the same language is shown on the right. Since this language is star-free, it may also be defined by a star-free generalised regular expression such as
$(\overline{\emptyset}\,a\,\overline{\emptyset}\,a\,\overline{\emptyset}\,b\,\overline{\emptyset}) \mid (\overline{\emptyset}\,a\,\overline{\emptyset}\,b\,\overline{\emptyset}\,a\,\overline{\emptyset}) \mid (\overline{\emptyset}\,b\,\overline{\emptyset}\,a\,\overline{\emptyset}\,a\,\overline{\emptyset})$
or by a monadic first-order logic sentence, such as
$\exists x\,\exists y\,\exists z\,(Q_1(x) \wedge \neg Q_2(x) \wedge Q_1(y) \wedge \neg Q_2(y) \wedge Q_1(z) \wedge Q_2(z) \wedge x \neq y).$
In the generalised regular expression just given, $\overline{\emptyset}$ denotes the complement of the empty language, in other words Σ∗. Thus, the first disjunct, namely $(\overline{\emptyset}\,a\,\overline{\emptyset}\,a\,\overline{\emptyset}\,b\,\overline{\emptyset})$, defines the language containing all words in which a, a and b occur in that order, not necessarily contiguously (that is, aab is a scattered subword).
Spider diagrams are unable to provide any ordering information on the letters in words, being equivalent in expressive power to monadic first-order logic with equality [17]. Consequently, the language defined by a spider diagram is closed under permutation. This observation motivated the development of spider diagrams of order, which come equipped with syntax for specifying ordering information [5]. To illustrate, figure 2 shows a spider diagram of order that asserts that an a must occur before a b. This ordering information is achieved by use of the product operator, ⊗; this language may also be defined by the star-free generalised regular expression $\overline{\emptyset}\,a\,\overline{\emptyset}\,b\,\overline{\emptyset}$ or by a monadic first-order logic sentence, such as
$\exists x\,\exists y\,(Q_1(x) \wedge \neg Q_2(x) \wedge Q_1(y) \wedge Q_2(y) \wedge x < y).$
Spider diagrams of order are capable of defining precisely the star-free regular languages [7].
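As a sanity check on the two generalised regular expressions, the following Python sketch compares them by brute force with direct descriptions of the languages of d1 and d2, over Σ = {a, b, c, d} and all words up to a fixed length. It assumes that the complement of the empty language, Σ∗, can be written as the Python pattern `[abcd]*`; the helper names are hypothetical, and a finite check of this kind illustrates the equivalence rather than proving it.

```python
import itertools
import re

# Sketch: brute-force comparison, over Σ = {a, b, c, d}, of the languages
# described in the text with the generalised regular expressions given for them.
# Assumption: the complement of the empty language (Σ*) is written "[abcd]*".
SIGMA = "abcd"
ANY = "[abcd]*"

# (Σ* a Σ* a Σ* b Σ*) | (Σ* a Σ* b Σ* a Σ*) | (Σ* b Σ* a Σ* a Σ*)
D1_REGEX = re.compile("|".join(ANY + ANY.join(order) + ANY
                               for order in ("aab", "aba", "baa")))
# Σ* a Σ* b Σ*
D2_REGEX = re.compile(ANY + "a" + ANY + "b" + ANY)


def d1_language(w: str) -> bool:
    """Figure 1: at least two occurrences of a and at least one of b."""
    return w.count("a") >= 2 and w.count("b") >= 1


def d2_language(w: str) -> bool:
    """Figure 2: some a occurs before some b."""
    return any(c == "a" and "b" in w[i + 1:] for i, c in enumerate(w))


def agree_up_to(max_len: int) -> bool:
    """Compare both descriptions on every word of length <= max_len."""
    for n in range(max_len + 1):
        for letters in itertools.product(SIGMA, repeat=n):
            w = "".join(letters)
            if bool(D1_REGEX.fullmatch(w)) != d1_language(w):
                return False
            if bool(D2_REGEX.fullmatch(w)) != d2_language(w):
                return False
    return True


if __name__ == "__main__":
    print(agree_up_to(6))  # a finite check only, not a proof
```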
Figure 2. A spider diagram of order: d2, with contours Q1 and Q2 and letters a, b, c, d.
The different characteristics of the syntax of these four approaches to defining regular languages (diagrams, finite automata, MFOL[<], and regular expressions) imply that the study of each can provide unique insight into properties of the others. This has already been seen with Thomas' work, where he proves that the level at which a star-free language L first appears in the dot-depth hierarchy [2] is the same as the minimum number of blocks of alternating quantifiers in an MFOL[<] sentence, in prenex normal form, which defines L. It is currently unknown how to determine either the minimum number of quantifier blocks or the first level in the hierarchy for an arbitrary language L. In spider diagrams of order, there is no direct analogy to prenex normal form. The types of normal forms that arise in diagrammatic logics are also not analogous to those that arise in symbolic logics and, therefore, diagrams provide a new way of considering the level of a language in the dot-depth hierarchy. Indeed, a characterisation of the commutative (closed under permutation) star-free regular languages has recently been derived by considering how spider diagrams define such languages [6].
3. Second-Order Spider Diagrams
Second-order spider diagrams extend spider diagrams of order [5] by adding various pieces of syntax. Here, we briefly outline the usability considerations that informed the design of second-order spider diagrams. Similar considerations for syntax design have been discussed for constraint diagrams [15].
First, we aim to provide the ability to quantify over subsets of U, the universal set. In spider diagrams of order, particular subsets of U are represented by labelled curves, with the label identifying the represented set; this aspect of the notation is inherited from the underlying Euler diagram. Thus, it seems sensible to use the same type of syntax for representing arbitrary (not particular) subsets of U. Hence, we use unlabelled curves to represent the existence of a subset of U. For example, in figure 3, the unlabelled curve asserts the existence of a subset of A. The placement of an unlabelled curve constrains the set whose existence is asserted.
There is good reason for using curves (labelled or unlabelled) to represent sets. They give rise to free rides [13], which means that the diagram conveys information that would need to be inferred in the symbolic case. In addition, the use of closed curves in this manner can be perceived as being well-matched to their intended semantics [9]: the enclosure of one curve, c1, by another, c2, corresponds to the assertion that the set represented by c1 is a subset of the set represented by c2.
Figure 3. A second-order unitary diagram.
Further free rides arise from spiders, which are placed within curves and represent the existence of elements. For example, in figure 4, the placement of the spider inside A and the disjointness of A from both B and the unlabelled curve, c, tells us, for free, that the element represented by the spider is neither in B nor in the set whose existence is asserted by c. We also include two special spiders, called min and max, to denote the minimal and maximal elements of U, respectively. These merely give easy access to these special elements and are consistent with Thomas' work outlined in the previous section.
Since we want to place constraints on the ordering of elements, it seems natural to use an arrow for this purpose; arrows were used to denote properties of binary relations in constraint diagrams [11]. An arrow represents a directed relationship, which is exactly what we have with an order relation. We take an arrow to assert that the successors of the elements of the set represented by the arrow's source form exactly the set represented by the arrow's target; if the source or target is a spider, then that spider is treated as a singleton set. In figure 4, the spider s1 inside A is the source of an arrow, a1, targeting a spider, s2, that is not in A, B, or the unlabelled curve c. In turn, s2 is the source of an arrow, a2, whose target spider is not in A, B, or c. For free, we see that the successor of the successor of the element represented by s1 is not in A, B, or c, by 'navigating' along the arrows. The arrow sourced on B asserts that every element which is a successor of some element in B is in neither A nor B, since the target c is disjoint from both A and B.
Figure 4. Free-rides arising from spiders.
Returning to figure 3, the diagram includes an unlabelled curve to denote the existence of a set, E, in this case a subset of A − B, and the arrow asserts that the image of the successor function with its domain restricted to E is {e}, where e is the element denoted by the spider in B − A. Although there are no spiders explicitly placed in E, we therefore know that E must represent a non-empty set, and moreover a singleton set: the elements of E have, between them, a single successor, and distinct elements have distinct successors. In general, the sets represented by the source and target of a successor arrow have the same cardinality.
To summarize, our additional syntax includes unlabelled curves to represent existential quantification over subsets, arrows to represent properties of the successor function, and constants min and max to represent the first and last elements of the order relation <. We now proceed to formally define second-order spider diagrams, extending [5].
We start by formalising the abstract syntax of unitary diagrams, which are combined in various ways to create compound diagrams. We have a finite set of fixed contours, FC, and a countable set of existential contours, EC. Each contour in FC corresponds to a labelled curve in a diagram, whereas the contours in EC correspond to unlabelled curves. A zone is a pair of finite, disjoint sets of contours. Given a diagram, d, with a finite set of contours C ⊂ FC ∪ EC, one defines the zones of d as pairs of sets of contours (in, out) such that in ∪ out = C. Intuitively, the zone thus defined is inside every contour of in, and outside every contour of out.
The set of all zones is denoted 𝒵, and the set of zones of a diagram is denoted Z. Zones may be shaded: the set of shaded zones of a diagram is denoted Sh, where Sh ⊆ Z.
An existential spider consists of a natural number and a set of zones. The natural number gives the spider a label, and the set of zones describes the habitat of the spider. For each z in the habitat, the spider has a foot in that zone. The set of all existential spiders is denoted ES. Furthermore, we have a set of constant spiders, denoted CS, consisting of min and max. The sets FC, EC, ES and CS are pairwise disjoint. Properties of the successor function will be denoted by arrows, formally defined as a set of pairs, (source, target). The source and target of each pair are drawn from FC ∪ EC ∪ ES ∪ CS. The following definition builds on that for spider diagrams of order from [5]:
Definition 1. A unitary second-order spider diagram is a tuple:
d = (C, Z, Sh, S, η, SucA)
which satisfies the following:
1. C = C(d) ⊂ FC ∪ EC is a finite set of contours.
2. Z = Z(d) is a finite set of zones such that for each zone (in, out) ∈ Z(d), in ∪ out = C(d).
3. Sh = Sh(d) ⊆ Z(d) is a set of shaded zones.
4. S = S(d) ⊆ ES ∪ CS is a finite set of spiders.
5. The function ηd : S(d) → P(Z(d)) returns the habitat of every spider.
6. SucA = SucA(d) is a finite set of successor arrows such that for each (s, t) ∈ SucA(d), s and t are each either a contour in C(d) or a spider in S(d).
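To make the tuple concrete, here is one possible Python encoding of Definition 1, instantiated for the diagram of figure 3 described below. The encoding is a sketch of our own: the class `UnitaryDiagram`, the helper `zone` and the method `well_formed` are hypothetical, a spider is identified by its label with its habitat stored separately, and contour names are single characters so that a string such as "BE" lists the contours B and E.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, Set, Tuple

# Sketch of the abstract syntax of Definition 1 (a hypothetical encoding).
# Contours are strings; a spider is identified by its label, with its
# habitat stored in `habitat`; constant spiders could use "min" / "max".
Zone = Tuple[FrozenSet[str], FrozenSet[str]]  # (in, out)


def zone(inside, outside) -> Zone:
    # Single-character contour names, so a string like "BE" lists B and E.
    return (frozenset(inside), frozenset(outside))


@dataclass
class UnitaryDiagram:
    contours: Set[str]                       # C(d)
    zones: Set[Zone]                         # Z(d)
    shaded: Set[Zone]                        # Sh(d)
    habitat: Dict[Hashable, Set[Zone]]       # η_d; its keys form S(d)
    arrows: Set[Tuple[Hashable, Hashable]]   # SucA(d)

    def well_formed(self) -> bool:
        """Check conditions 2, 3, 5 and 6 of Definition 1."""
        zones_ok = all((i | o) == self.contours and not (i & o)
                       for i, o in self.zones)
        shaded_ok = self.shaded <= self.zones
        habitats_ok = all(h <= self.zones for h in self.habitat.values())
        endpoints = set(self.contours) | set(self.habitat)
        arrows_ok = all(s in endpoints and t in endpoints
                        for s, t in self.arrows)
        return zones_ok and shaded_ok and habitats_ok and arrows_ok


# Figure 3: fixed contours A, B; existential contour E; spider 1 in B - A.
FIG3 = UnitaryDiagram(
    contours={"A", "B", "E"},
    zones={zone([], "ABE"), zone("A", "BE"), zone("B", "AE"),
           zone("AB", "E"), zone("AE", "B")},
    shaded={zone("AB", "E")},
    habitat={1: {zone("B", "AE")}},
    arrows={("E", 1)},
)
```

An arrow whose endpoint is neither a contour nor a spider of the diagram, such as ("E", 2) here, makes `well_formed` return False.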
In figure 3, there are two fixed contours, A and B, and one existential contour, unlabelled in the diagram but given the name E in the abstract syntax. There are 5 zones in this diagram, namely (∅, {A, B, E}), ({A}, {E, B}), ({B}, {E, A}), ({A, B}, {E}) and ({A, E}, {B}), and one shaded zone, ({A, B}, {E}). The spider has a single foot and is defined by (1, {({B}, {E, A})}). The set SucA(d) contains one element, namely (E, (1, {({B}, {E, A})})). We extend unitary second-order spider diagrams to compound second-order spider diagrams:
Definition 2. If d1 and d2 are unitary second-order spider diagrams then (d1 ∨ d2), (d1 ∧ d2), (d1 ⊗ d2) and ¬d1 are compound second-order spider diagrams.
We have a further concept of a missing zone. Intuitively, a zone is missing if such a zone is possible given the contour set of a diagram, but not present in the zones of that diagram. The diagram in figure 3 has 3 contours, and so 8 possible zones. Since 5 of them are present, there are 3 missing zones: ({E}, {A, B}), ({B, E}, {A}) and ({A, B, E}, ∅).
Definition 3. Given a unitary second-order diagram d, a zone (in, out) is missing from d if it is in the set {(in, C(d) − in) : in ⊆ C(d)} − Z(d), denoted MZ(d).
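Definition 3 translates directly into a set difference. The Python sketch below enumerates the 2^|C| possible zones and subtracts the present ones; the names `zone`, `possible_zones` and `missing_zones` are hypothetical, and zones are encoded as pairs of frozensets.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple

# Sketch of Definition 3: MZ(d) = {(in, C(d) - in) : in ⊆ C(d)} - Z(d).
Zone = Tuple[FrozenSet[str], FrozenSet[str]]


def zone(inside: Iterable[str], outside: Iterable[str]) -> Zone:
    return (frozenset(inside), frozenset(outside))


def possible_zones(contours: Set[str]) -> Set[Zone]:
    """All 2^|C| zones (in, C - in) over the contour set C."""
    cs = sorted(contours)
    return {zone(chosen, set(cs) - set(chosen))
            for k in range(len(cs) + 1)
            for chosen in combinations(cs, k)}


def missing_zones(contours: Set[str], zones: Set[Zone]) -> Set[Zone]:
    return possible_zones(contours) - zones


if __name__ == "__main__":
    fig3_zones = {zone([], ["A", "B", "E"]), zone(["A"], ["B", "E"]),
                  zone(["B"], ["A", "E"]), zone(["A", "B"], ["E"]),
                  zone(["A", "E"], ["B"])}
    for inside, outside in missing_zones({"A", "B", "E"}, fig3_zones):
        print(sorted(inside), sorted(outside))
```

For figure 3 this yields the three missing zones listed above.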
The semantics extend those in [5].
Definition 4. An interpretation is a 4-tuple, (U, Ψ, <, Suc), where U is some finite set, Ψ : FC ∪ CS → PU is a function which maps fixed contours and the constant spiders (treated as singleton sets) to subsets of U, < is a strict total order on U and Suc is a successor function respecting <. The constant spiders with labels min and max are interpreted as the minimal and maximal elements of U under <. When |U| = 1, we define Ψ(min) = Ψ(max) = U and < = ∅.
In the above definition, Suc is a partial function since the maximal element of < has no successor; all other elements have successors. Given an interpretation, we wish to know when it agrees with the intended meaning of the diagram; an 'agreeing' interpretation is called a model. As an example, consider again figure 3. A model for this diagram is U = {1, . . . , 5}, with < being the natural order on U, Suc being the natural successor function on U, Ψ(A) = {1, 2} and Ψ(B) = {3}. In order to identify this interpretation as a model, we need to interpret the existentially quantified elements of the diagram, namely the existential contour E and the spider s. To do so, we extend Ψ to Ψ′, mapping these elements in an appropriate manner to subsets of U: Ψ′(E) = {2} and Ψ′(s) = {3}. Then Suc(2) = 3, as required by the arrow from E to s.
Definition 5. Let I = (U, Ψ, <, Suc) be an interpretation, let EC be a set of existential contours, and let S be a set of spiders. An extended interpretation is a tuple J = (U, Ψ′, <, Suc) where Ψ′ : FC ∪ CS ∪ EC ∪ S → PU and:
1. for each c ∈ FC, Ψ′(c) = Ψ(c),
2. for each zone (in, out) ∈ P(FC ∪ EC) × P(FC ∪ EC), we define
$\Psi'(in, out) = \bigcap_{c \in in} \Psi'(c) \;\cap\; \bigcap_{c \in out} (U - \Psi'(c)),$
3. for each set of zones Z, we define
$\Psi'(Z) = \bigcup_{z \in Z} \Psi'(z),$
4. for each spider s ∈ S, we have |Ψ′(s)| = 1.
We say that J is an extension of I.
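Items 2 and 3 of Definition 5 can be computed directly. The Python sketch below assumes Ψ′ is given as a dictionary from contour names to subsets of U; the function names `psi_zone` and `psi_zones` are hypothetical. It is checked against the extension given above for figure 3.

```python
from typing import Dict, Iterable, Set, Tuple

# Sketch of items 2 and 3 of Definition 5: the set represented by a zone and
# by a set of zones. Assumption: `psi` maps contour names to subsets of U.


def psi_zone(inside: Iterable[str], outside: Iterable[str],
             psi: Dict[str, Set[int]], universe: Set[int]) -> Set[int]:
    """Ψ'(in, out): in every contour of `in` and in no contour of `out`.

    Starting from U means a zone with in = ∅ is interpreted correctly, since
    the empty intersection is the whole universe.
    """
    result = set(universe)
    for c in inside:
        result &= psi[c]
    for c in outside:
        result -= psi[c]
    return result


def psi_zones(zones: Iterable[Tuple[Iterable[str], Iterable[str]]],
              psi: Dict[str, Set[int]], universe: Set[int]) -> Set[int]:
    """Ψ'(Z): the union of the sets represented by the zones in Z."""
    result: Set[int] = set()
    for inside, outside in zones:
        result |= psi_zone(inside, outside, psi, universe)
    return result
```

With U = {1, ..., 5}, Ψ′(A) = {1, 2}, Ψ′(B) = {3} and Ψ′(E) = {2}, the zone ({A}, {B, E}) represents {1} and the zone (∅, {A, B, E}) represents {4, 5}.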
We can then give conditions as to when an interpretation, I, is a model for a diagram d, by extending I to interpret the spiders of d and the existential contours of d, the latter denoted EC(d).
Definition 6. Let d be a unitary second-order diagram and let I = (U, Ψ, <, Suc) be an interpretation. If there exists an extension J = (U, Ψ′, <, Suc) of I to EC(d) ∪ (S(d) ∩ ES) where the following conditions hold, then I is a model for d:
1. The missing zones condition. The missing zones represent the empty set, i.e. $\bigcup_{z \in MZ(d)} \Psi'(z) = \emptyset$.
2. The spiders' distinctness condition. $\forall s_1, s_2 \in S(d).\ \Psi'(s_1) = \Psi'(s_2) \Rightarrow s_1 = s_2$.
3. The shaded zone condition. All elements in the sets represented by shaded zones are represented by spiders: $\forall z \in Sh(d).\ \Psi'(z) \subseteq \bigcup_{s \in S(d)} \Psi'(s)$.
4. The spiders' location condition. The elements represented by the spiders are in the sets represented by their habitats: $\forall s \in S(d).\ \Psi'(s) \subseteq \Psi'(\eta_d(s))$.
5. The successor condition. Successor arrows indicate a bijection between the sets represented by the source and target, that is, $\forall (s, t) \in SucA(d).\ Suc|_{\Psi'(s)}$ is a bijection with image $\Psi'(t)$.
If J makes the above conditions hold then J is a valid extension of I.
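The five conditions can be checked mechanically once a candidate extension Ψ′ is fixed. The following Python sketch does not search for an extension; it only tests a given one. It assumes U = {1, ..., n} with the natural order and Suc(u) = u + 1, and encodes the diagram as a plain dictionary; the encoding and the names `zone`, `psi_zone`, `is_valid_extension` and `FIG3` are hypothetical. It also checks item 4 of Definition 5 (each spider denotes one element).

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Set, Tuple

# Sketch: checking conditions 1-5 of Definition 6 for a *given* extension Ψ'.
# Assumptions: U = {1, ..., n} with the natural order and Suc(u) = u + 1;
# the diagram is a plain dict (a hypothetical encoding), and psi maps every
# contour and spider label to a subset of U.
Zone = Tuple[FrozenSet[str], FrozenSet[str]]


def zone(inside, outside) -> Zone:
    return (frozenset(inside), frozenset(outside))


def psi_zone(z: Zone, psi: Dict[Hashable, Set[int]], universe: Set[int]) -> Set[int]:
    inside, outside = z
    result = set(universe)
    for c in inside:
        result &= psi[c]
    for c in outside:
        result -= psi[c]
    return result


def is_valid_extension(d: dict, psi: Dict[Hashable, Set[int]], n: int) -> bool:
    universe = set(range(1, n + 1))
    spiders = list(d["habitat"])
    cs = sorted(d["contours"])
    possible = {zone(k, set(cs) - set(k))
                for r in range(len(cs) + 1) for k in combinations(cs, r)}
    # Each spider denotes exactly one element (Definition 5, item 4).
    singletons = all(len(psi[s]) == 1 for s in spiders)
    # 1. Missing zones represent the empty set.
    missing_ok = all(not psi_zone(z, psi, universe) for z in possible - d["zones"])
    # 2. Distinct spiders denote distinct elements.
    distinct_ok = all(psi[s] != psi[t] for s, t in combinations(spiders, 2))
    # 3. Shaded zones contain only elements denoted by spiders.
    covered = set().union(*(psi[s] for s in spiders))
    shaded_ok = all(psi_zone(z, psi, universe) <= covered for z in d["shaded"])
    # 4. Each spider lies in the set represented by its habitat.
    located_ok = all(
        psi[s] <= set().union(*(psi_zone(z, psi, universe) for z in d["habitat"][s]))
        for s in spiders)

    # 5. Suc restricted to the source is a bijection onto the target.
    def arrow_ok(src, tgt) -> bool:
        image = {u + 1 for u in psi[src] if u < n}
        return len(image) == len(psi[src]) and image == psi[tgt]

    arrows_ok = all(arrow_ok(s, t) for s, t in d["arrows"])
    return (singletons and missing_ok and distinct_ok and shaded_ok
            and located_ok and arrows_ok)


FIG3 = {
    "contours": {"A", "B", "E"},
    "zones": {zone([], "ABE"), zone("A", "BE"), zone("B", "AE"),
              zone("AB", "E"), zone("AE", "B")},
    "shaded": {zone("AB", "E")},
    "habitat": {1: {zone("B", "AE")}},
    "arrows": {("E", 1)},
}
```

With the extension described earlier for figure 3 (Ψ′(A) = {1, 2}, Ψ′(B) = {3}, Ψ′(E) = {2}, spider 1 at {3}, n = 5), all conditions hold; moving E to {1} breaks the successor condition.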
We now extend the concept of a model to compound diagrams. In the case of the connectives ∧, ∨ and ¬, the extension is obvious. However, the product operator ⊗ is more subtle. We extend the following definition from [8]:
Definition 7. Let I1 = (U1, Ψ1, <1, Suc1) and I2 = (U2, Ψ2, <2, Suc2) be interpretations where U1 and U2 are disjoint, and let mini, maxi be the minimum and maximum elements, respectively, of Ui for i = 1, 2. The ordered sum of I1 and I2, denoted I1 + I2, is defined to be the interpretation I3 = (U3, Ψ3, <3, Suc3) such that:
1. U3 = U1 ∪ U2,
2. for each c ∈ FC, Ψ3(c) = Ψ1(c) ∪ Ψ2(c),
3. $<_3 \;=\; <_1 \,\cup\, <_2 \,\cup\, \{(u_1, u_2) : u_1 \in U_1 \wedge u_2 \in U_2\}$,
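The ordered sum can be sketched in Python by storing U as a list in < order, so that both < and Suc are read off list positions. This encoding, and the names `Interp` and `ordered_sum`, are hypothetical. One point is an assumption of the sketch rather than part of the items listed above: the derived successor satisfies Suc3 = Suc1 ∪ Suc2 ∪ {(max1, min2)}.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Sketch of the ordered sum I1 + I2 (Definition 7). Hypothetical encoding:
# an interpretation stores U as a list in < order, so < and Suc are derived
# from list positions. Assumption: Suc3 = Suc1 ∪ Suc2 ∪ {(max1, min2)},
# which is what the derived successor gives; items 1-3 do not state Suc3.


@dataclass
class Interp:
    order: List[str]              # the elements of U, listed in < order
    psi: Dict[str, Set[str]]      # fixed contour -> subset of U

    def less(self, u: str, v: str) -> bool:
        return self.order.index(u) < self.order.index(v)

    def suc(self, u: str) -> Optional[str]:
        i = self.order.index(u)
        return self.order[i + 1] if i + 1 < len(self.order) else None


def ordered_sum(i1: Interp, i2: Interp) -> Interp:
    if set(i1.order) & set(i2.order):
        raise ValueError("U1 and U2 must be disjoint")
    contours = set(i1.psi) | set(i2.psi)
    # Item 2: Ψ3(c) = Ψ1(c) ∪ Ψ2(c) (a contour absent from psi denotes ∅).
    psi = {c: i1.psi.get(c, set()) | i2.psi.get(c, set()) for c in contours}
    # Items 1 and 3: U3 = U1 ∪ U2, with every element of U1 before U2.
    return Interp(i1.order + i2.order, psi)
```

Representing U as an ordered list makes item 3 automatic: concatenating the lists places every element of U1 before every element of U2.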
Definition 8. Let I = (U, Ψ, <, Suc) be an interpretation and let d be a compound diagram. Then I is a model for d provided:
1. if d = d1 ∨ d2 then I models d whenever I models d1 or I models d2,
2. if d = d1 ∧ d2 then I models d whenever I models d1 and I models d2,
3. if d = ¬d1 then I models d whenever I does not model d1, and
4. if d = d1 ⊗ d2 then I models d whenever there exist interpretations I1 and I2 such that I = I1 + I2, I1 models d1 and I2 models d2.
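At the level of words, Definition 8 has a simple reading. The Python sketch below assumes that each interpretation is identified with the word it models (in the sense of Definition 10 below), so that a diagram can be represented by a predicate on words and an ordered sum corresponds to concatenation. The combinators `disj`, `conj`, `neg` and `product`, and the example predicates `has_a` and `has_b`, are hypothetical. Allowing empty parts in a split is also an assumption of the sketch.

```python
from typing import Callable

# Sketch of Definition 8 at the level of words. Assumption: each
# interpretation is identified with the word it models, so a diagram is
# represented by a predicate on words and an ordered sum I1 + I2 corresponds
# to concatenating the two words.
Diagram = Callable[[str], bool]


def disj(d1: Diagram, d2: Diagram) -> Diagram:
    return lambda w: d1(w) or d2(w)


def conj(d1: Diagram, d2: Diagram) -> Diagram:
    return lambda w: d1(w) and d2(w)


def neg(d1: Diagram) -> Diagram:
    return lambda w: not d1(w)


def product(d1: Diagram, d2: Diagram) -> Diagram:
    """d1 ⊗ d2: w = uv for some split with u modelling d1 and v modelling d2.

    Empty parts are allowed here; whether empty interpretations may take part
    in an ordered sum is an assumption of this sketch.
    """
    return lambda w: any(d1(w[:k]) and d2(w[k:]) for k in range(len(w) + 1))


def has_a(w: str) -> bool:
    return "a" in w


def has_b(w: str) -> bool:
    return "b" in w


a_before_b = product(has_a, has_b)
```

Here `a_before_b` accepts exactly the words with an a somewhere before a b, the language described for figure 2, whereas `conj(has_a, has_b)` ignores order and also accepts ba.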
Second-order spider diagrams are a direct extension of spider diagrams of order as in [7], augmenting them with constant spiders, min and max, arrows to talk about successors, and existential contours. Therefore:
Theorem 1. Second-order spider diagrams are at least as expressive as spider diagrams of order.
Previous work has shown that spider diagrams of order are equivalent in expressive power to MFOL[<] [7]. Thus:
Corollary 2. Second-order spider diagrams are at least as expressive as MFOL[<].
4. The Language of a Diagram
Any language that is definable by a spider diagram of order will be definable by a second-order spider diagram. It is known that the language (aa)+ is not first-order definable; however, it is second-order definable. Consider d1 in figure 5. Given an alphabet Σ = {a, b}, we assign a to the given contour A. Elements in the existential subset containing the constant spider max (call this subset Amax) are successors of elements of the existential subset containing the constant spider min (call this subset Amin). Furthermore, the shading elsewhere in the diagram shows that elements in Amax can only have successors in Amin. Since we have a bijection between the disjoint subsets Amin and Amax, and Ψ′(Amin) ∪ Ψ′(Amax) = U, we deduce that U has even cardinality. Thus, since both subsets lie inside A, which is assigned the letter a, any word in the language defined by d1 must consist of a non-zero, even number of occurrences of a; in other words, the language defined is (aa)+. Omitting min and max from the diagram would change the defined language to (aa)∗.
Figure 5. The languages (aa)+ and (aba)+.
Similar reasoning shows that d2 in figure 5 is a diagram of the language (aba)+. Further, the universal set of any model of this diagram must have cardinality divisible by 3. Again, omitting min and max would mean that the language defined is (aba)∗.
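The parity argument for d1 can be confirmed by exhaustive search on small universes. The following Python sketch assumes U = {1, ..., n} with the natural order and Suc(u) = u + 1, and encodes the conditions on Amin and Amax exactly as the text describes them (they partition U, min ∈ Amin, max ∈ Amax, Suc restricted to Amin is a bijection onto Amax, and elements of Amax have successors only in Amin). The function `has_witness` is a hypothetical name.

```python
from itertools import combinations

# Sketch: does U = {1, ..., n} (natural order, Suc(u) = u + 1) admit subsets
# A_min and A_max meeting the conditions described for d1 in figure 5?
# Assumptions, read off the text: A_min and A_max partition U, min ∈ A_min,
# max ∈ A_max, Suc restricted to A_min is a bijection onto A_max, and
# successors of elements of A_max lie in A_min.


def has_witness(n: int) -> bool:
    universe = set(range(1, n + 1))
    for k in range(n + 1):
        for chosen in combinations(sorted(universe), k):
            a_min = set(chosen)
            a_max = universe - a_min
            if 1 not in a_min or n not in a_max:
                continue  # min must lie in A_min and max in A_max
            image = {u + 1 for u in a_min if u < n}
            if len(image) != len(a_min) or image != a_max:
                continue  # the arrow from A_min to A_max fails
            if all(u + 1 in a_min for u in a_max if u < n):
                return True  # shading: A_max elements have successors in A_min
    return False


if __name__ == "__main__":
    print([n for n in range(9) if has_witness(n)])  # [2, 4, 6, 8]
```

Since every element lies inside A, a witness for n corresponds to the word consisting of n copies of a, which recovers (aa)+ on these small cases.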
We now formalise the notion of the language of a diagram. To do so, we assume that the sets FC and Σ are fixed and, moreover, that the function l : FC → PΣ is given; we call l a letter assignment. We extend l to interpret zones, such that for each zone (in, out)
$l(in, out) = \bigcap_{c \in in \cap FC} l(c) \;\cap\; \bigcap_{c \in out \cap FC} (\Sigma - l(c)).$
Using l, we are able to associate letters with zones. First, we define, for each letter a ∈ Σ, the set of fixed contours (fc) that 'contain' that letter:
fc(a) = {c ∈ FC : a ∈ l(c)}.
We can think of the zone (fc(a), FC − fc(a)) as containing a. So, fc is a function with domain Σ and codomain PFC.
For instance, if we take the mapping l(A) = {a, b} and l(B) = {b, c} over the alphabet Σ = {a, b, c} with FC = {A, B}, then the letter a has fc(a) = {A} and is contained by the zone ({A}, {B}). Thus, a unitary diagram containing the zone ({A}, {B}) with a spider placed in that zone would assert the existence of a letter a in each word of the language it defines. A unitary diagram containing the zone ({A}, ∅) with a spider placed in that zone would assert the existence of a letter a or a letter b in each word of the language it defines. Such a diagram cannot distinguish the letters a and b, unlike a diagram containing both the fixed contours A and B. It is important that we are able to distinguish each letter using some diagram: if two letters, a and b say, have the same image under fc, then no diagram can define the language a∗, for example. We therefore require that no two letters are placed in the same zone arising from the function fc.
Definition 9. Given FC, Σ and a letter assignment l, we say that l is valid if the induced function fc is injective.
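The letter assignment on zones, the induced function fc and the validity test of Definition 9 can be sketched in Python as follows. The sketch assumes `l` is a dictionary from fixed contour names to sets of letters, and that Σ is given as a string of single-character letters; contours not in `l` (such as existential ones) are ignored, matching the restriction to in ∩ FC and out ∩ FC. The names `letters_of_zone`, `fc` and `is_valid` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Sketch of l on zones, the induced function fc, and the validity test of
# Definition 9. Assumption: `l` maps each fixed contour name to its letters,
# and sigma is a string of single-character letters.


def letters_of_zone(inside: Iterable[str], outside: Iterable[str],
                    sigma: str, l: Dict[str, Set[str]]) -> Set[str]:
    """l(in, out): letters in l(c) for every c in `in`, for no c in `out`."""
    result = set(sigma)
    for c in inside:
        if c in l:          # only fixed contours carry letters (in ∩ FC)
            result &= l[c]
    for c in outside:
        if c in l:          # out ∩ FC
            result -= l[c]
    return result


def fc(letter: str, l: Dict[str, Set[str]]) -> FrozenSet[str]:
    """fc(a) = {c ∈ FC : a ∈ l(c)}."""
    return frozenset(c for c, letters in l.items() if letter in letters)


def is_valid(sigma: str, l: Dict[str, Set[str]]) -> bool:
    """l is valid iff fc is injective on sigma."""
    images = [fc(x, l) for x in sigma]
    return len(set(images)) == len(images)
```

For the example above, fc(a) = {A}, fc(b) = {A, B} and fc(c) = {B}, so l is valid; with a single contour A assigned {a, b}, both letters have image {A} and validity fails.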
From this point forward, we assume that a valid letter assignment, l, has been specified. We are now able to identify word models, extending and adapting [19]. That is, given an interpretation, we can identify whether it models a word, as described in section 2. This notion can then be extended to languages.
Definition 10. Let I = (U, Ψ, <, Suc) be an interpretation and let w = w1 . . . wn be a word drawn from Σ∗. Then I is a model for w iff there exists a bijection f from the multiset {w1, . . . , wn} to U such that for each wi:
1. f(wi) is the ith element of the total ordering induced by <, and
2. f(wi) ∈ Ψ(fc(wi), FC − fc(wi)).
Furthermore, I is a model for language L iff there exists a word, w, in L such that I models w.
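Definition 10 can be sketched in Python for finite interpretations. The sketch assumes U is given as a list in < order (so the ith element is simply the ith list entry), and that both Ψ and the letter assignment are dictionaries keyed by fixed contour names; `models_word` and `models_language` are hypothetical names, and the latter only handles a finite sample of a language.

```python
from typing import Dict, Iterable, List, Set

# Sketch of Definition 10. Assumptions: U is given as a list in < order, so
# the i-th element is order[i]; `psi` and the letter assignment `l` are keyed
# by fixed contour names (hypothetical encoding).


def models_word(order: List[int], psi: Dict[str, Set[int]],
                l: Dict[str, Set[str]], word: str) -> bool:
    if len(order) != len(word):
        return False  # f must be a bijection onto U
    fixed = set(l)
    for u, letter in zip(order, word):
        inside = {c for c in fixed if letter in l[c]}  # fc(letter)
        # u must lie in the zone (fc(letter), FC - fc(letter)).
        if any(u not in psi.get(c, set()) for c in inside):
            return False
        if any(u in psi.get(c, set()) for c in fixed - inside):
            return False
    return True


def models_language(order: List[int], psi: Dict[str, Set[int]],
                    l: Dict[str, Set[str]], words: Iterable[str]) -> bool:
    """For a finite sample of a language: I models L iff it models some w in L."""
    return any(models_word(order, psi, l, w) for w in words)
```

Reusing the section 2 example with contours Q1 and Q2 in place of fixed contours, the two-element interpretation models ab but not ba, and the three-element one models neither.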
Definition 11. Let d be a second-order spider diagram and let L be a language. We say that L is the language of d iff the models of d are the same as the models of L. If L is the language of d then we say that d defines L.
Theorem 3. Second-order spider diagrams can define (i) all star-free regular languages, and (ii) regular languages which are not star-free.
Proof. For part (i), it was shown that spider diagrams of order can define precisely the star-free regular languages [7]. By theorem 1, it follows that second-order spider diagrams can also define all star-free regular languages. For part (ii), the second-order spider diagram d1 in figure 5 defines the language (aa)+, which is regular but not star-free.
Corollary 4. Second-order spider diagrams are strictly more expressive than spider diagrams of order and MFOL[<].
5. Defining Languages of Star-Height 1
We now establish that second-order spider diagrams can define all star-height 1 languages: a language is of star-height 1 if it can be defined by the concatenation or disjunction of regular expressions, r1, . . . , rn, that each use at most one star. To begin, we build on the observations made in earlier examples based on figure 5. These diagrams defined the languages (aa)+ and (aba)+. With this type of construction, we may define any language of the form w+ where w ∈ Σ∗. To define w∗, we simply omit min and max from the diagram.
We can construct a diagram for w∗ using the following process. Start by drawing a diagram containing all contours in FC with no missing zones. To this diagram, we add existential contours, one for each letter, wi, in w = w1 . . . wn, placed in the zone (fc(wi), FC − fc(wi)), ensuring that no two existential contours have a common zone inside them (i.e. they have pairwise disjoint interiors). The existential contour, Ci, arising from wi is then joined by a successor arrow to the contour, Ci+1, arising from wi+1, for each i < n. Finally, shade all zones that are not inside some existential contour. The diagram d produced in this manner is said to be constructed for w∗.
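The construction can be expressed as a short function producing abstract syntax. The Python sketch below uses a hypothetical dictionary encoding: existential contours are named C1, ..., Cn, and zones are pairs of frozensets over all contours. It builds every zone of FC (no missing zones) outside all existential contours, one zone per Ci inside the zone (fc(wi), FC − fc(wi)), the arrows (Ci, Ci+1), and the shading of every zone not inside some Ci.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

# Sketch of the construction for w* (hypothetical encoding). Existential
# contours are named C1, ..., Cn; a zone is a pair (in, out) of frozensets.
Zone = Tuple[FrozenSet[str], FrozenSet[str]]


def construct_for_star(w: str, l: Dict[str, Set[str]]) -> dict:
    fixed = sorted(l)
    ecs = [f"C{i}" for i in range(1, len(w) + 1)]
    zones: Set[Zone] = set()
    # All contours in FC with no missing zones, outside every C_i.
    for k in range(len(fixed) + 1):
        for inside in combinations(fixed, k):
            zones.add((frozenset(inside),
                       frozenset(set(fixed) - set(inside)) | frozenset(ecs)))
    # C_i sits in the zone (fc(w_i), FC - fc(w_i)), disjoint from other C_j.
    for ci, letter in zip(ecs, w):
        fc_letter = frozenset(c for c in fixed if letter in l[c])
        zones.add((fc_letter | {ci},
                   (frozenset(fixed) - fc_letter) | (frozenset(ecs) - {ci})))
    # Shade every zone that is not inside some existential contour.
    shaded = {z for z in zones if not (z[0] & set(ecs))}
    arrows = [(ecs[i], ecs[i + 1]) for i in range(len(w) - 1)]
    return {"contours": set(fixed) | set(ecs), "zones": zones,
            "shaded": shaded, "arrows": arrows, "existential": ecs}
```

For the example that follows (w = abcae with l(A) = {a, b, d}, l(B) = {b, c, d}, l(C) = {d, e}), this gives 8 shaded zones from the three fixed contours plus one zone per existential contour, 13 zones in total, and the four arrows C1 to C2 through C4 to C5.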
Figure 6. Constructing a diagram for (abcae)∗.
To illustrate, given FC = {A, B, C}, Σ = {a, b, c, d, e} with l(A) = {a, b, d}, l(B) = {b, c, d} and l(C) = {d, e}, we can construct d that defines w∗ where w = abcae. The process can be seen in figure 6. First, we draw a diagram with the three contours A, B, and C and no missing zones, shown as d1. Next, for each of the letters in w, we draw an existential contour inside the appropriate zone of d1, giving d2. In d3, we have joined the existential contours with arrows, and we then add shading to all zones outside of the existential contours to give the required diagram d4. The shading forces all elements to be in the sets represented by existential contours. Moreover, only one such contour, C1 say, is not the target of an arrow. This means that, in any non-empty model for d, the minimal element must be in Ψ′(C1): the minimal element is not the successor of any element, so it cannot lie in the target of an arrow. In terms of the language of d, this implies that each word must start with an a, since C1 is placed in the zone assigned the letter a. This occurrence of a must be followed by a b, since the successor arrow sourced on C1 targets an existential contour that is placed in the zone that is assigned the letter b.
Likewise, the existential contour, C5 say, that is not the source of an arrow must contain the maximal element, since every element of C1, . . . , C4 has a successor; so the words in the language end with e. It should now be clear that any word in the language starts with an a, which is followed by a b, then c, then a and lastly an e: abcae. Once we reach e, we can again read an a, arising from C1. This is because, although there is no explicit successor arrow sourced on C5, the only existential contour that is not the target of a successor arrow is C1: every element of C2, . . . , C5 is already the successor of an element of C1, . . . , C4, and Suc is injective, so all of the successors of elements in C5 must be in C1. The pattern repeats and we see that the words in the language arising from non-empty models are abcae, abcaeabcae, and so forth. Thus, it should be clear that the language of d includes all words in w+. The only word in w∗ that is not in w+ is λ, which arises from the empty model. Thus, the diagram d constructed for w∗ defines w∗.
Definition 12. Let d be a unitary second-order diagram and let C = (C1, . . . , Cn) be a sequence of contours of d. Then C is a connected sequence if there is an arrow (Ci, Ci+1) in SucA(d) for each i < n.
Theorem 5 (Connected Sequences). Let d be a diagram with a connected sequence of contours C = (C1, . . . , Cn). If each zone, (in, out), in d that is not inside one of the Ci (i.e. no Ci is in the set in) is shaded and does not contain a spider foot, then any model I = (U, Ψ, <, Suc) for d ensures
1. $\bigcup_{i=1,\ldots,n} \Psi'(C_i) = U$,
2. $\Psi'(C_i) \cap \Psi'(C_j) = \emptyset$ for each $i < j \leq n$,
3. $\Psi'(min) \subseteq \Psi'(C_1)$, and
4. $\Psi'(max) \subseteq \Psi'(C_n)$,
where (U, Ψ′, <, Suc) is a valid extension of I.
This theorem allows us to prove the following result:
Theorem 6. Let w be a word and let d be the diagram constructed for w∗. Then d defines the language w∗.
In general, regular expressions can be more complex than those formed from a single word in Σ∗. We now present a method for constructing a diagram given r∗, where r is a star-free (not generalised) regular expression. As a simple example, consider the regular expression r = (b|d) · (c|e) and suppose we wish to construct a diagram for r∗ using the same fixed contour set, alphabet and letter assignment as in the previous example. The process is similar to that for w∗, but we must now take disjunction into account. A construction is shown in figure 7. The diagram d2 draws an existential contour for each letter in r, grouping those for b and d and those for c and e using further existential contours, EC and EC′ respectively. Moving on to d3, we join EC and EC′ with an arrow, telling us that after we 'read' a b or a d we can read a c or an e. Adding the shading to create d4 tells us that c and e are the only successors of b and d. A simpler diagram, d5, also defines the language r∗ in this case. However, the process illustrated in figure 7 forms the basis of a general construction method that we now outline.
Figure 7. Defining ((b|d) · (c|e))∗.
Let r be any regular expression constructed from Σ with-
out using the Kleene star. For simplicity, we describe a
construction when r does not involve λ or ∅. Then, r will
be of the form e1 · e2 · e3 · . . . · en where ei are regular
expressions. We start the construction of the diagram by
searching through r until we reach expressions of the form
(l1|l2| . . . |ln) · (l′1|l′2| . . . |l′m), where li, l′j ∈ Σ. For each
such expression, we create the following fragment of a di-
agram: for l1, . . . , ln, create a set of n existential contours,
call them C1, . . . , Cn, which are pairwise disjoint. We as-
sign the letter li to existential contour Ci by drawing Ci
inside a zone, z, where l(z) = {li}; Ci is the start contour and end contour of li. Create a further contour C such that,
for each i ≤ n, Ci is drawn inside C. Shade the region inside C but outside C1, . . . , Cn. Next, create a set of m pairwise disjoint contours, call them C′1, . . . , C′m, corresponding to the letters l′1, . . . , l′m. Furthermore, these contours
should be disjoint from every contour C1, . . . , Cn, C. Create another contour C′ in the same manner as C, but containing C′1, . . . , C′m. Shade the equivalent region thus created. Finally, add an arrow from C to C′. This should create
a fragment which looks like d1 in figure 8 (in our example
of figure 7 this gives d2). The contour C is the start contour and C′ is the end contour of (l1|l2| . . . |ln) · (l′1|l′2| . . . |l′m).
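The base step of the construction can be sketched as a small data structure. The sketch assumes an abstract encoding that is not the paper's formal syntax: containment is a set of (inner, outer) pairs, placing Ci in a zone z with l(z) = {li} is recorded simply as a letter label, and shading the region inside a contour but outside its inner contours is recorded by adding that contour to `shaded`. The names `Fragment`, `fresh`, `letter_group` and `base_fragment` are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Set, Tuple

_labels = count()


def fresh(prefix: str) -> str:
    """Hypothetical helper: return a new, unique contour label."""
    return f"{prefix}{next(_labels)}"


@dataclass
class Fragment:
    contours: Set[str] = field(default_factory=set)
    inside: Set[Tuple[str, str]] = field(default_factory=set)  # (inner, outer)
    letter: Dict[str, str] = field(default_factory=dict)       # letter contour -> letter
    shaded: Set[str] = field(default_factory=set)              # interior minus inner contours shaded
    arrows: Set[Tuple[str, str]] = field(default_factory=set)
    start: str = ""
    end: str = ""


def letter_group(frag: Fragment, letters: List[str], prefix: str) -> str:
    """Draw one existential contour per letter inside a new enclosing contour.

    The letter contours are pairwise disjoint because no `inside` pair
    relates two of them; the rest of the enclosing contour is shaded.
    """
    outer = fresh(prefix)
    frag.contours.add(outer)
    for l in letters:
        c = fresh("L")
        frag.contours.add(c)
        frag.letter[c] = l  # stands in for "drawn inside a zone z with l(z) = {l}"
        frag.inside.add((c, outer))
    frag.shaded.add(outer)
    return outer


def base_fragment(left: List[str], right: List[str]) -> Fragment:
    """Fragment for (l1|...|ln) . (l'1|...|l'm): contours C, C' and the arrow (C, C')."""
    frag = Fragment()
    frag.start = letter_group(frag, left, "C")
    frag.end = letter_group(frag, right, "Cp")
    frag.arrows.add((frag.start, frag.end))
    return frag
```

Calling `base_fragment(["b", "d"], ["c", "e"])` yields the structure of d2 in figure 7: two enclosing contours, four letter contours, and a single arrow from the start contour to the end contour.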
Figure 8. The initial construction and the inductive construction.
Given constructions for f1, . . . , fn and g1, . . . , gm, we create a construction for (f1| . . . |fn) · (g1| . . . |gm) by performing the following steps. First, add a contour, C, that contains precisely the start contours of each fi, then shade the region inside C but outside those start contours; C is the start contour for (f1| . . . |fn) · (g1| . . . |gm). Similarly, create an end contour, C′, for (f1| . . . |fn) · (g1| . . . |gm), enclosing precisely the end contours of each gi. Next, add a contour F such that every end contour Ce in f1, . . . , fn is inside F, shading in the same manner. Now add a contour G such that every start contour Cs in g1, . . . , gm is inside G, again shading the region inside G but outside the other contours. Finally, add an arrow (F, G). We see this step demonstrated in d2 in figure 8. Once we have a construction for r, we shade all zones that are not inside any existential contour and we say that the resulting diagram, d, is constructed for r∗.
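The inductive step can be sketched with the same kind of abstract encoding. Because a single letter's contour is both its start and end contour, a contour may lie inside several new contours (for example, inside both C and F), so containment is kept as a set of (inner, outer) pairs rather than a tree. All names here (`Fragment`, `letter_fragment`, `enclose`, `combine`, `finish`) are hypothetical, and the encoding is an assumption for illustration only.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Set, Tuple

_ids = count()


def fresh(prefix: str) -> str:
    """Hypothetical helper: generate a unique contour label."""
    return f"{prefix}{next(_ids)}"


@dataclass
class Fragment:
    contours: Set[str] = field(default_factory=set)
    inside: Set[Tuple[str, str]] = field(default_factory=set)  # (inner, outer)
    letter: Dict[str, str] = field(default_factory=dict)       # letter contour -> letter
    shaded: Set[str] = field(default_factory=set)              # interior minus inner contours shaded
    arrows: Set[Tuple[str, str]] = field(default_factory=set)
    start: str = ""
    end: str = ""
    outside_shaded: bool = False  # set by finish()


def letter_fragment(l: str) -> Fragment:
    """A single letter: its contour is both the start and the end contour."""
    c = fresh("L")
    return Fragment(contours={c}, letter={c: l}, start=c, end=c)


def enclose(frag: Fragment, members: List[str], prefix: str) -> str:
    """Add a contour around exactly `members` and shade the rest of its interior."""
    outer = fresh(prefix)
    frag.contours.add(outer)
    frag.inside.update((m, outer) for m in members)
    frag.shaded.add(outer)
    return outer


def combine(fs: List[Fragment], gs: List[Fragment]) -> Fragment:
    """Inductive step for (f1|...|fn) . (g1|...|gm); labels are assumed disjoint."""
    out = Fragment()
    for part in fs + gs:  # copy the existing sub-constructions
        out.contours |= part.contours
        out.inside |= part.inside
        out.letter.update(part.letter)
        out.shaded |= part.shaded
        out.arrows |= part.arrows
    out.start = enclose(out, [f.start for f in fs], "C")   # start contour C
    out.end = enclose(out, [g.end for g in gs], "Cp")      # end contour C'
    big_f = enclose(out, [f.end for f in fs], "F")         # F around end contours of the fi
    big_g = enclose(out, [g.start for g in gs], "G")       # G around start contours of the gj
    out.arrows.add((big_f, big_g))
    return out


def finish(frag: Fragment) -> Fragment:
    """Final step: shade every zone not inside any existential contour."""
    frag.outside_shaded = True
    return frag
```

Applied directly to single letters, this sketch produces both C and F around the same letter contours (and both G and C′ around the others); merging each pair gives a fragment with the same shape as the base construction.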
In our construction, some existential contours are not the target of any arrow and yet they are identified as successors of some other elements, since they are inside a contour that is the target of an arrow. We say a contour C in a diagram is an implicit target (resp. implicit source) of an arrow (s, t) ∈ SucA(d) iff C is drawn inside t (resp. C is drawn inside s). We note that the only contour which is neither a source nor an implicit source is the end contour of r. Similarly, the only contour which is neither a target nor an implicit target is the start contour of r. Therefore, because all other zones are shaded, the only place it is possible to go, using successor, from the end contour of r is the start contour of r. This is the same situation as in figure 5. As a further example, the diagram in figure 9 shows a construction for (l1 · ((l1 · l2)|(l3 · (l4|l5)) · (l1|l2)) · l3)∗, where we have omitted the underlying fixed contours and merely shown the letter assignments to each existential contour; these labels are not part of the diagram.
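The definition of implicit targets and sources can be computed directly from a containment relation. This sketch assumes containment is given as a set of (inner, outer) pairs and that "drawn inside" is transitive, so a contour nested two levels deep inside t still counts as an implicit target. The function names are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def drawn_inside(inside: Set[Pair]) -> Set[Pair]:
    """Transitive closure of (inner, outer): nested contours lie inside every ancestor."""
    closure = set(inside)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def implicit_targets(arrow: Pair, inside: Set[Pair]) -> Set[str]:
    """Contours drawn inside the target t of the arrow (s, t)."""
    _, t = arrow
    return {inner for (inner, outer) in drawn_inside(inside) if outer == t}


def implicit_sources(arrow: Pair, inside: Set[Pair]) -> Set[str]:
    """Contours drawn inside the source s of the arrow (s, t)."""
    s, _ = arrow
    return {inner for (inner, outer) in drawn_inside(inside) if outer == s}
```

For the arrow (EC, EC′) of figure 7, the letter contours for c and e are implicit targets and those for b and d are implicit sources, even though no arrow touches them directly.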
Figure 9. Illustrating the construction.
We have seen how to create a unitary diagram for a regular expression, r∗, where r contains no star. The disjunction and product of such diagrams can create representations of languages such as (r1∗ | r2∗) · r3, where each ri does not use the Kleene star: let d1 be a diagram representing the language r1∗, let d2 be a diagram representing the language r2∗ and let d3 be a diagram representing the language r3. The compound diagram formed as the product of d1 ∨ d2 and d3 defines the language (r1∗ | r2∗) · r3. The constructions do not give particularly elegant diagrams and they may contain a lot of redundancy. For instance, d1 in figure 1 defines the language specified by the regular (not generalised) expression
((a|b|c|d)∗ · a · (a|b|c|d)∗ · a · (a|b|c|d)∗ · b · (a|b|c|d)∗)|((a|b|c|d)∗ · a · (a|b|c|d)∗ · b · (a|b|c|d)∗ · a · (a|b|c|d)∗)|((a|b|c|d)∗ · b · (a|b|c|d)∗ · a · (a|b|c|d)∗ · a · (a|b|c|d)∗)
which, if we followed our construction, would give rise to
a compound diagram consisting of 21 unitary parts. However, the construction allows us to prove:
Theorem 7. Any regular expression of star-height 1 can be defined by a second-order spider diagram.
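Theorem 7 is stated in terms of star-height. For reference, the star-height of a regular expression follows the standard inductive definition: letters (and λ, ∅) have height 0, disjunction and product take the maximum of their operands, and each Kleene star adds one. The small AST below (`Sym`, `Alt`, `Cat`, `Star`) is a hypothetical encoding used only for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Sym:
    letter: str  # a single letter; lambda and the empty set would also have height 0


@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Cat:
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:
    body: "Regex"


Regex = Union[Sym, Alt, Cat, Star]


def star_height(r: Regex) -> int:
    """Standard star-height of a regular expression."""
    if isinstance(r, Sym):
        return 0
    if isinstance(r, (Alt, Cat)):
        return max(star_height(r.left), star_height(r.right))
    if isinstance(r, Star):
        return 1 + star_height(r.body)
    raise TypeError(f"not a regular expression node: {r!r}")
```

Under this definition, ((b|d) · (c|e))∗ and (aa)∗ both have star-height 1 and so fall within the scope of Theorem 7, whereas an expression such as (a∗ · b)∗ has star-height 2.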
6. Conclusion
In this paper, we have introduced second-order spider di-
agrams, which include the ability to quantify over subsets
of the universal set, U. The notation has been formalised
and proved to be properly second-order in terms of its ex-
pressiveness. For instance, first-order logics cannot define
(aa)∗, which we can define using second-order spider di-
agrams. To the best of our knowledge, these are the first
formal extensions of Euler diagrams which are proved to be
second-order.
Our primary motivation for developing this logic is to
provide another mechanism through which we can study
regular languages. In order to facilitate such studies, we formalised the notion of the language defined by a diagram. Further, we have
begun to explore the class of regular languages definable by
second-order spider diagrams. In particular, we established
that all regular languages of star-height one are definable.
We fully expect to be able to extend the construction given
in section 5 to regular expressions of star-height two and
beyond. Indeed, we conjecture that second-order spider diagrams can define all regular languages. In addition, spider diagrams could be used to generate regular expressions. At present, we have not provided a mechanism to translate arbitrary spider diagrams into regular expressions, and this
remains the subject of future work. We also plan to use
second-order spider diagrams to study regular languages.
Differences between the manner in which regular expres-
sions and second-order spider diagrams define regular lan-
guages could well mean that these diagrams provide us with
new insight into the properties of regular languages.
Further motivation for this paper came from the aim of better understanding how we can visually express complex concepts using diagrams. The provision of
second-order spider diagrams has yielded an increase in expressiveness, taking us beyond the limits of first-order logics. Still, there is much work to be done here. The diagrammatic logics that exist to date cannot axiomatise many commonly occurring concepts, such as the mathematical properties of a set having even cardinality or being dense, or the transitive closure of a relation. It is not yet possible to make statements that quantify over arbitrary relations or functions defined over the universal set. For example, statements like ∃P ∀x ∀y P(x, y), where P is a second-order two-place predicate variable, cannot yet be formally expressed using diagrams of this kind.
Extending second-order spider diagrams to allow quan-
tification over binary relations will permit a greater range
of second-order properties to be defined visually, consider-
ably increasing their expressiveness. We are exploring this
extension, further pushing the boundaries of what can be
expressed using diagrammatic logics. Indeed, it would be a very considerable advance over the current state of the art if a diagrammatic logic with the full expressiveness of second-order logic were developed.
Acknowledgement This research is supported by EPSRC
grants for the Defining Regular Languages with Diagrams
project [EP/H012311/1] and the Sketching Euler Diagrams
project [EP/H048480/1]. We thank Aidan Delaney, John
Taylor and the anonymous reviewers for helpful comments
on this paper.
References
[1] J. Büchi. Weak second order arithmetic and finite automata. Zeitschrift für mathematische Logik und Grundlagen der Mathematik, 6:66–92, 1960.
[2] R. Cohen and J. Brzozowski. Dot-depth of star-free events. Journal of Computer and System Sciences, 5:1–16, 1971.
[3] F. Dau and P. Eklund. A diagrammatic reasoning system for the description logic ALC. Journal of Visual Languages and Computing, 19(5):539–573, 2008.
[4] A. Delaney and G. Stapleton. On the descriptional complexity of a diagrammatic notation. In 13th International Conference on Distributed Multimedia Systems, Visual Languages and Computing, pages 195–202, 2007.
[5] A. Delaney and G. Stapleton. Spider diagrams of order. In Visual Languages and Logic, volume 274 of CEUR, pages 27–39, 2007.
[6] A. Delaney, G. Stapleton, J. Taylor, and S. Thompson. A diagrammatic characterisation of commutative star-free regular languages. In preparation.
[7] A. Delaney, J. Taylor, and S. Thompson. Spider diagrams of order and a hierarchy of star-free regular languages. In 5th International Conference on the Theory and Application of Diagrams, LNCS, pages 172–187. Springer, 2008.
[8] H.-D. Ebbinghaus and J. Flum. Finite Model Theory. Springer-Verlag, 1991.
[9] C. Gurr. Aligning syntax and semantics in formalisations of visual languages. In Proceedings of IEEE Symposia on Human-Centric Computing Languages and Environments, pages 60–61. IEEE Computer Society Press, 2001.
[10] J. Howse, G. Stapleton, and J. Taylor. Spider diagrams. LMS Journal of Computation and Mathematics, 8:145–194, 2005.
[11] S. Kent. Constraint diagrams: Visualizing invariants in object oriented modelling. In Proceedings of OOPSLA97, pages 327–341. ACM Press, October 1997.
[12] A. Mateescu and A. Salomaa. Formal languages: an introduction and a synopsis. In Handbook of Formal Languages, vol. 1: Word, Language, Grammar, pages 1–39. Springer-Verlag New York, Inc., New York, NY, USA, 1997.
[13] A. Shimojima. Inferential and expressive capacities of graphical representations: Survey and some generalizations. In Proceedings of 3rd International Conference on the Theory and Application of Diagrams, volume 2980 of LNAI, pages 18–21, Cambridge, UK, 2004. Springer.
[14] S.-J. Shin. The Logical Status of Diagrams. Cambridge University Press, 1994.
[15] G. Stapleton and A. Delaney. Evaluating and generalizing constraint diagrams. Journal of Visual Languages and Computing, 19(4):499–521, 2008.
[16] G. Stapleton and J. Masthoff. Incorporating negation into visual logics: A case study using Euler diagrams. In Visual Languages and Computing 2007, pages 187–194. Knowledge Systems Institute, 2007.
[17] G. Stapleton, S. Thompson, J. Howse, and J. Taylor. The expressiveness of spider diagrams. Journal of Logic and Computation, 14(6):857–880, December 2004.
[18] N. Swoboda and G. Allwein. Using DAG transformations to verify Euler/Venn homogeneous and Euler/Venn FOL heterogeneous rules of inference. Journal on Software and System Modeling, 3(2):136–149, 2004.
[19] W. Thomas. Classifying regular events in symbolic logic. Journal of Computer and System Sciences, 25:360–376, 1982.
[20] B. Trahtenbrot. Finite automata and the logic of monadic predicates. Sibirskij Mat. Zhurnal, 3:103–131, 1962.