State Complexity of Simple Splicingt4ng/talks/dcfs2019.pdf · 2019. 7. 18. · A. Mateescu et al....
Transcript of State Complexity of Simple Splicingt4ng/talks/dcfs2019.pdf · 2019. 7. 18. · A. Mateescu et al....
State Complexity of Simple Splicing
Lila Kari and Timothy Ng School of Computer Science, University of Waterloo
DCFS 2019, Košice, Slovakia
Splicing systems
ACG CCGTTC CGT
ATCCTCACGCCGTAGCATTTCCGTGTATGA
ATCCTCACGCGTGTATGA
T. Head (1987)
A splicing system is a 3-tuple H = (Σ,R,L) where
• Σ is a finite alphabet
• R is a set of rules
• L is the initial language
A splicing rule is a 4-tuple r = (u1, u2; u3, u4) such that if x = x1u1u2x2 and y = y1u3u4y2, then
x ⊢r y = x1u1u4y2
The language of a splicing system H = (Σ,R,L) is R*(L) where
R(L) = {w ∈ Σ | (∃ x,y ∈ L, r ∈ R) such that x ⊢r y}
and
• R0(L) = L
• Ri(L) = Ri(L) ⋃ R(Ri–1(L))
• R⇤(L) =[
i�0
Ri(L)
<latexit sha1_base64="AX9ECt9BDJCHddFlN0QNfaP81hQ=">AAACC3icbVDLSsNAFJ34rPUVdelmaBGqi5KooBuh6MaFi1rsA5o0TKaTdOhkEmcmQgndu/FX3LhQxK0/4M6/cfpYaOuBC4dz7uXee/yEUaks69tYWFxaXlnNreXXNza3ts2d3YaMU4FJHccsFi0fScIoJ3VFFSOtRBAU+Yw0/f7VyG8+ECFpzO/UICFuhEJOA4qR0pJnFmqdo9LNIbyAjk9DnCZeRqETkntoDWGtQ7XnmUWrbI0B54k9JUUwRdUzv5xujNOIcIUZkrJtW4lyMyQUxYwM804qSYJwH4WkrSlHEZFuNv5lCA+00oVBLHRxBcfq74kMRVIOIl93Rkj15Kw3Ev/z2qkKzt2M8iRVhOPJoiBlUMVwFAzsUkGwYgNNEBZU3wpxDwmElY4vr0OwZ1+eJ43jsn1Stm9Pi5XLaRw5sA8KoARscAYq4BpUQR1g8AiewSt4M56MF+Pd+Ji0LhjTmT3wB8bnD+IpmGs=</latexit>
Complexity of splicing systems
Finite Ruleset Regular Ruleset
Finite Initial Language Locally testable (T. Head, 1987)
Recursively enumerable (T. Head, Gh. Păun,
D. Pixton, 1997)
Regular Initial LanguageRegular
(K. Culik II and T. Harju, 1991)
Recurisvely enumerable (Gh. Păun, 1996)
Simple splicingA. Mateescu, Gh. Păun, G. Rozenberg, A. Salomaa (1998)
TT
ACGTACGTATACCATACTTGCTTC
ACGTACTTGCTTC
A splicing rule r is simple if r = (u1, ε; u3, ε) where u1 = u3
and |u1| = 1.
A splicing system with only simple rules is a simple splicing system.
A simple splicing system is denoted by H = (Σ,M,L) where M ⊆ Σ. Then a ∈ M means (a, ε; a, ε) is a rule in H.
Word BlendingS.K. Enaganti, L. Kari, T. N., Z. Wang (2018)
Franco, Bellamoli, Lampis (2017)
ACGTACGTATACCATACTTGCTTC ACGTACTTGCTTC
ACGTACGTATACCATACTTGCTTC ACGTACTTGCTTC
u ./ v = {↵w� |u = ↵w�1, v = �2w�,
↵,�, w, �1, �2 2 ⌃⇤}<latexit sha1_base64="pDzY0r1HhmMeGIA2FocZjFDZbxk=">AAACdHicbZHPThsxEMa9W9pC+i8thx7KYdqIqqqiaJdWKhckVC4cqSCAFKfRrOMkFrZ3Zc8SRas8AW/XWx+jl55xsgFR6EiWPn/f/GR7nBVaeUqS31H8aO3xk6frG41nz1+8fNV8/ebU56UTsitynbvzDL3UysouKdLyvHASTablWXZxsMjPLqXzKrcnNCtk3+DYqpESSMEaNK9K4Fk+JSXhEvaAVxx1MUGYBlsSAjdqCB+hXGS3yRiNwUHarpF6t3ODtIHzQNTd7Rtv2r6D3SJcWeDHamzw52c+HzRbSSdZFjwU6Uq02KqOBs1ffJiL0khLQqP3vTQpqF+hIyW0nDd46WWB4gLHshekRSN9v1oObQ7bwRnCKHdhWYKle5eo0Hg/M1noNEgTfz9bmP/LeiWNdvuVskVJ0or6oFGpgXJY/AAMlZOC9CwIFE6Fu4KYoENB4Z8aYQjp/Sc/FKc7nfRLJ/nxtbX/fTWOdfaOfWCfWMq+sX12yI5Ylwn2J3obQfQ++htvxa14u26NoxWzyf6puHMND7u39w==</latexit>
u ./ v = {↵a� |u = ↵a�1, v = �2a�,
↵,�, �1, �2 2 ⌃⇤, a 2 ⌃}<latexit sha1_base64="+wYpDOMonFZyiGHABndPTk1p/QE=">AAACf3icbVFLbxMxEPYurxJeAY69jBrxEKyi3YIEHJCqcuFYBGkrxSGadSaJVdu7sr1F0Sp/gx/WW/8LB7zZsIKWkWx98z1ke5yXSjqfppdRfOPmrdt3du727t1/8PBR//GTY1dUVtBIFKqwpzk6UtLQyEuv6LS0hDpXdJKffWr0k3OyThbmm1+VNNG4MHIuBfpATfs/K+B58cNLgnP4CLzmqMolAgaafNi1nMFzqBqtUxaoNU6zpI203f6fSAKch0TrTjquy3R+Lg3wr3Kh8furpEl3PV9P+4N0mG4KroNsCwZsW0fT/gWfFaLSZLxQ6Nw4S0s/qdF6KRSte7xyVKI4wwWNAzSoyU3qzfzW8CwwM5gXNizjYcP+nahRO7fSeXBq9Et3VWvI/2njys/fT2ppysqTEe1B80qBL6D5DJhJS8KrVQAorAx3BbFEi8KHL+uFIWRXn3wdHO8PszfD9MvbwcHhdhw7bJftsZcsY+/YAfvMjtiICfYr2oteR0kcxS/iYZy21jjaZp6yfyr+8Bue7LrO</latexit>
Descriptional complexity measures for splicing systems
• Radius (Gh. Păun, 1996)
• Size of initial language (A. Mateescu et al., 1998)
• Size of grammar (A. Mateescu et al., 1998)
• Number/length of rules (R. Loos et al., 2007)
• Size of nondeterministic finite automaton (R. Loos et al., 2007)
The state complexity of a regular language is the number of states in its minimal deterministic finite automaton.
The state complexity of an operation is the worst-case state complexity of the language resulting from the operation, as a function of the state complexity of the operands.
(2n � 1 if L is regular,
2n�2 + 1 if L is finite,<latexit sha1_base64="Q9zfbRyfmChj1ec7D/4ddUId1gg=">AAACUXicbVFNb9NAEJ2YjxYXaIAjl1FbEFJpZKcHOEZw6YFDKzVtpThE6804WXW9tnbHiMjyv+B3cWhP/R+9cAB183EobZ+00tObNzO7b9NSK8dRdNUKHj1+8nRt/Vm48fzFy832q9cnrqispL4sdGHPUuFIK0N9VqzprLQk8lTTaXr+dV4//UHWqcIc86ykYS4mRmVKCvbSqD1NUpooU0s/wzVh97vBPYzxPSZMP7lWGe5820Hl0NKk0sI2HzFJvK02e90Gdx+0ZsooJu8MEzLj1ehRezvqRAvgfRKvyHZvK9n9ddWbHY7aF8m4kFVOhqUWzg3iqORhLSwrqakJk8pRKeS5mNDAUyNycsN6kUiD77wyxqyw/hjGhXq7oxa5c7M89c5c8NTdrc3Fh2qDirPPw1qZsmIycrkoqzRygfN4cawsSdYzT4S0yt8V5VRYIdl/QuhDiO8++T456Xbi/U505NP4Akusw1vYgg8QwyfowQEcQh8k/IZr+Av/WpetPwEEwdIatFY9b+A/BBs3Sq6ydQ==</latexit>
For a simple splicing system with initial language L with state complexity n
Let H = (Σ,M,L) be a simple splicing system and let A be the DFA for L. From A, we will construct an NFA that recognizes the language of H.
For each symbol a ∈ M, construct a bridge:
Add ε-transitions: • from each state in A with outgoing transitions on a to pa,
and • from p'a to all states of A with incoming transitions on a
states with outgoing transitions on a
states with incoming transitions on a
Then perform ε-transition removal:
The new states pa and p'a disappear and collapse into transitions between states of A on a.
states with outgoing transitions on a
states with incoming transitions on a
this is the image of the transition function on a
Since we begin with an n-state DFA, this process results in an n-state NFA after removing ε-transitions.
This gives an upper bound of 2n–1 reachable (non-empty subsets) states via the subset construction.
The upper bound is reachable via the simple splicing system ({a,b,c},{c},Ln), where Ln is recognized by the following DFA
reading b may remove this state
the image of the transition function on c is the entire state set
The upper bound is lower with a finite initial language
• Consider a simple splicing system (Σ,M,I), where I is a finite language with state complexity n
• Since I is finite, its DFA A, is acyclic. Then the initial state q0 of A has no incoming transitions so the only reachable subset containing q0 is {q0}.
• Since I is finite, A must contain a sink state.
• This gives a total of 2n–2 – 1 + 2 states.
8>>>>>><
>>>>>>:
2 if k = 1,
2n� 3 if k = 2,
2n2 + 1 if k = 3 and n is even,
3 · 2n�32 + 1 if k = 3 and n is odd,
2n�2 + 1 if k � 2n�3.<latexit sha1_base64="gaMvFpK5l8cUSXQh1iyF94kHA84=">AAAC4HicjVJNb9QwEHXCVwkfXeDIZdQGhNR2lY8DXJBWcOFYJLattA4rx5lsrU3sEDsVqygnLhxACIkTP4tbfwlXvJutBN0eGMnS03vzZjxjp1UhtAmCc8e9dv3GzVtbt707d+/d3x48eHikVVNzHHNVqPokZRoLIXFshCnwpKqRlWmBx+n89VI/PsNaCyXfmUWFSclmUuSCM2Op6eA3TXEmZMttDd15ETwFavCjaUUO/hxeQuh3+0CpF8mDeEOMLsT3Lc1rxkFC1MEehBuZsQ9MZuBLH4QGPEPZO2OgPFMGLiq0tk33f0VUlnX7fXN5cKWDzvADrOS487uhR1Fm60mng91gGKwCNkG4BrujHbr343y0OJwOftFM8aZEaXjBtJ6EQWWSltVG8AI7jzYaK8bnbIYTCyUrUSft6oE6eGKZDHJV2yMNrNi/HS0rtV6Uqc0smTnVl7UleZU2aUz+ImmFrBqDkveN8qYAo2D52pCJGrkpFhYwXgt7V+CnzO7Z2D/h2SWEl0feBEfRMIyHwVu7jVekjy3ymOyQZyQkz8mIvCGHZEy4kzifnC/OVzd1P7vf3O99quusPY/IP+H+/AOrmtqP</latexit>
For a simple splicing system with initial finite language L with state complexity n over k symbols
Lemma. If a ∈ M, then im δa' contains exactly the sink state and im δa.
In other words, if a ∈ M, then reading a will take the DFA to either exactly one subset or the sink state.
To reach the upper bound for finite initial languages
• Let q1 be a state reachable directly by the initial state and no other state; this state must exist since the DFA is acyclic.
• If q1 ∈ S, then S is reachable only if it is the image of δa for some a ∈ M.
• Since there are up to 2n–3 subsets that contain q1, to reach each of these subsets, there must be one a ∈ M for each.
If k = 2• If a,b are not in M, then we just have L.
• If a,b are both in M, then we have at most two reachable subsets.
• If a ∈ M and b is not, then to maximize the number of reachable states, we must have δb(i) = i+1 and |im δa| = 2. This gives us at most 2n–3 states.
For k = 3, the upper bound is reached by ({a,b,c},{c},In) where In is recognized by the DFA below.
reading c reaches this subset of states
A splicing rule r is semi-simple if r = (u1, ε; u3, ε) with |u1| = |u3| = 1.
A splicing system with only simple rules is a semi-simple splicing system.
A semi-simple splicing system is denoted by H = (Σ,M(2),L) where M(2) ⊆ Σ×Σ. Then (a,b) ∈ M(2) means (a, ε; b, ε) is a rule in H.
Semi-simple splicingE. Goode and D. Pixton (2001)
For each pair (a,b) ∈ M(2), construct a bridge:
Add ε-transitions: • from each state in A with outgoing transitions on a to pa,
and • from p'b to all states of A with incoming transitions on b
states with outgoing transitions on a
states with incoming transitions on b
b
Then perform ε-transition removal:
The new states pa and p'b disappear and collapse into transitions between states of A on a.
states with outgoing transitions on a
states with incoming transitions on b
this is the image of the transition function on b
This construction shows that semi-simple splicing systems with regular and finite initial languages have the same upper bound for state complexity.
For semi-simple splicing systems with a regular initial language, this upper bound is reached by the same lower bound witness for simple splicing systems.
The upper bound for semi-simple splicing systems with a finite initial language can be reached via ({a,b,c},{(a,c)},In), where In is recognized by the following DFA
reading a adds this state
CrossoverA. Mateescu et al. (1998), R. Ceterchi (2006)
For M ⊆ Σ×Σ we define the operation on two strings u,v by �
if u = u'a and v = bv' for (a,b) ∈ M and u’,v' ∈ Σ; and is undefined otherwise.
The crossover operation can be defined in terms of this operation extended to languages by
�
u ⇧M v = u0av0<latexit sha1_base64="qJveZEhGefH0tOB9dcwlOicc1gg=">AAAB/3icbVDLSsNAFJ3UV62vqODGzWCRuiqJCroRim7cCBXsA9oQJpNJO3RmEmYmhRK78FfcuFDErb/hzr9x2mahrQcuHM65l3vvCRJGlXacb6uwtLyyulZcL21sbm3v2Lt7TRWnEpMGjlks2wFShFFBGppqRtqJJIgHjLSCwc3Ebw2JVDQWD3qUEI+jnqARxUgbybcPUtgNKeKxCP07OIRXMK2gYcW3y07VmQIuEjcnZZCj7ttf3TDGKSdCY4aU6rhOor0MSU0xI+NSN1UkQXiAeqRjqECcKC+b3j+Gx0YJYRRLU0LDqfp7IkNcqREPTCdHuq/mvYn4n9dJdXTpZVQkqSYCzxZFKYM6hpMwYEglwZqNDEFYUnMrxH0kEdYmspIJwZ1/eZE0T6vuWdW9Py/XrvM4iuAQHIET4IILUAO3oA4aAINH8AxewZv1ZL1Y79bHrLVg5TP74A+szx/YtJSw</latexit>
L1]ML2 = pref(L1) ⇧M su↵(L2)<latexit sha1_base64="5nY8eEYMiAzI3P9LmZpJfy5QT10=">AAACOHicbVDLSgMxFM3UV62vqks3wSK0mzJTBd0IRTcuKlawD+iUIZPJtKGZZEgyQhn6WW78DHfixoUibv0C08fCtl4IHM45997c48eMKm3br1ZmZXVtfSO7mdva3tndy+8fNJVIJCYNLJiQbR8pwignDU01I+1YEhT5jLT8wfVYbz0SqajgD3oYk26EepyGFCNtKC9/V/Mc6Ko+krF3C2teBV5CV8REIi0kRxFJzbhwVDS2EnQDiiLBA+Oc96gknHgqJS9fsMv2pOAycGagAGZV9/IvbiBwEhGuMUNKdRw71t0USU0xI6OcmygSIzxAPdIxcLxOddPJ4SN4YpgAhkKaxzWcsH87UhQpNYx844yQ7qtFbUz+p3USHV50U8rjRBOOp4vChEEt4DhFGFBJsGZDAxCW1PwVYpMhwtpknTMhOIsnL4Nmpeyclp37s0L1ahZHFhyBY1AEDjgHVXAD6qABMHgCb+ADfFrP1rv1ZX1PrRlr1nMI5sr6+QUJ5KwC</latexit>
The crossover operation is used for the algebraic characterization of simple and semi-simple splicing.
The operation can be thought of as a single step of simple or semi-simple splicing.
For two DFAs A and B, if (a,b) ∈ M, then for each state of A with outgoing transitions on a, add transitions on a to all states in B with incoming transitions on b.
states of A with outgoing transitions on a
states of B with incoming transitions on b
this is the image of the transition function of B on b
This gives at most m×2n states.
A B
Am
Bn
M = {(d,d)}
Conclusion
• State complexity for simple splicing systems with regular initial languages
• State complexity for simple splicing systems with finite initial languages defined over alphabets of size 1, 2, 3, and ≥2n–3
• State complexity of semi-simple splicing systems with regular and finite initial languages
• State complexity of the crossover operation on regular languages
Open problems
• State complexity for other simple and semi-simple splicing systems (2,4; 2,3; 1,4) with finite and regular initial languages.
• State complexity of simple splicing systems with finite initial languages over alphabets of size k for 3 < k < 2n–
3.
• State complexity of k-limited splicing systems, for k = 1, 2, ….
Thank you