Tree-Walking Automata and Monadic Second Order Logic
Preface
This report describes the results of a study in the
field of theoretical computer science. It was written as my
``doctoraalscriptie'' (Master's thesis)
for my study of Computer Science at Leiden University.
The graduation tutor from the Department of Computer Science
was J. Engelfriet, PhD.
This report was written using the LATEX typesetting software on a computer
running Linux and the KDE desktop environment.
I would like to thank Joost Engelfriet for his constant advice and
his help in conducting this study. My parents have always supported me in many
ways. And, last but certainly not least, thanks to Ellen for the title page
she designed and for her love and support.
Jan-Pascal
Delft, July 1998
Contents
Preface
Introduction
1 Binary MSO Formulas and String-Walking Pebble Automata
1.1 Preliminaries
1.2 String-Walking Automata
1.3 String-Walking Pebble Automata
1.4 String-Walking Automata with MSO Tests
1.5 From MSO Formulas to String-Walking Pebble Automata
1.6 Another proof
1.7 Pebble Necessity
1.8 From String-Walking Pebble Automata to MSO Formulas
2 Extensions to String-Walking Pebble Automata
2.1 String-Walking Multi-Pebble Automata
2.2 From String-Walking Multi-Pebble Automata to MSO Formulas
2.3 String-Walking Marble Automata
3 Binary MSO Formulas and Tree-Walking Automata
3.1 Preliminaries
3.2 Tree-Walking Automata
3.3 Push-Down Tree-Walking Automata
3.4 Equivalence of Finite Tree Automata and Push-Down Tree-Walking Automata
3.5 Tree-Walking Marble Automata
3.6 Tree-Walking Marble/Pebble Automata
3.7 Tree-Walking Automata with MSO Tests
3.8 Equivalence of Binary MSO Formulas and Tree-Walking Marble/Pebble Automata
Conclusion and Recommendations
Introduction
Monadic second order logic (MSO logic, [Büc60,,TW68])
is a way to describe properties of graphs. In this paper we will focus on
strings and trees as special cases. Monadic second order logic
combines great strength
with great ease of expression.
Closed MSO formulas define sets of trees, while MSO formulas
with one or more unbound variables define node relations on trees.
One perhaps less desirable property of MSO logic is that its formulas
are global in a way that they define properties of trees as a whole.
Another way to define node relations is by tree-walking automata.
A tree-walking automaton can compute a binary node relation by starting
on one node of the tree and finishing on another node. It has already
been shown that tree-walking automata that can test unary MSO formulas
at each node they visit, compute the same node relations that binary MSO
formulas define [BE97]. However, these unary MSO formulas define in a way
also global properties of the tree, in that they can test, e.g.,
the labels of several nodes of the tree they walk on in one formula.
It can also be shown that ordinary tree-walking automata do not compute
the same node relations that binary MSO formulas recognize. Therefore,
in order to compute the node relations defined by binary MSO formulas,
we need to extend tree-walking automata with features that make them
more powerful. We need to make sure that these features result in only
local operations, i.e.,
the resulting automata can, during their walk on a tree,
only test properties of the current node. Also, we would like these
operations to be more operational and less descriptive than unary MSO
formulas.
These considerations lead us to the central question of this paper.
Is it possible to define a type of tree-walking automaton,
with only local operations,
that computes the same binary node relations that binary MSO formulas
recognize?
In this paper, we try to find an answer to this question. We start by
attempting to find an answer for strings in Chapter 1.
In this chapter, we introduce the concept of pebbles . During the walk of
a string- or tree-walking automaton, the automaton can place a pebble on the current
node. Later during its walk, the automaton can check for the presence of a
pebble and pick it up. Variations are possible in the allowed number of
pebbles, coloured pebbles and restriction in the use of pebbles. It is shown
that string-walking pebble automata (with just one pebble) compute
exactly the binary relations that binary MSO formulas define on strings.
In the next chapter, we define some more powerful string-walking automata
and show some of their properties.
In Chapter 3, we move on to trees.
The task of finding a proper extension for tree-walking automata proves to
be a bit harder than for string-walking automata. Because it is already
known [KS81] that push-down tree-walking automata recognize
exactly the regular tree languages (which are the tree languages
defined by closed MSO formulas), we develop
tree-walking marble automata, similar in concept to push-down tree-walking
automata, that also recognize exactly the regular tree languages but, unlike
push-down tree-walking automata, allow for a natural definition of the
computation of node relations. By adding one pebble, we obtain tree-walking
marble/pebble automata. We show that these
compute exactly the binary node relations that binary MSO formulas
define on trees.
In the last chapter, we conclude this paper and give some recommendations for
future research.
Chapter 1
Binary MSO Formulas and String-Walking Pebble Automata
1.1 Preliminaries
In this section we recall some well-known concepts from
formal language theory [HU79,], more specifically
concerning strings, finite automata and monadic second order logic (on strings).
Before doing so, we recall some standard
terminology from set theory.
The set of natural numbers is \mathbb N = {0,1,2,¼}
and, for m,n Î \mathbb N,
[m,n] = { i Î \mathbb N \mid m £ i £ n}.
For a set S, 2S is its powerset, i.e., the set of all subsets of S.
Let R Í A ×B be a binary relation.
The transitive reflexive closure of R is denoted R*.
The inverse of R is R-1 = { (y,x) \mid (x,y) Î R }.
For two binary relations R1 and R2, their composition is
R1 °R2 = {(x,z) \mid $y: (x,y) Î R1 and (y,z) Î R2 }. Note that the order of R1 and R2 is nonstandard.
For each a Î A,
R(a) = { b Î B \mid (a,b) Î R } and for
each S Í A, R(S) = È{ R(a) \mid a Î S }.
The domain of R is dom(R) = R-1(B).
A binary relation R is functional if it is a partial
function, i.e., (x,y),(x,z) Î R
implies x = z.
Let f:A® B be a function; for
any a and b,
f(a® b) denotes the ``perturbed'' function
f¢:A È{a} ® B È{b} with
f¢(a) = b and f¢(a¢) = f(a¢) for every a¢ Î A with a¢ ¹ a.
Strings
An alphabet is a finite set of symbols. Let S be an alphabet.
A string over S
is a finite string of symbols from S.
The empty string is denoted l.
The set of all strings over S is denoted S*. A subset of
S* is called a language.
If w is a string over S,
|w| denotes the length of w. The set of positions in w is denoted by
Vw = [1,|w|].
For a string w Î S* and i Î Vw, we write labw(i) = s
if the ith symbol of w is s.
For k Î \mathbb N, a k-ary position relation
over S is a subset of
{(w,u1,¼,uk) \mid w Î S* and ui Î Vw for all i Î [1,k] }. A k-ary position relation associates with each string
w a k-ary relation on the positions of w.
We define marked strings as follows. Let S be an alphabet,
and k ³ 1. We define Bk = {0,1}k \{0}k.
The alphabet
SÈ(S×Bk) contains all symbols s Î S and the symbols
(s, b1, ¼, bk) with bi Î {0,1} for all i Î [1,k],
but without (s, 0, ¼, 0). The role of the latter symbol is played
by s itself. This alphabet is used to attach k different marks
to the positions in a string.
Let w Î S* be a string over S, and let u1, ¼, uk Î Vw.
The marked string w¢ = mark(w, u1, ¼, uk) Î (SÈ(S×Bk))*
is the string with labw¢(u) = labw(u) if u ¹ ui for all
i Î [1,k], and labw¢(u) = (labw(u), (u = u1), ¼, (u = uk))
otherwise (where (u = ui) = 1 iff u equals ui).
Finite Automata
Let S be an alphabet. A finite automaton over S is
a quintuple M = (Q,S,d,I,F), where Q is a finite set of
states, S is the input alphabet, d Í Q ×S×Q is the transition relation, I Í Q is the set of
initial states and F Í Q is the set of final states.
The elements of d are called transitions.
For every string w Î S*, the state transition relation
RM(w) Í Q ×Q is defined as follows.
For s Î S,
RM(s) = { (p,q) \mid (p,s,q) Î d}. For
s1,¼,sn Î S,
RM(s1¼sn) = RM(s1)°¼°RM(sn). The language recognized by M is
L(M) = { w Î S* \mid (qin,qfin) Î RM(w) for some qin Î I, qfin Î F }.
A language is regular if it is recognized by a finite automaton.
The class of all regular languages is named REG.
Monadic Second Order Logic (on strings)
Monadic second order logic can be used to describe properties
of strings [Büc60]. For an alphabet S, we define the language
MSOL(S) of MSO formulas over S. This language has position
variables x,y,¼ and position-set variables X,Y,¼. For a given
string w Î S*, position variables range over Vw = [1,|w|] and
position-set variables range over the subsets of Vw.
There are three types of atomic formulas in MSOL(S): labs(x), for every
s Î S, signifies that at position x there
is a symbol s. When MSO logic is expanded to describe properties
of trees and graphs, labs(x) means that node x is labeled
s, hence the name labs. The atomic formula
pre(x,y) signifies that position x comes directly before y,
i.e. y = x+1; and x Î X signifies that x is an element of X. The
connectives used to build formulas from these atomic formulas are
Ø, Ù, Ú, ®, «, with the usual meaning.
Both position variables and position-set variables can be quantified with
$ and ".
We will make use of the following abbreviations of MSO formulas:
If an MSO formula f has free variables, say x, y, X and no others,
we write f(x,y,X) to indicate the free variables of f.
For every k Î \mathbb N, the set of MSO formulas over S with k
free position variables and no free position-set variables is denoted
MSOLk(S), or the set of k-ary MSO formulas.
For a closed formula f Î MSOL0(S) and
a string w Î S*, we write w \models f if w satisfies f.
The language defined by f is
L(f) = { w Î S* \mid w \models f}. L(f) is called
an MSO definable language .
The class of all MSO definable languages is named MSO.
Given a string w, a valuation function n is a function that maps
each position variable to a position u Î Vw and each position-set
variable to a subset U of Vw. Let f be an MSO formula. We write
(w,n) \models f, if f holds in w, where the free variables
of f are assigned values according to the valuation function n.
If f has free variables x,y,X, we write (w,u,v,U) \models f(x,y,X) for
(w,n) \models f, where n(x) = u, n(y) = v and n(X) = U.
Let f(x1, ¼, xk) Î MSOLk(S) be an MSO formula
with k free positions variables (and no free position-set variables).
For each string w Î S*, the k-ary relation
that f defines on the positions of w is
|
Rw(f) = { (u1, ¼, uk) Î Vwk \mid (w, u1, ¼, uk) \models f(x1, ¼, xk) } . |
|
The MSO formula f(x1, ¼, xk) defines the position relation
|
R(f) = { (w, u1, ¼, uk) \mid w Î S*, (u1, ¼, uk) Î Rw(f) } . |
|
Büchi [Büc60] proved the following classical result.
Proposition 1
A language is MSO definable if and only if it is regular.
In [BE97], the following (well-known)
results are proven for MSO formulas on trees.
They also hold for the special case of strings.
Corollary 3 is
the result of Lemma 2 in the case j = k.
Lemma 2
Let S be an alphabet, k ³ 1, and j Î [1,k]. For every formula
f(x1, ¼, xk) Î MSOLk(S) there is a formula
y(xj+1, ¼, xk) Î MSOLk-j(SÈ(S×Bj)) such that,
for all w Î S* and u1, ¼, uk Î Vw,
|
(w, u1, ¼, uk) \models f(x1, ¼, xk) iff (mark(w, u1, ¼, uj), uj+1, ¼, uk) \modelsy(xj+1, ¼, xk), |
|
and vice versa, i.e.,
for every formula y(xj+1, ¼, xk) Î MSOLk-j(SÈ(S×Bj))
there is a formula f(x1, ¼, xk) Î MSOLk(S)
such that the above equivalence holds
for all w Î S* and u1, ¼, uk Î Vw.
Corollary 3
Let S be an alphabet and k ³ 1. For every formula
f(x1, ¼, xk) Î MSOLk(S) there is a formula
y Î MSOL0(SÈ(S×Bk)) such that,
for all w Î S* and u1, ¼, uk Î Vw,
|
(w, u1, ¼, uk) \models f(x1, ¼, xk) iff mark(w, u1, ¼, uk) \models y, |
|
and vice versa, i.e.,
for every MSO formula y Î MSOL0(SÈ(S×Bk))
there is a formula f(x1, ¼, xk) Î MSOLk(S)
such that the above equivalence holds
for all w Î S* and u1, ¼, uk Î Vw.
From Corollary 3 and Proposition 1
follows the following lemma.
Lemma 4
Let S be an alphabet and let k ³ 1. For every MSO formula
f(x1, ¼, xk) Î MSOLk(S) there exists a finite automaton
M over SÈ(S×Bk) such that, for all w Î S* and
u1, ¼, uk Î Vw,
|
(w, u1, ¼, uk) \models f(x1, ¼, xk) iff mark(w, u1, ¼, uk) Î L(M) . |
|
1.2 String-Walking Automata
String-walking automata are known in the
literature as two-way finite automata (see [HU79]).
We will call them string-walking automata to stress both the differences and
the similarities with tree-walking automata.
String-walking automata are like the well-known finite automata,
except that they can walk both forwards and backwards along the string
they are examining (hence the name two-way finite automata). At any moment
in the execution of a string-walking automaton, the automaton is in one
state and at one position of the string.
Let S be an alphabet. Syntactically, a string-walking automaton
over S is a
finite automaton
with a special set of directives as input alphabet.
We define the set of directives as
|
DSWA(S) = { ¬, ®, head, Øhead, tail, Øtail} È{ labs\mid s Î S} . |
|
The meaning of the directives is as follows:
- ¬ means go back to the previous position in the input string.
- ® means go to the next position in the input string.
- head checks whether the current position is the first position in the
string.
- tail checks whether the current position is the last
position in the string.
- Øhead and Øtail are the negations of
head and tail respectively.
- labs checks whether the current position is
labeled with s.
Formally, we define for each string w Î S* and
each directive d Î DSWA(S)
the following binary relation Rw(d) on Vw:
|
| |
|
|
|
{ (i, i-1) \mid 1 < i £ |w| } |
| |
|
|
{ (i, i+1) \mid 1 £ i < |w| } |
| |
|
|
{ (i, i) \mid 1 £ i £ |w| and the ith symbol of w is s } |
| |
|
| |
|
|
{ (i, i) \mid 1 < i £ |w| } |
| |
|
| |
|
| { (i, i) \mid1 £ i < |w| } |
|
| |
|
A string-walking automaton over S is a finite automaton A
over DSWA(S). For a string-walking automaton
A = (Q,DSWA(S), d, I, F) and a string w Î S*,
an element (q,u) of Q ×Vw is
a configuration of the automaton.
The configuration (q,u) Î Q ×Vw signifies that A is in state
q at position u. We say that the current position of
the automaton on w is u. The set of all configurations
of A and w is denoted by \mathbb CA,w.
Let w Î S*. One step of A = (Q,DSWA(S),d,I,F) on w
is defined by the binary relation \twoheadrightarrowA,w on the set
of configurations, as follows. For every
(q,u), (q¢,u¢) Î \mathbb CA,w,
|
(q,u) \twoheadrightarrowA,w (q¢,u¢) iff $d Î DSWA(S): (q,d,q¢) Î d and (u,u¢) Î Rw(d) . |
|
For each string w Î S*, A computes the binary relation
|
Rw(A) = { (u,v) Î Vw ×Vw \mid (q,u) \twoheadrightarrowA,w*(q¢,v) for some q Î I and q¢ Î F } . |
|
Thus, A computes the position relation
|
R(A) = { (w,u,v) \mid w Î S*, (u,v) Î Rw(A) } . |
|
We define the language that A recognizes as all strings w with
the property that there is a walk of A on w starting at the beginning of the
string in an initial state and ending in a final
state, or
|
L(A) = { w Î S* \mid there exists v Î Vw such that (1,v) Î Rw(A)} . |
|
The class of all languages recognized by a string-walking automaton
is named SWA.
In the following we will also allow transitions of the form
(p,s,q) with s Î DSWA(S)*. If s = d1 d2 ¼dn,
we define, for all w Î S*,
|
Rw(s) = Rw(d1) °¼°Rw(dn). |
|
These extended directives can easily be implemented by adding extra
transitions and states.
A string-walking automaton A = (Q,DSWA(S),d,I,F) over S
is deterministic if the following three conditions hold:
- #I = 1.
- If (p,d,q) Î d, then p \not Î F.
- For all distinct transitions
(p,d1,q1), (p,d2,q2) Î d,
d1 and d2 are two mutually exclusive directives in
DSWA(S).
Two directives d1,d2 Î DSWA(S) are mutually exclusive exactly if,
for all w Î S* and u Î Vw,
dom(Rw(d1)) Çdom(Rw(d2)) = Æ.
The following pairs of directives in DSWA(S) are mutually exclusive:
- {®,tail},{¬,head}
- {head,Øhead},{tail,Øtail}
- {labs1,labs2} with
s1 ¹ s2
1.3 String-Walking Pebble Automata
We will extend the concept of string-walking automata with a pebble [BH65].
A string-walking pebble automaton
is able to drop its pebble at its current position in the string,
pick it up at any (later) time, if it is at the same position again, and check
whether or not the pebble is present at the automaton's current position.
Let S be an alphabet.
The directives for a string-walking pebble automaton are
|
| |
|
|
|
DSWA(S)È{ put, lift, here, Øhere} |
| |
|
| { ¬, ®, head, Øhead, tail, Øtail, put, lift, here, Øhere} È{ labs\mid s Î S} |
|
| |
|
The meaning of the new directives is as follows:
- put means drop the pebble at the current position in the string.
- lift means lift the pebble from the current position
in the string (only if it is there now).
- here checks whether the pebble is at the current position.
- Øhere is the negation of here.
For all strings w Î S*, we will denote the
pebble positions with elements of Pw = Vw È{^}.
Here ^ denotes that the pebble is not placed at any position of
the string, i.e., the automaton ``carries'' the pebble.
We now need to extend the binary relations Rw(d) (for d Î DSWA(S)) to be
on pairs (u,p) Î Vw ×Pw to take pebble-handling into account.
Let, for all strings w Î S* and directives d Î DSWA(S),
[R\tilde]w(d) denote the binary relation Rw(d) as defined for string-walking
automata in the previous section. We now define Rw(d) for string-walking
pebble automata as follows:
|
Rw(d) = { ((i,p),(i¢,p)) \mid (i,i¢) Î |
~ R
|
w
|
(d) and p Î Pw } . |
|
This means that the directives from DSWA(S) do not alter the pebble position.
The relations for the new directives are as follows.
|
| |
|
|
|
{ ((i,^), (i, i)) \mid 1 £ i £ |w|} |
| |
|
|
{ ((i, i), (i, ^)) \mid 1 £ i £ |w|} |
| |
|
|
{ ((i, i), (i, i)) \mid 1 £ i £ |w|} |
| |
|
| { ((i,p), (i,p)) \mid1 £ i £ |w| and p ¹ i } |
|
| |
|
Let S be an alphabet.
A string-walking pebble automaton is a finite automaton A
over DSWPA(S). For a string-walking pebble automaton
A = (Q,DSWPA(S), d, I, F) and a string w Î S*,
an element (q,u,p) Î Q ×Vw ×Pw is
a configuration of the automaton. The set of all configurations
of A and w is denoted by \mathbb CA,w.
A configuration (q,u,p) signifies that A is in state
q at position u, while the pebble is at position p.
The step relation \twoheadrightarrowA,w is extended in the obvious way. For a
string-walking pebble automaton A, a string w Î S* and
configurations (q,u,p),(q¢,u¢,p¢) Î \mathbb CA,w,
|
(q,u,p) \twoheadrightarrowA,w* (q¢,u¢,p¢) iff $d Î DSWPA(S): (q,d,q¢) Î d and ( (u,p),(u¢,p¢) ) Î Rw(d). |
|
For the relation that A computes, we now demand that
A begins and ends with the pebble not on the string, or ``in its pocket''. For
w Î S*,
|
Rw(A) = { (u,v) Î Vw ×Vw \mid (q,u,^) \twoheadrightarrowA,w*(q¢,v,^) for some q Î I and q¢ Î F }. |
|
The definitions of R(A) and L(A) remain the same.
The class of all languages recognized by a string-walking
pebble automaton is named SWPA.
Transitions of the form (p,s,q) with s Î DSWPA(S)* are treated as
with string-walking automata.
The definition of a deterministic string-walking pebble
automaton is similar to that of a deterministic string-walking automaton.
The only difference is that the directives are now taken from DSWPA(S).
The following pairs of directives in DSWPA(S) are mutually
exclusive:
- {®,tail},{¬,head}
- {here,put},{lift,put},{Øhere,lift}
- {head,Øhead},{tail,Øtail},{here,Øhere}
- {labs1,labs2} with
s1 ¹ s2
1.4 String-Walking Automata with MSO Tests
Another way to extend string-walking automata
(as done in [BE97] for tree-walking
automata, see also Section 3.7) is to allow the automaton
to test any MSO definable property of the current position of the string,
using MSO formulas with one free position variable. This extension also
requires the introduction of new directives.
Let S be an alphabet. The set of directives over S is
|
DSWA+M(S) = { ¬, ®} ÈMSOL1(S) . |
|
Note that head, tail, labs, etc. are MSO definable properties
so they need not be explicitely stated in the definition
of DSWA+M(S). Note also that DSWA+M(S) is an infinite set. Therefore we need to define
for each string-walking automaton with MSO tests a finite subset of
DSWA+M(S) as the directives used by the automaton.
For the directive d = y(x) with y Î MSOL1(S),
we define the following binary relation on Vw (w Î S*):
|
Rw(y(x)) = {(i,i) \mid (w,i) \models y(x) } . |
|
A string-walking automaton with MSO tests over S is
a finite automaton A over a finite subset of DSWA+M(S).
The definitions of configurations, \twoheadrightarrowA,w, Rw(A), etc. are
unchanged with respect to the definitions for (ordinary) string-walking automata.
The definitions of deterministic string-walking automata
and mutually exclusive directives are also unchanged, although
it is no longer possible to list all mutually exclusive pairs
of directives. Note that it is decidable whether a given pair of directives
in MSOL1(S) is mutually exclusive.
The class of all languages
recognized by a string-walking automaton with MSO tests is named
SWA+M.
In [BE97] the following results are proven for trees. They also hold for
the special case of strings.
Proposition 5
The following three statements hold:
- SWA+M=REG
- For every alphabet S and every binary MSO formula
f(x,y) Î MSOL2(S), there exists a string-walking automaton
with MSO tests A such that R(A) = R(f).
- For every alphabet S and every string-walking automaton
with MSO tests A over S, there exists a
binary MSO formula f(x,y) Î MSOL2(S)
such that R(f) = R(A).
1.5 From MSO Formulas to
String-Walking Pebble Automata
Both MSO formulas with two position variables and string-walking pebble
automata compute binary position relations. In this section, we show that
every binary position relation that can be defined by an MSO formula,
can also be computed by a string-walking pebble automaton. In
Section 1.8 we will show that the reverse is also true, i.e.,
every binary position relation that can be computed by a string-walking
pebble automaton can also be defined by a binary MSO formula.
Let S be an alphabet and let f(x,y) be a binary MSO formula
over S. We have seen (Lemma 4) that there
exists a finite automaton M over SÈ(S×B2)
such that, for all w Î S* and u,v Î Vw,
|
(w,u,v) \models f(x,y) iff mark(w,u,v) Î L(M) . |
|
The following lemma states that there is a string-walking pebble automaton
that computes the same relation as f(x,y) defines.
Lemma 6
For every finite automaton M over SÈ(S×B2)
there exists a string-walking pebble automaton A
over S such that, for all w Î S* and u,v Î Vw,
|
mark(w,u,v) Î L(M) iff (u,v) Î Rw(A). |
|
Proof.
Let M = (QM,SÈ(S×B2), dM, IM, FM) be a finite
automaton over SÈ(S×B2). We will define a
string-walking pebble automaton over S,
A, that simulates M, using the pebble to simulate the marked
positions in M. A is defined as the (intuitive) union
of three automata A1, A2, A3.
We define A as (Q1 ÈQ2 ÈQ3, DSWPA(S), d1 Èd2 Èd3, I1 ÈI2 ÈI3, F1 ÈF2 ÈF3).
A1 deals with the case that u comes ``before'' v in w, A2
deals with the case that v comes first and A3 deals with the case
u = v. For 1 £ i £ 3, Ai = (Qi, DSWPA(S), di, Ii, Fi).
A1 simulates M, assuming that u comes before v in w. A1 first
drops its pebble at the start position u.
This makes it possible to determine
later where the automaton started. Then it walks to the head of the string.
Next it simulates M in three stages.
In the first stage of the simulation,
it simulates M (only taking into account transitions of the form
(p,s,q) with s Î S)
until it finds the pebble. It picks up the pebble and treats the symbol
s on this position as (s,1,0) in M. The automaton
now continues simulating M (the second stage of the simulation),
again without transitions with labels in S×B2.
Then, nondeterministically, A1 drops the pebble at some position
v and treats the symbol s at this position as
(s,0,1) in M. A continues simulating M (the third stage)
until it reaches the end of the string. If it reaches the end of the
string in a final state of M, the choice for v was a good choice.
The automaton then backs up until it finds the pebble at position
v. This ends A1.
We use the following states in A1.
- in1 is the initial state.
- tobegin1 is used to walk to the head of the string.
- q1,i (for q Î QM) is used in the ith
stage of the simulation.
- backup1 is used to back up to v after the three
stages of simulation.
- final1 is the final state.
Formally, A1 = (Q1, DSWPA(S), d1, I1, F1) is constructed as follows:
|
| |
|
|
|
{ in1, tobegin1, backup1, final1 } È |
| |
|
|
{ q1,i \mid q Î QM, 1 £ i £ 3 } |
| |
|
|
{ (in1, put, tobegin1),(tobegin1, ¬, tobegin1) } È |
| |
|
|
{ (tobegin1, head, q1,1) \mid q Î IM } È |
| |
|
|
{ (p1,1, (labs, ®), q1,1)\mid (p,s,q) Î dM } È |
| |
|
|
{ (p1,1, (lift, labs, ®), q1,2) \mid(p,(s,1,0),q) Î dM } È |
| |
|
|
{ (p1,2, (labs, ®), q1,2)\mid (p,s,q) Î dM } È |
| |
|
|
{ (p1,2, (put, labs, ®), q1,3) \mid(p,(s,0,1),q) Î dM } È |
| |
|
|
{ (p1,2, (put, labs, tail), backup1) \mid(p,(s,0,1),qfin) Î dM for some qfin Î FM } È |
| |
|
|
{ (p1,3, (labs, ®), q1,3)\mid (p,s,q) Î dM } È |
| |
|
|
{ (p1,3, (labs, tail), backup1)\mid (p,s,qfin) Î dM for some qfin Î FM } È |
| |
|
|
{ (backup1, ¬, backup1), (backup1, lift, final1) } |
| |
|
| |
|
|
| |
|
A2 is similar to A1, only here the automaton, after dropping its
pebble on u, first walks to the right until the
end of the string, then simulates M (inverted), starting in a state
corresponding to a state in FM, until it finds the pebble
at u. It picks up the pebble and treats the symbol s
at this position as (s,0,1) in M.
A2 continues with the second stage of the simulation.
Then, nondeterministically, A2 drops the pebble at some position v
and continues with the third stage of the simulation until the beginning of
the string. If it reaches the beginning of the string in an initial
state of M, A2 walks back to the pebble at v and the
simulation is finished.
Formally, A2 = (Q2, DSWPA(S), d2, I2, F2) is constructed as follows:
|
| |
|
|
|
{ in2, toend2, backup2, final2 } È |
| |
|
|
{ q2,i \mid q Î QM, 1 £ i £ 3 } |
| |
|
|
{ (in2, put, toend2),(toend2, ®, toend2) } È |
| |
|
|
{ (toend2, tail, q2,1) \mid q Î FM } È |
| |
|
|
{ (q2,1, (labs, ¬), p2,1)\mid (p,s,q) Î dM } È |
| |
|
|
{ (q2,1, (lift, labs, ¬), p2,2) \mid(p,(s,1,0),q) Î dM } È |
| |
|
|
{ (q2,2, (labs, ¬), p2,2)\mid (p,s,q) Î dM } È |
| |
|
|
{ (q2,2, (put, labs, ¬), p2,3) \mid(p,(s,0,1),q) Î dM } È |
| |
|
|
{ (q2,2, (put, labs, head), backup2) \mid(p,(s,0,1),qin) Î dM for some qin Î IM } È |
| |
|
|
{ (q2,3, (labs, ¬), p2,3)\mid (p,s,q) Î dM } È |
| |
|
|
{ (q2,3, (labs, head), backup2)\mid (p,s,qin) Î dM for some qin Î IM } È |
| |
|
|
{ (backup2, ®, backup2), (backup2, lift, final2) } |
| |
|
| |
|
|
| |
|
A3 looks again like the previous two parts of A. A3 however
assumes that u = v, which implies that the only useful transitions with
labels in S×B2 are the ones of the form
(p,(s,1,1),q) Î dM. So A3 first drops its pebble
at position u ( = v), then
(like A1) walks to the head of the string and simulates M until it gets
to the starting position (the pebble) again. It does not lift the pebble,
because it needs to know the position of u = v.
A3 treats the symbol s at this position as (s,1,1) in M.
Then it continues with the second stage of
the simulation of M until the end of the string. If it arrives there in
a final state of M, A3 backs up to the pebble which ends the
simulation of M.
Formally, A3 = (Q3, DSWPA(S), d3, IA, FA) is constructed as follows:
|
| |
|
|
|
{ in3, tobegin3, backup3, final3 } È |
| |
|
|
{ q3,i \mid q Î QM, 1 £ i £ 2 } |
| |
|
|
{ (in3, put, tobegin3),(tobegin3, ¬, tobegin3) } È |
| |
|
|
{ (tobegin3, head, q3,1) \mid q Î IM } È |
| |
|
|
{ (p3,1, (labs, ®), q3,1)\mid (p,s,q) Î dM } È |
| |
|
|
{ (p3,1, (here, labs, ®), q3,2) \mid(p,(s,1,1),q) Î dM } È |
| |
|
|
{ (p3,1, (here, labs, tail), backup3) \mid(p,(s,1,1),qfin) Î dM for some qfin Î FM } È |
| |
|
|
{ (p3,2, (labs, ®), q3,2)\mid (p,s,q) Î dM } È |
| |
|
|
{ (p3,2, (labs, tail), backup3)\mid (p,s,qfin) Î dM for some qfin Î FM} È |
| |
|
|
{ (backup3, ¬, backup3), (backup3, lift, final3) } |
| |
|
| |
|
|
| |
|
\square
Using this lemma the following theorem becomes trivial to prove.
Theorem 7
For each formula f(x,y) Î MSOL2(S), there exists a
string-walking pebble automaton A over S such that
R(f) = R(A).
1.6 Another proof
There is another way to prove Theorem 7.
This method makes use of Proposition 5 to simulate
the binary MSO formula by a string-walking automaton with MSO tests.
The MSO tests are then simulated by making use of a pebble.
Let S be an alphabet, and let
f(x,y) Î MSOL2(S) be an MSO formula
with two free position variables. Then there exists a
string-walking automaton with MSO tests A = (QA,DA,dA,IA,FA)
with DA a finite subset of DSWA+M(S)
such that R(f) = R(A) (Proposition 5).
We will construct a string-walking pebble automaton
A¢ = (Q¢,DSWPA(S),d¢,I¢,F¢) such that R(A¢) = R(A) = R(f).
We need the following result, which is Lemma 4 for the
case k = 1.
Lemma 8
Let S be an alphabet.
For every MSO test y(x) Î MSOL1(S), there
exists a finite automaton M over SÈ(S×B1) such that,
for all w Î S* and u Î Vw,
|
(w,u) \models y(x) iff mark(w,u) Î L(M) . |
|
We now construct A¢. In first approximation, A¢
equals A for transitions (p,d,q) with
d Î { ¬, ® }.
We define T = {(p,d,q) Î dA \mid d Î MSOL1(S)},
the collection of all transitions with MSO tests in A.
For each transition t = (p,y(x),q) Î T,
we use Lemma 8 to construct the
finite automaton Mt that calculates the MSO test y(x).
We will use this automaton to construct a ``subroutine'' in A¢
that simulates t.
The idea is as follows, similar to the construction of A3 in the proof of
Lemma 6.
A¢ first drops its pebble. Then it walks to the head of the string.
From there, it checks the MSO test by following Mt. Any transitions
in dMt
with labels not in S are handled by using the pebble.
The only possible form of such a transition is (p,(s,1),q).
In place of this transition, we add transitions to check whether
the position in the string where the automaton is now is marked,
i.e., the pebble is there. Then the symbol s is checked and
the automaton moves on to the next position in the string in
state q. If Mt reaches a final state at the end of the string,
A¢ moves back to the position where it dropped the pebble and
lifts it. At this moment the MSO test y(x) is successfully
verified by the automaton.
The following states are used for A¢.
First, all states from A, QA. Then we need extra states
for each MSO test. For all transitions t Î T
we use, next to the states of Mt, tobegint to move
to the beginning of the string and
backupt to back up to the pebble, the place where
the simulation of Mt started.
We assume that all the Mt for all t Î T have
unique states that do not overlap with either QA or states
of other Mt's.
Formally, A¢ = (Q¢,DSWPA(S),d¢,I¢,F¢) is constructed as follows:
|
| |
|
|
|
QA È{ tobegint, backupt \mid t Î T }È |
È
t Î T
|
QMt |
| |
|
| |
|
| |
|
| { (p,d,q) Î dA \midd Î {¬,®} } È |
È
| { dt \mid t Î T } |
|
| |
|
Here the transitions needed to simulate t = (p,y(x),q) are as follows:
|
| |
|
|
|
{ (p, put, tobegint),(tobegint, ¬, tobegint)} È |
| |
|
|
{ (tobegint, head, sin)\mid sin Î IMt } È |
| |
|
|
{ s,(Øtail, Øhere, labs,®),t) \mid (s,s,t) Î dMt } È |
| |
|
|
{ s,(Øtail,here,labs,®),t) \mid (s,(s,1),t) Î dMt } È |
| |
|
|
{ s,(tail,Øhere,labs),backupt) \mid (s,s,tfin) Î dMt for some tfin Î FMt } È |
| |
|
|
{ s,(tail, here,labs),backupt) \mid (s,(s,1),tfin) Î dMt for some tfin Î FMt } È |
| |
|
| { (backupt, (Øhere, ¬), backupt),(backupt, lift, q) } |
|
| |
|
1.7 Pebble Necessity
The following theorem states that we really need the pebble in
Theorem 7. The proof of this theorem is a simpler form of the
proof for tree-walking automata (Theorem 15 in [BE97]).
It is included here for completeness.
Theorem 9
There exist an alphabet S and a
formula f(x,y) Î MSOL2(S)
such that there is no string-walking automaton A
with R(A) = R(f).
Proof.
We construct an MSO formula f(x,y) Î MSOL2(S)
with S = {b,r}, denoting black and red. For a string w over
S, the binary relation R(f) connects certain positions of w.
If w has exactly
one position with symbol r, say vred, then R(f) will connect
any position to vred. Otherwise, i.e., if there is no red symbol
or more than one, R(f) will connect any position to the next in left-to-right
circular order.
We use the following
abbreviations to define f(x,y):
|
| |
|
|
|
$y ( labr(y) Ù"z(labr(z) ® z = y)) |
| |
|
|
pre(x,y) Ú( tail(x) Ùhead(y) ) |
| |
|
| ( orsÙlabr(y) ) Ú( ØorsÙsucc(x,y) ) |
|
| |
|
Here ors is a closed MSO formula which is true if there is exactly
one red symbol in w, and succ(x,y) is an MSO formula which is true if y
follows x in left-to-right circular order. Clearly, f(x,y) is a
binary MSO formula with R(f) as indicated above.
Now we will show that there is no string-walking automaton
A = (Q,DSWA(S),d,I,F) with R(A) = R(f).
Suppose that such an A exists. We may assume that
I = {qin}. The idea is that when A starts at any position
u of a string with only b's, it first has to visit all positionsc
of the string to check there is no r in the string. Because there
is no way for A to remember its starting point u, A also
cannot find the position after u in left-to-right circular order.
Let w be b#Q+1, a string of #Q+1 black symbols.
Let w¢ be rb#Q. Thus, w¢ is w with the first symbol changed to red.
We will use the following function:
which gives the successor to position k in left-to-right circular order.
Because w does not contain a symbol r, there exists
for every k Î [1,#Q+1] a state fk Î F such that
|
(qin,k) \twoheadrightarrowA,w* (fk, succ(k)) . |
|
On the other hand, because w¢ contains exactly one symbol r,
for all k Î [1,#Q], there is no state f Î F such that
|
(qin,k) \twoheadrightarrowA,w¢* (f, succ(k)) . |
|
This difference implies that for all k Î [1, #Q] the walk
of A on w from k to succ(k) must visit position 1, because the
only different symbol is the first, which is black for w and red for w¢.
Because the walk of A on w from #Q+1 to 1 also visits position 1, there
is for all k Î [1,#Q+1] a qk Î Q such that
|
(qin,k) \twoheadrightarrowA,w* (qk,1) \twoheadrightarrowA,w* (fk,succ(k)) . |
|
There are #Q+1 possibilities for k and only #Q states, which
means there are k,k¢ Î [1,#Q+1] with k ¹ k¢ and qk = qk¢.
Then
|
(qin,k) \twoheadrightarrowA,w* (qk,1) = (qk¢,1) \twoheadrightarrowA,w* (fk¢,succ(k¢)) . |
|
This implies that A walks from k both to
succ(k) and to succ(k¢) with k ¹ k¢, a contradiction to the fact
that R(A) = R(f). \square
1.8 From
String-Walking Pebble Automata to
MSO Formulas
In this section we will prove that a string-walking pebble automaton
can be simulated by a string-walking automaton with MSO tests. Because
it is already known that these automata can be simulated by binary MSO
formulas (Proposition 5), and because we have shown in previous sections that a binary MSO
formula can be simulated by a string-walking pebble automaton, we
obtain the equivalence of binary MSO formulas and string-walking pebble automata.
To prove that it is possible to simulate the use of
a pebble with MSO tests, we will need the following lemma.
In the lemma, A is a string-walking pebble automaton that is not allowed to
move its pebble, but it is allowed to check the presence of the pebble at the
current position. The lemma states that the round-trip of A from the position
where the pebble is, back to that position can be simulated by a unary MSO
formula.
Lemma 10
Let A be a string-walking pebble automaton, with no transitions with directives
put or lift,
over S. Then,
for every pair of states p,q Î QA, there exists a formula
ypq(x) Î MSOL1(S) such that, for all w Î S*
and x Î Vw,
|
(p,x,x) \twoheadrightarrowA,w* (q,x,x) iff (w,x) \models ypq(x) . |
|
Proof.
Let A be a string-walking pebble automaton
over S without put and lift. Let p,q Î QA.
First note that the pebble does not move during A's walk. We can therefore
consider the pebble position a constant and mark it using a special symbol
(s,1) where s is the original symbol at the pebble position.
We will now define a string-walking automaton A¢ (without pebble) over
SÈ(S×B1) that simulates the walk of A from the pebble position and
back. We define A¢ = (QA, DSWA(SÈ(S×B1)), d¢, {p}, {q}).
Thus, the states of A¢ are the
same as those of A, A¢'s initial state is p, and its final state is q.
Furthermore, d¢ consists of the following transitions:
|
| |
|
|
| |
|
|
{ (s,lab(s,1),t) \mid (s,here,t) Î dA, s Î S} È |
| |
|
|
{ (s,labs,t) \mid (s,Øhere,t) Î dA, s Î S} È |
| |
|
| { (s,lab(s,1),t) \mid(s,labs,t) Î dA, s Î S}. |
|
| |
|
The directive ``here'' is simulated by checking whether the symbol at the current position
is in S×B1. Similarly, ``Øhere'' is simulated by checking
whether the symbol at the current position is in S. Otherwise, the
new symbols from S×B1 are treated just like the corresponding
symbols from S.
Now, for all s,t Î QA, w Î S* and x,y,z Î Vw,
it is easy to see that
|
(s,x,z) \twoheadrightarrowA,w* (t,y,z) iff (s,x) \twoheadrightarrowA¢,w¢* (t,y) |
|
where w¢ = mark(w,z). This means that A¢ on a string marked at
position z walking from x to y simulates A on the unmarked string
with the pebble at position z. Both automata start in state s and end in
state t.
Also, from the definition of Rw¢(A¢),
|
(p,x) \twoheadrightarrowA¢,w¢* (q,y) iff (x,y) Î Rw¢(A¢) . |
|
According to Proposition 5
we can construct a formula f¢pq(x,y) Î MSOL2(SÈ(S×B1))
that simulates the walk of A¢ on a marked string, so that, for all
w Î S*, x,y,z Î Vw and w¢ = mark(w,z),
|
(x,y) Î Rw¢(A¢) iff (w¢,x,y) \models f¢pq(x,y) . |
|
Using Lemma 2 yields a formula
fpq(x,y,z) Î MSOL3(S) such that,
for all w Î S* and x,y,z Î Vw,
|
(mark(w,z),x,y) \models f¢pq(x,y) iff (w,x,y,z) \models fpq(x,y,z) . |
|
We now define ypq(x) = fpq(x,x,x) Î MSOL1(S). Then,
for all w Î S* and u Î Vw,
|
(p,x,x) \twoheadrightarrowA,w* (q,x,x) iff (w,x) \models ypq(x) . |
|
\square
Theorem 11
For every string-walking pebble automaton A
there exists a string-walking automaton with MSO tests A¢ such that
R(A) = R(A¢).
Proof.
When A drops its pebble at some position u of string w,
it must return there some
later time to pick it up, because of the definition of Rw(A).
In the mean time, i.e., until A returns to u, A does not move its
pebble. By using
Lemma 10, the actions of this automaton until its arrival in u
can be simulated by a single MSO test. The details are as follows.
Let A = (Q,DSWPA(S),d,I,F) be a string-walking pebble automaton
over S. We will define a string-walking automaton with MSO tests A¢.
The MSO tests ypq are obtained by applying Lemma 10
to the automaton derived from A by removing all transitions of the
form (p,d,q) with d Î {put,lift}.
We define A¢ = (Q,D,d¢,I,F) with
|
| |
|
|
|
{ ¬, ®, true(x), head(x), tail(x), Øhead(x), Øtail(x) } È |
| |
|
|
{ labs(x) \mid s Î S} È{ ypq(x) \mid p,q Î Q } |
| |
|
|
{ (p,d,q) Î d\mid d Î { ¬, ® } } È |
| |
|
|
{ (p,head(x),q) \mid (p,head,q) Î d} È |
| |
|
|
{ (p,Øhead(x),q) \mid (p,Øhead,q) Î d} È |
| |
|
|
{ (p,tail(x),q) \mid (p,tail,q) Î d} È |
| |
|
|
{ (p,Øtail(x),q) \mid (p,Øtail,q) Î d} È |
| |
|
|
{ (p,labs(x),q) \mid (p,labs,q) Î d} È |
| |
|
|
{ (p,true(x),q) \mid (p,Øhere,q) Î d} È |
| |
|
|
{ (p,yqr(x),s) \mid (p,put,q) Î d and (r,lift,s) Î d} |
| |
| |
|
The directives ¬, ®, head, Øhead, tail, Øtail
and labs (for s Î S) are simulated straightforwardly.
The MSO test yqr(x) simulates the actions of A between the put and the lift.
\square
This theorem leads us to the grand finale of this chapter.
Theorem 12
The following two statements hold:
- For every MSO formula f(x,y) Î MSOL2(S) there exists
a string-walking pebble automaton A over S such that
R(A) = R(f).
- For every string-walking pebble automaton A over S
there exists an MSO formula f(x,y) Î MSOL2(S) such that
R(f) = R(A).
There is another way to prove that any string-walking pebble automaton
can be simulated by an MSO formula with two position variables.
Let A be a string-walking pebble automaton
over S that recognizes the relation R(A). Then a
string-walking pebble automaton A¢ over SÈ(S×B2) can easily be constructed
such that (w,u,v) Î R(A) just if mark(w,u,v) Î L(A¢). A¢ walks
right until it encounters a symbol (s,1,x) with x Î {0,1}. Then
A¢ starts simulating A directly until it encounters a symbol
(s,y,1) with y Î {0,1} in a final state.
Because SWPA=REG [BH65] and REG=MSO (Proposition 1),
there exists an MSO formula
y Î MSOL0(SÈ(S×B2)) such that L(A¢) = L(y). Using
Corollary 3, we obtain an MSO formula
f(x,y) Î MSOL2(S) such that, for all w Î S*,
mark(w,u,v) Î L(y) just if (w,u,v) \models f(x,y).
Note that we can also use Theorem 12 to derive
that SWPAREG, as follows:
Theorem 13
SWPAREG
Proof.
First we will show that any language defined by a string-walking pebble automaton
is MSO definable and hence regular.
Let A be a string-walking pebble automaton over S.
By definition, the language
recognized by A is
|
L(A) = { w Î S* \mid there exists v Î Vw such that (1,v) Î Rw(A) }. |
|
Using the second part of Theorem 12 we obtain a formula
f(x,y) Î MSOL2(S) such that, for all w Î S* and
u,v Î Vw,
|
(u,v) Î Rw(A) iff (w,u,v) \models f(x,y) . |
|
We now define
|
y = "x ( $y (head(x) ® f(x,y) ) ) |
|
to obtain that, for all w Î S* and v Î Vw,
|
$v Î Vw ((1,v) Î Rw(A)) iff $v Î Vw ((w,1,v) \models f(x,y)) iff w \models y |
|
so
and, since MSO=REG, L(A) is a regular language.
The other way around, we show that any regular language can be described by
a string-walking pebble automaton.
Let L be a regular language over an alphabet S. Let
M = (Q,S,d,I,F) be a finite automaton such that L = L(M). We define
a string-walking pebble automaton A = (Q¢,DSWPA(S),d¢,I,F¢) with
|
| |
|
|
| |
|
|
{ (p,(labs,®),q) \mid (p,s,q) Î d} È |
| |
|
|
{ (p,(labs,tail),qfin) \mid (p,s,q) Î d for some q Î F } |
| |
|
|
| |
|
It is obvious to see that L(A) = L(M) = L.
\square
Chapter 2
Extensions to
String-Walking Pebble Automata
In this chapter we will introduce two extensions to the concept of
string-walking pebble automata. These extensions will be constructed in such
a way that the class of languages recognized by the automata is the same
as the class of languages that the (ordinary) string-walking automata
recognize, i.e., the class of regular languages.
2.1 String-Walking Multi-Pebble Automata
The first extension to the string-walking automaton is the string-walking
multi-pebble automaton. This automaton can use more than one pebble. The
number of pebbles available to the automaton is fixed.
The pebbles will be taken from a finite set and
must be used ``nested''. This means that, at any point in
the automaton's walk, only the pebble that was dropped last can be lifted.
It can easily be seen that we can, without changing the automaton's power,
number the pebbles {1,2,¼,n} and
demand that the pebbles must be used in order. In order to see this, suppose
the pebbles are taken from a set P with |P| = n.
Consider the set M of bijections from
P to {1,2,¼,n} (there are n! such mappings). We can now simulate an
automaton with pebble set P by an automaton with pebble set [1,n] by
coding the pebble mapping into the states: for every state q we use
additional states {qm \mid m Î M}. We also modify the transitions with
put and lift so that the information on the mapping is maintained.
Note that the constraint of the nested use of pebbles is necessary to make
sure that the automaton does not recognize non-regular languages. A string-walking
2-pebble automaton without this limitation could recognize the language
L = {ancbn \mid n Î \mathbb N} by putting the pebbles at both ends of the
string and moving them towards the middle of the string in turn, checking
that both pebbles arrive at the middle c after the same number of moves.
This kind of use of the pebbles is prohibited by demanding that the pebbles
be used nested. In broader terms, it can be shown that unnested multi-pebble
automata are equivalent to two-way multi-head automata, which in turn are
equivalent to logarithmic space Turing machines.
Let n ³ 1. We use the following set of directives for a string-walking n-pebble automaton
over S:
|
| |
|
|
| DSWA(S)È{ puti, lifti, herei, Øherei \mid 1 £ i £ n } |
|
| |
|
For
each w Î S* the relations for the new directives are as
follows.
The relations are on tuples (u,k,p) Î Vw ×[0,n]×([1,n] ® Pw). Here k represents that pebbles 1,¼,k
are currently on the string and p maps the pebbles to string positions. We
define p0 : [1,n] ® Pw with p0(i) = ^ for all i Î [1,n]
as the mapping with no pebbles on the string.
The relations hold for any
u Î Vw, k Î [0,n], and pebble
placement function p:[1,n] ® Pw.
|
| |
|
|
|
{ ((u, i-1, p), (u, i, p(i ® u))) \mid 1 £ i £ n } |
| |
|
|
|
|