Tree-Walking Automata and Monadic Second Order Logic

Jan-Pascal van Best

Preface

This report describes the results of a study in the field of theoretical computer science. It was written as my ``doctoraalscriptie'' (Master's thesis) for my study of Computer Science at Leiden University. The graduation tutor from the Department of Computer Science was J. Engelfriet, PhD.

This report was written using the LATEX typesetting software on a computer running Linux and the KDE desktop environment.

I would like to thank Joost Engelfriet for his constant advice and his help in conducting this study. My parents have always supported me in many ways. And, last but certainly not least, thanks to Ellen for the title page she designed and for her love and support.

Jan-Pascal

Delft, July 1998

Contents

Preface
Introduction
1  Binary MSO Formulas and String-Walking Pebble Automata
    1.1  Preliminaries
    1.2  String-Walking Automata
    1.3  String-Walking Pebble Automata
    1.4  String-Walking Automata with MSO Tests
    1.5  From MSO Formulas to String-Walking Pebble Automata
    1.6  Another proof
    1.7  Pebble Necessity
    1.8  From String-Walking Pebble Automata to MSO Formulas
2  Extensions to String-Walking Pebble Automata
    2.1  String-Walking Multi-Pebble Automata
    2.2  From String-Walking Multi-Pebble Automata to MSO Formulas
    2.3  String-Walking Marble Automata
3  Binary MSO Formulas and Tree-Walking Automata
    3.1  Preliminaries
    3.2  Tree-Walking Automata
    3.3  Push-Down Tree-Walking Automata
    3.4  Equivalence of Finite Tree Automata and Push-Down Tree-Walking Automata
    3.5  Tree-Walking Marble Automata
    3.6  Tree-Walking Marble/Pebble Automata
    3.7  Tree-Walking Automata with MSO Tests
    3.8  Equivalence of Binary MSO Formulas and Tree-Walking Marble/Pebble Automata
Conclusion and Recommendations

Introduction

Monadic second order logic (MSO logic, [Büc60,,TW68]) is a way to describe properties of graphs. In this paper we will focus on strings and trees as special cases. Monadic second order logic combines great strength with great ease of expression. Closed MSO formulas define sets of trees, while MSO formulas with one or more unbound variables define node relations on trees. One perhaps less desirable property of MSO logic is that its formulas are global in a way that they define properties of trees as a whole. Another way to define node relations is by tree-walking automata. A tree-walking automaton can compute a binary node relation by starting on one node of the tree and finishing on another node. It has already been shown that tree-walking automata that can test unary MSO formulas at each node they visit, compute the same node relations that binary MSO formulas define [BE97]. However, these unary MSO formulas define in a way also global properties of the tree, in that they can test, e.g., the labels of several nodes of the tree they walk on in one formula. It can also be shown that ordinary tree-walking automata do not compute the same node relations that binary MSO formulas recognize. Therefore, in order to compute the node relations defined by binary MSO formulas, we need to extend tree-walking automata with features that make them more powerful. We need to make sure that these features result in only local operations, i.e., the resulting automata can, during their walk on a tree, only test properties of the current node. Also, we would like these operations to be more operational and less descriptive than unary MSO formulas. These considerations lead us to the central question of this paper.

Is it possible to define a type of tree-walking automaton, with only local operations, that computes the same binary node relations that binary MSO formulas recognize?

In this paper, we try to find an answer to this question. We start by attempting to find an answer for strings in Chapter 1. In this chapter, we introduce the concept of pebbles . During the walk of a string- or tree-walking automaton, the automaton can place a pebble on the current node. Later during its walk, the automaton can check for the presence of a pebble and pick it up. Variations are possible in the allowed number of pebbles, coloured pebbles and restriction in the use of pebbles. It is shown that string-walking pebble automata (with just one pebble) compute exactly the binary relations that binary MSO formulas define on strings. In the next chapter, we define some more powerful string-walking automata and show some of their properties. In Chapter 3, we move on to trees. The task of finding a proper extension for tree-walking automata proves to be a bit harder than for string-walking automata. Because it is already known [KS81] that push-down tree-walking automata recognize exactly the regular tree languages (which are the tree languages defined by closed MSO formulas), we develop tree-walking marble automata, similar in concept to push-down tree-walking automata, that also recognize exactly the regular tree languages but, unlike push-down tree-walking automata, allow for a natural definition of the computation of node relations. By adding one pebble, we obtain tree-walking marble/pebble automata. We show that these compute exactly the binary node relations that binary MSO formulas define on trees. In the last chapter, we conclude this paper and give some recommendations for future research.

Chapter 1
Binary MSO Formulas and String-Walking Pebble Automata

1.1  Preliminaries

In this section we recall some well-known concepts from formal language theory [HU79,], more specifically concerning strings, finite automata and monadic second order logic (on strings). Before doing so, we recall some standard terminology from set theory.

The set of natural numbers is \mathbb N = {0,1,2,¼} and, for m,n Î \mathbb N, [m,n] = { i Î \mathbb N \mid m £ i £ n}. For a set S, 2S is its powerset, i.e., the set of all subsets of S. Let R Í A ×B be a binary relation. The transitive reflexive closure of R is denoted R*. The inverse of R is R-1 = { (y,x) \mid (x,y) Î R }. For two binary relations R1 and R2, their composition is R1 °R2 = {(x,z) \mid $y: (x,y) Î R1 and (y,z) Î R2 }. Note that the order of R1 and R2 is nonstandard. For each a Î A, R(a) = { b Î B \mid (a,b) Î R } and for each S Í A, R(S) = È{ R(a) \mid a Î S }. The domain of R is dom(R) = R-1(B). A binary relation R is functional if it is a partial function, i.e., (x,y),(x,z) Î R implies x = z.

Let f:A® B be a function; for any a and b, f(a® b) denotes the ``perturbed'' function f¢:A È{a} ® B È{b} with f¢(a) = b and f¢(a¢) = f(a¢) for every a¢ Î A with a¢ ¹ a.

Strings

An alphabet is a finite set of symbols. Let S be an alphabet. A string over S is a finite string of symbols from S. The empty string is denoted l. The set of all strings over S is denoted S*. A subset of S* is called a language. If w is a string over S, |w| denotes the length of w. The set of positions in w is denoted by Vw = [1,|w|]. For a string w Î S* and i Î Vw, we write labw(i) = s if the ith symbol of w is s. For k Î \mathbb N, a k-ary position relation over S is a subset of {(w,u1,¼,uk) \mid w Î S* and ui Î Vw for all i Î [1,k] }. A k-ary position relation associates with each string w a k-ary relation on the positions of w.

We define marked strings as follows. Let S be an alphabet, and k ³ 1. We define Bk = {0,1}k \{0}k. The alphabet SÈ(S×Bk) contains all symbols s Î S and the symbols (s, b1, ¼, bk) with bi Î {0,1} for all i Î [1,k], but without (s, 0, ¼, 0). The role of the latter symbol is played by s itself. This alphabet is used to attach k different marks to the positions in a string. Let w Î S* be a string over S, and let u1, ¼, uk Î Vw. The marked string w¢ = mark(w, u1, ¼, uk) Î (SÈ(S×Bk))* is the string with labw¢(u) = labw(u) if u ¹ ui for all i Î [1,k], and labw¢(u) = (labw(u), (u = u1), ¼, (u = uk)) otherwise (where (u = ui) = 1 iff u equals ui).

Finite Automata

Let S be an alphabet. A finite automaton over S is a quintuple M = (Q,S,d,I,F), where Q is a finite set of states, S is the input alphabet, d Í Q ×S×Q is the transition relation, I Í Q is the set of initial states and F Í Q is the set of final states. The elements of d are called transitions. For every string w Î S*, the state transition relation RM(w) Í Q ×Q is defined as follows. For s Î S, RM(s) = { (p,q) \mid (p,s,q) Î d}. For s1,¼,sn Î S, RM(s1¼sn) = RM(s1)°¼°RM(sn). The language recognized by M is L(M) = { w Î S* \mid (qin,qfin) Î RM(w) for some qin Î I, qfin Î F }. A language is regular if it is recognized by a finite automaton. The class of all regular languages is named REG.

Monadic Second Order Logic (on strings)

Monadic second order logic can be used to describe properties of strings [Büc60]. For an alphabet S, we define the language MSOL(S) of MSO formulas over S. This language has position variables x,y,¼ and position-set variables X,Y,¼. For a given string w Î S*, position variables range over Vw = [1,|w|] and position-set variables range over the subsets of Vw.

There are three types of atomic formulas in MSOL(S): labs(x), for every s Î S, signifies that at position x there is a symbol s. When MSO logic is expanded to describe properties of trees and graphs, labs(x) means that node x is labeled s, hence the name labs. The atomic formula pre(x,y) signifies that position x comes directly before y, i.e. y = x+1; and x Î X signifies that x is an element of X. The connectives used to build formulas from these atomic formulas are Ø, Ù, Ú, ®, «, with the usual meaning. Both position variables and position-set variables can be quantified with $ and ". We will make use of the following abbreviations of MSO formulas:

head(x)
=
"y(Øpre(y,x))
tail(x)
=
"y(Øpre(x,y))
true(x)
=
$X(x Î X)
false(x)
=
Øtrue(x)

If an MSO formula f has free variables, say x, y, X and no others, we write f(x,y,X) to indicate the free variables of f. For every k Î \mathbb N, the set of MSO formulas over S with k free position variables and no free position-set variables is denoted MSOLk(S), or the set of k-ary MSO formulas. For a closed formula f Î MSOL0(S) and a string w Î S*, we write w \models f if w satisfies f. The language defined by f is L(f) = { w Î S* \mid w \models f}. L(f) is called an MSO definable language . The class of all MSO definable languages is named MSO.

Given a string w, a valuation function n is a function that maps each position variable to a position u Î Vw and each position-set variable to a subset U of Vw. Let f be an MSO formula. We write (w,n) \models f, if f holds in w, where the free variables of f are assigned values according to the valuation function n. If f has free variables x,y,X, we write (w,u,v,U) \models f(x,y,X) for (w,n) \models f, where n(x) = u, n(y) = v and n(X) = U.

Let f(x1, ¼, xk) Î MSOLk(S) be an MSO formula with k free positions variables (and no free position-set variables). For each string w Î S*, the k-ary relation that f defines on the positions of w is

Rw(f) = { (u1, ¼, uk) Î Vwk \mid (w, u1, ¼, uk) \models f(x1, ¼, xk) } .
The MSO formula f(x1, ¼, xk) defines the position relation
R(f) = { (w, u1, ¼, uk) \mid w Î S*, (u1, ¼, uk) Î Rw(f) } .

Büchi [Büc60] proved the following classical result.

Proposition 1 A language is MSO definable if and only if it is regular.

In [BE97], the following (well-known) results are proven for MSO formulas on trees. They also hold for the special case of strings. Corollary 3 is the result of Lemma 2 in the case j = k.

Lemma 2 Let S be an alphabet, k ³ 1, and j Î [1,k]. For every formula f(x1, ¼, xk) Î MSOLk(S) there is a formula y(xj+1, ¼, xk) Î MSOLk-j(SÈ(S×Bj)) such that, for all w Î S* and u1, ¼, uk Î Vw,

(w, u1, ¼, uk) \models f(x1, ¼, xk) iff (mark(w, u1, ¼, uj), uj+1, ¼, uk) \modelsy(xj+1, ¼, xk),
and vice versa, i.e., for every formula y(xj+1, ¼, xk) Î MSOLk-j(SÈ(S×Bj)) there is a formula f(x1, ¼, xk) Î MSOLk(S) such that the above equivalence holds for all w Î S* and u1, ¼, uk Î Vw.

Corollary 3 Let S be an alphabet and k ³ 1. For every formula f(x1, ¼, xk) Î MSOLk(S) there is a formula y Î MSOL0(SÈ(S×Bk)) such that, for all w Î S* and u1, ¼, uk Î Vw,

(w, u1, ¼, uk) \models f(x1, ¼, xk) iff mark(w, u1, ¼, uk) \models y,
and vice versa, i.e., for every MSO formula y Î MSOL0(SÈ(S×Bk)) there is a formula f(x1, ¼, xk) Î MSOLk(S) such that the above equivalence holds for all w Î S* and u1, ¼, uk Î Vw.

From Corollary 3 and Proposition 1 follows the following lemma.

Lemma 4 Let S be an alphabet and let k ³ 1. For every MSO formula f(x1, ¼, xk) Î MSOLk(S) there exists a finite automaton M over SÈ(S×Bk) such that, for all w Î S* and u1, ¼, uk Î Vw,

(w, u1, ¼, uk) \models f(x1, ¼, xk) iff mark(w, u1, ¼, uk) Î L(M) .

1.2  String-Walking Automata

String-walking automata are known in the literature as two-way finite automata (see [HU79]). We will call them string-walking automata to stress both the differences and the similarities with tree-walking automata. String-walking automata are like the well-known finite automata, except that they can walk both forwards and backwards along the string they are examining (hence the name two-way finite automata). At any moment in the execution of a string-walking automaton, the automaton is in one state and at one position of the string. Let S be an alphabet. Syntactically, a string-walking automaton over S is a finite automaton with a special set of directives as input alphabet. We define the set of directives as
DSWA(S) = { ¬, ®, head, Øhead, tail, Øtail} È{ labs\mid s Î S} .
The meaning of the directives is as follows:

Formally, we define for each string w Î S* and each directive d Î DSWA(S) the following binary relation Rw(d) on Vw:

Rw(¬)
=
{ (i, i-1) \mid 1 < i £ |w| }
Rw(®)
=
{ (i, i+1) \mid 1 £ i < |w| }
Rw(labs)
=
{ (i, i) \mid 1 £ i £ |w| and the ith symbol of w is s }
Rw(head)
=
{ (1, 1) }
Rw(Øhead)
=
{ (i, i) \mid 1 < i £ |w| }
Rw(tail)
=
{ (n, n) \mid n = |w| }
Rw(Øtail)
=
{ (i, i) \mid1 £ i < |w| }
A string-walking automaton over S is a finite automaton A over DSWA(S). For a string-walking automaton A = (Q,DSWA(S), d, I, F) and a string w Î S*, an element (q,u) of Q ×Vw is a configuration of the automaton. The configuration (q,u) Î Q ×Vw signifies that A is in state q at position u. We say that the current position of the automaton on w is u. The set of all configurations of A and w is denoted by \mathbb CA,w.

Let w Î S*. One step of A = (Q,DSWA(S),d,I,F) on w is defined by the binary relation \twoheadrightarrowA,w on the set of configurations, as follows. For every (q,u), (q¢,u¢) Î \mathbb CA,w,

(q,u) \twoheadrightarrowA,w (q¢,u¢) iff $d Î DSWA(S): (q,d,q¢) Î d and (u,u¢) Î Rw(d) .
For each string w Î S*, A computes the binary relation
Rw(A) = { (u,v) Î Vw ×Vw \mid (q,u) \twoheadrightarrowA,w*(q¢,v) for some q Î I and q¢ Î F } .
Thus, A computes the position relation
R(A) = { (w,u,v) \mid w Î S*, (u,v) Î Rw(A) } .
We define the language that A recognizes as all strings w with the property that there is a walk of A on w starting at the beginning of the string in an initial state and ending in a final state, or
L(A) = { w Î S* \mid there exists v Î Vw such that (1,v) Î Rw(A)} .
The class of all languages recognized by a string-walking automaton is named SWA.

In the following we will also allow transitions of the form (p,s,q) with s Î DSWA(S)*. If s = d1 d2 ¼dn, we define, for all w Î S*,

Rw(s) = Rw(d1) °¼°Rw(dn).
These extended directives can easily be implemented by adding extra transitions and states.

A string-walking automaton A = (Q,DSWA(S),d,I,F) over S is deterministic if the following three conditions hold:

  1. #I = 1.
  2. If (p,d,q) Î d, then p \not Î F.
  3. For all distinct transitions (p,d1,q1), (p,d2,q2) Î d, d1 and d2 are two mutually exclusive directives in DSWA(S).

Two directives d1,d2 Î DSWA(S) are mutually exclusive exactly if, for all w Î S* and u Î Vw, dom(Rw(d1)) Çdom(Rw(d2)) = Æ. The following pairs of directives in DSWA(S) are mutually exclusive:

1.3  String-Walking Pebble Automata

We will extend the concept of string-walking automata with a pebble [BH65]. A string-walking pebble automaton is able to drop its pebble at its current position in the string, pick it up at any (later) time, if it is at the same position again, and check whether or not the pebble is present at the automaton's current position. Let S be an alphabet. The directives for a string-walking pebble automaton are
DSWPA(S)
=
DSWA(S)È{ put, lift, here, Øhere}
=
{ ¬, ®, head, Øhead, tail, Øtail, put, lift, here, Øhere} È{ labs\mid s Î S}
The meaning of the new directives is as follows:

For all strings w Î S*, we will denote the pebble positions with elements of Pw = Vw È{^}. Here ^ denotes that the pebble is not placed at any position of the string, i.e., the automaton ``carries'' the pebble. We now need to extend the binary relations Rw(d) (for d Î DSWA(S)) to be on pairs (u,p) Î Vw ×Pw to take pebble-handling into account. Let, for all strings w Î S* and directives d Î DSWA(S), [R\tilde]w(d) denote the binary relation Rw(d) as defined for string-walking automata in the previous section. We now define Rw(d) for string-walking pebble automata as follows:

Rw(d) = { ((i,p),(i¢,p)) \mid (i,i¢) Î ~
R
 

w 
(d) and p Î Pw } .
This means that the directives from DSWA(S) do not alter the pebble position.

The relations for the new directives are as follows.

Rw(put)
=
{ ((i,^), (i, i)) \mid 1 £ i £ |w|}
Rw(lift)
=
{ ((i, i), (i, ^)) \mid 1 £ i £ |w|}
Rw(here)
=
{ ((i, i), (i, i)) \mid 1 £ i £ |w|}
Rw(Øhere)
=
{ ((i,p), (i,p)) \mid1 £ i £ |w| and p ¹ i }

Let S be an alphabet. A string-walking pebble automaton is a finite automaton A over DSWPA(S). For a string-walking pebble automaton A = (Q,DSWPA(S), d, I, F) and a string w Î S*, an element (q,u,p) Î Q ×Vw ×Pw is a configuration of the automaton. The set of all configurations of A and w is denoted by \mathbb CA,w. A configuration (q,u,p) signifies that A is in state q at position u, while the pebble is at position p.

The step relation \twoheadrightarrowA,w is extended in the obvious way. For a string-walking pebble automaton A, a string w Î S* and configurations (q,u,p),(q¢,u¢,p¢) Î \mathbb CA,w,

(q,u,p) \twoheadrightarrowA,w* (q¢,u¢,p¢) iff $d Î DSWPA(S): (q,d,q¢) Î d and ( (u,p),(u¢,p¢) ) Î Rw(d).

For the relation that A computes, we now demand that A begins and ends with the pebble not on the string, or ``in its pocket''. For w Î S*,

Rw(A) = { (u,v) Î Vw ×Vw \mid (q,u,^) \twoheadrightarrowA,w*(q¢,v,^) for some q Î I and q¢ Î F }.
The definitions of R(A) and L(A) remain the same. The class of all languages recognized by a string-walking pebble automaton is named SWPA.

Transitions of the form (p,s,q) with s Î DSWPA(S)* are treated as with string-walking automata. The definition of a deterministic string-walking pebble automaton is similar to that of a deterministic string-walking automaton. The only difference is that the directives are now taken from DSWPA(S). The following pairs of directives in DSWPA(S) are mutually exclusive:

1.4  String-Walking Automata with MSO Tests

Another way to extend string-walking automata (as done in [BE97] for tree-walking automata, see also Section 3.7) is to allow the automaton to test any MSO definable property of the current position of the string, using MSO formulas with one free position variable. This extension also requires the introduction of new directives. Let S be an alphabet. The set of directives over S is
DSWA+M(S) = { ¬, ®} ÈMSOL1(S) .
Note that head, tail, labs, etc. are MSO definable properties so they need not be explicitely stated in the definition of DSWA+M(S). Note also that DSWA+M(S) is an infinite set. Therefore we need to define for each string-walking automaton with MSO tests a finite subset of DSWA+M(S) as the directives used by the automaton. For the directive d = y(x) with y Î MSOL1(S), we define the following binary relation on Vw (w Î S*):
Rw(y(x)) = {(i,i) \mid (w,i) \models y(x) } .
A string-walking automaton with MSO tests over S is a finite automaton A over a finite subset of DSWA+M(S). The definitions of configurations, \twoheadrightarrowA,w, Rw(A), etc. are unchanged with respect to the definitions for (ordinary) string-walking automata. The definitions of deterministic string-walking automata and mutually exclusive directives are also unchanged, although it is no longer possible to list all mutually exclusive pairs of directives. Note that it is decidable whether a given pair of directives in MSOL1(S) is mutually exclusive.

The class of all languages recognized by a string-walking automaton with MSO tests is named SWA+M.

In [BE97] the following results are proven for trees. They also hold for the special case of strings.

Proposition 5 The following three statements hold:

1.5  From MSO Formulas to
String-Walking Pebble Automata

Both MSO formulas with two position variables and string-walking pebble automata compute binary position relations. In this section, we show that every binary position relation that can be defined by an MSO formula, can also be computed by a string-walking pebble automaton. In Section 1.8 we will show that the reverse is also true, i.e., every binary position relation that can be computed by a string-walking pebble automaton can also be defined by a binary MSO formula.

Let S be an alphabet and let f(x,y) be a binary MSO formula over S. We have seen (Lemma 4) that there exists a finite automaton M over SÈ(S×B2) such that, for all w Î S* and u,v Î Vw,

(w,u,v) \models f(x,y) iff mark(w,u,v) Î L(M) .
The following lemma states that there is a string-walking pebble automaton that computes the same relation as f(x,y) defines.

Lemma 6 For every finite automaton M over SÈ(S×B2) there exists a string-walking pebble automaton A over S such that, for all w Î S* and u,v Î Vw,

mark(w,u,v) Î L(M) iff (u,v) Î Rw(A).

Proof. Let M = (QM,SÈ(S×B2), dM, IM, FM) be a finite automaton over SÈ(S×B2). We will define a string-walking pebble automaton over S, A, that simulates M, using the pebble to simulate the marked positions in M. A is defined as the (intuitive) union of three automata A1, A2, A3. We define A as (Q1 ÈQ2 ÈQ3, DSWPA(S), d1 Èd2 Èd3, I1 ÈI2 ÈI3, F1 ÈF2 ÈF3). A1 deals with the case that u comes ``before'' v in w, A2 deals with the case that v comes first and A3 deals with the case u = v. For 1 £ i £ 3, Ai = (Qi, DSWPA(S), di, Ii, Fi).

A1 simulates M, assuming that u comes before v in w. A1 first drops its pebble at the start position u. This makes it possible to determine later where the automaton started. Then it walks to the head of the string. Next it simulates M in three stages. In the first stage of the simulation, it simulates M (only taking into account transitions of the form (p,s,q) with s Î S) until it finds the pebble. It picks up the pebble and treats the symbol s on this position as (s,1,0) in M. The automaton now continues simulating M (the second stage of the simulation), again without transitions with labels in S×B2. Then, nondeterministically, A1 drops the pebble at some position v and treats the symbol s at this position as (s,0,1) in M. A continues simulating M (the third stage) until it reaches the end of the string. If it reaches the end of the string in a final state of M, the choice for v was a good choice. The automaton then backs up until it finds the pebble at position v. This ends A1.

We use the following states in A1.

Formally, A1 = (Q1, DSWPA(S), d1, I1, F1) is constructed as follows:

Q1
=
{ in1, tobegin1, backup1, final1 } È
{ q1,i \mid q Î QM, 1 £ i £ 3 }
d1
=
{ (in1, put, tobegin1),(tobegin1, ¬, tobegin1) } È
{ (tobegin1, head, q1,1) \mid q Î IM } È
{ (p1,1, (labs, ®), q1,1)\mid (p,s,q) Î dM } È
{ (p1,1, (lift, labs, ®), q1,2) \mid(p,(s,1,0),q) Î dM } È
{ (p1,2, (labs, ®), q1,2)\mid (p,s,q) Î dM } È
{ (p1,2, (put, labs, ®), q1,3) \mid(p,(s,0,1),q) Î dM } È
{ (p1,2, (put, labs, tail), backup1) \mid(p,(s,0,1),qfin) Î dM for some qfin Î FM } È
{ (p1,3, (labs, ®), q1,3)\mid (p,s,q) Î dM } È
{ (p1,3, (labs, tail), backup1)\mid (p,s,qfin) Î dM for some qfin Î FM } È
{ (backup1, ¬, backup1), (backup1, lift, final1) }
I1
=
{in1}
F1
=
{ final1 }
A2 is similar to A1, only here the automaton, after dropping its pebble on u, first walks to the right until the end of the string, then simulates M (inverted), starting in a state corresponding to a state in FM, until it finds the pebble at u. It picks up the pebble and treats the symbol s at this position as (s,0,1) in M. A2 continues with the second stage of the simulation. Then, nondeterministically, A2 drops the pebble at some position v and continues with the third stage of the simulation until the beginning of the string. If it reaches the beginning of the string in an initial state of M, A2 walks back to the pebble at v and the simulation is finished.

Formally, A2 = (Q2, DSWPA(S), d2, I2, F2) is constructed as follows:

Q2
=
{ in2, toend2, backup2, final2 } È
{ q2,i \mid q Î QM, 1 £ i £ 3 }
d2
=
{ (in2, put, toend2),(toend2, ®, toend2) } È
{ (toend2, tail, q2,1) \mid q Î FM } È
{ (q2,1, (labs, ¬), p2,1)\mid (p,s,q) Î dM } È
{ (q2,1, (lift, labs, ¬), p2,2) \mid(p,(s,1,0),q) Î dM } È
{ (q2,2, (labs, ¬), p2,2)\mid (p,s,q) Î dM } È
{ (q2,2, (put, labs, ¬), p2,3) \mid(p,(s,0,1),q) Î dM } È
{ (q2,2, (put, labs, head), backup2) \mid(p,(s,0,1),qin) Î dM for some qin Î IM } È
{ (q2,3, (labs, ¬), p2,3)\mid (p,s,q) Î dM } È
{ (q2,3, (labs, head), backup2)\mid (p,s,qin) Î dM for some qin Î IM } È
{ (backup2, ®, backup2), (backup2, lift, final2) }
I2
=
{in2}
F2
=
{ final2 }

A3 looks again like the previous two parts of A. A3 however assumes that u = v, which implies that the only useful transitions with labels in S×B2 are the ones of the form (p,(s,1,1),q) Î dM. So A3 first drops its pebble at position u ( = v), then (like A1) walks to the head of the string and simulates M until it gets to the starting position (the pebble) again. It does not lift the pebble, because it needs to know the position of u = v. A3 treats the symbol s at this position as (s,1,1) in M. Then it continues with the second stage of the simulation of M until the end of the string. If it arrives there in a final state of M, A3 backs up to the pebble which ends the simulation of M.

Formally, A3 = (Q3, DSWPA(S), d3, IA, FA) is constructed as follows:

Q3
=
{ in3, tobegin3, backup3, final3 } È
{ q3,i \mid q Î QM, 1 £ i £ 2 }
d3
=
{ (in3, put, tobegin3),(tobegin3, ¬, tobegin3) } È
{ (tobegin3, head, q3,1) \mid q Î IM } È
{ (p3,1, (labs, ®), q3,1)\mid (p,s,q) Î dM } È
{ (p3,1, (here, labs, ®), q3,2) \mid(p,(s,1,1),q) Î dM } È
{ (p3,1, (here, labs, tail), backup3) \mid(p,(s,1,1),qfin) Î dM for some qfin Î FM } È
{ (p3,2, (labs, ®), q3,2)\mid (p,s,q) Î dM } È
{ (p3,2, (labs, tail), backup3)\mid (p,s,qfin) Î dM for some qfin Î FM} È
{ (backup3, ¬, backup3), (backup3, lift, final3) }
I3
=
{in3}
F3
=
{ final3 }
\square

Using this lemma the following theorem becomes trivial to prove.

Theorem 7 For each formula f(x,y) Î MSOL2(S), there exists a string-walking pebble automaton A over S such that R(f) = R(A).

1.6  Another proof

There is another way to prove Theorem 7. This method makes use of Proposition 5 to simulate the binary MSO formula by a string-walking automaton with MSO tests. The MSO tests are then simulated by making use of a pebble. Let S be an alphabet, and let f(x,y) Î MSOL2(S) be an MSO formula with two free position variables. Then there exists a string-walking automaton with MSO tests A = (QA,DA,dA,IA,FA) with DA a finite subset of DSWA+M(S) such that R(f) = R(A) (Proposition 5). We will construct a string-walking pebble automaton A¢ = (Q¢,DSWPA(S),d¢,I¢,F¢) such that R(A¢) = R(A) = R(f). We need the following result, which is Lemma 4 for the case k = 1.

Lemma 8 Let S be an alphabet. For every MSO test y(x) Î MSOL1(S), there exists a finite automaton M over SÈ(S×B1) such that, for all w Î S* and u Î Vw,

(w,u) \models y(x) iff mark(w,u) Î L(M) .

We now construct A¢. In first approximation, A¢ equals A for transitions (p,d,q) with d Î { ¬, ® }. We define T = {(p,d,q) Î dA \mid d Î MSOL1(S)}, the collection of all transitions with MSO tests in A. For each transition t = (p,y(x),q) Î T, we use Lemma 8 to construct the finite automaton Mt that calculates the MSO test y(x). We will use this automaton to construct a ``subroutine'' in A¢ that simulates t. The idea is as follows, similar to the construction of A3 in the proof of Lemma 6. A¢ first drops its pebble. Then it walks to the head of the string. From there, it checks the MSO test by following Mt. Any transitions in dMt with labels not in S are handled by using the pebble. The only possible form of such a transition is (p,(s,1),q). In place of this transition, we add transitions to check whether the position in the string where the automaton is now is marked, i.e., the pebble is there. Then the symbol s is checked and the automaton moves on to the next position in the string in state q. If Mt reaches a final state at the end of the string, A¢ moves back to the position where it dropped the pebble and lifts it. At this moment the MSO test y(x) is successfully verified by the automaton.

The following states are used for A¢. First, all states from A, QA. Then we need extra states for each MSO test. For all transitions t Î T we use, next to the states of Mt, tobegint to move to the beginning of the string and backupt to back up to the pebble, the place where the simulation of Mt started. We assume that all the Mt for all t Î T have unique states that do not overlap with either QA or states of other Mt's.

Formally, A¢ = (Q¢,DSWPA(S),d¢,I¢,F¢) is constructed as follows:

Q¢
=
QA È{ tobegint, backupt \mid t Î T }È
È
t Î T 
QMt
I¢
=
IA
F¢
=
FA
d¢
=
{ (p,d,q) Î dA \midd Î {¬,®} } È È
{ dt \mid t Î T }
Here the transitions needed to simulate t = (p,y(x),q) are as follows:
dt
=
{ (p, put, tobegint),(tobegint, ¬, tobegint)} È
{ (tobegint, head, sin)\mid sin Î IMt } È
{ s,(Øtail, Øhere, labs,®),t) \mid (s,s,t) Î dMt } È
{ s,(Øtail,here,labs,®),t) \mid (s,(s,1),t) Î dMt } È
{ s,(tail,Øhere,labs),backupt) \mid (s,s,tfin) Î dMt for some tfin Î FMt } È
{ s,(tail, here,labs),backupt) \mid (s,(s,1),tfin) Î dMt for some tfin Î FMt } È
{ (backupt, (Øhere, ¬), backupt),(backupt, lift, q) }

1.7  Pebble Necessity

The following theorem states that we really need the pebble in Theorem 7. The proof of this theorem is a simpler form of the proof for tree-walking automata (Theorem 15 in [BE97]). It is included here for completeness.

Theorem 9 There exist an alphabet S and a formula f(x,y) Î MSOL2(S) such that there is no string-walking automaton A with R(A) = R(f).

Proof. We construct an MSO formula f(x,y) Î MSOL2(S) with S = {b,r}, denoting black and red. For a string w over S, the binary relation R(f) connects certain positions of w. If w has exactly one position with symbol r, say vred, then R(f) will connect any position to vred. Otherwise, i.e., if there is no red symbol or more than one, R(f) will connect any position to the next in left-to-right circular order. We use the following abbreviations to define f(x,y):

ors
=
$y ( labr(y) Ù"z(labr(z) ® z = y))
succ(x,y)
=
pre(x,y) Ú( tail(x) Ùhead(y) )
f(x,y)
=
( orsÙlabr(y) ) Ú( ØorsÙsucc(x,y) )
Here ors is a closed MSO formula which is true if there is exactly one red symbol in w, and succ(x,y) is an MSO formula which is true if y follows x in left-to-right circular order. Clearly, f(x,y) is a binary MSO formula with R(f) as indicated above.

Now we will show that there is no string-walking automaton A = (Q,DSWA(S),d,I,F) with R(A) = R(f). Suppose that such an A exists. We may assume that I = {qin}. The idea is that when A starts at any position u of a string with only b's, it first has to visit all positionsc of the string to check there is no r in the string. Because there is no way for A to remember its starting point u, A also cannot find the position after u in left-to-right circular order.

Let w be b#Q+1, a string of #Q+1 black symbols. Let w¢ be rb#Q. Thus, w¢ is w with the first symbol changed to red. We will use the following function:

succ(k) = ì
í
î
k+1
if k £ #Q
1
otherwise (k = #Q+1)
which gives the successor to position k in left-to-right circular order. Because w does not contain a symbol r, there exists for every k Î [1,#Q+1] a state fk Î F such that
(qin,k) \twoheadrightarrowA,w* (fk, succ(k)) .
On the other hand, because w¢ contains exactly one symbol r, for all k Î [1,#Q], there is no state f Î F such that
(qin,k) \twoheadrightarrowA,w¢* (f, succ(k)) .
This difference implies that for all k Î [1, #Q] the walk of A on w from k to succ(k) must visit position 1, because the only different symbol is the first, which is black for w and red for w¢. Because the walk of A on w from #Q+1 to 1 also visits position 1, there is for all k Î [1,#Q+1] a qk Î Q such that
(qin,k) \twoheadrightarrowA,w* (qk,1) \twoheadrightarrowA,w* (fk,succ(k)) .
There are #Q+1 possibilities for k and only #Q states, which means there are k,k¢ Î [1,#Q+1] with k ¹ k¢ and qk = qk¢. Then
(qin,k) \twoheadrightarrowA,w* (qk,1) = (qk¢,1) \twoheadrightarrowA,w* (fk¢,succ(k¢)) .
This implies that A walks from k both to succ(k) and to succ(k¢) with k ¹ k¢, a contradiction to the fact that R(A) = R(f). \square

1.8  From String-Walking Pebble Automata to
MSO Formulas

In this section we will prove that a string-walking pebble automaton can be simulated by a string-walking automaton with MSO tests. Because it is already known that these automata can be simulated by binary MSO formulas (Proposition 5), and because we have shown in previous sections that a binary MSO formula can be simulated by a string-walking pebble automaton, we obtain the equivalence of binary MSO formulas and string-walking pebble automata.

To prove that it is possible to simulate the use of a pebble with MSO tests, we will need the following lemma. In the lemma, A is a string-walking pebble automaton that is not allowed to move its pebble, but it is allowed to check the presence of the pebble at the current position. The lemma states that the round-trip of A from the position where the pebble is, back to that position can be simulated by a unary MSO formula.

Lemma 10 Let A be a string-walking pebble automaton, with no transitions with directives put or lift, over S. Then, for every pair of states p,q Î QA, there exists a formula ypq(x) Î MSOL1(S) such that, for all w Î S* and x Î Vw,

(p,x,x) \twoheadrightarrowA,w* (q,x,x) iff (w,x) \models ypq(x) .

Proof. Let A be a string-walking pebble automaton over S without put and lift. Let p,q Î QA. First note that the pebble does not move during A's walk. We can therefore consider the pebble position a constant and mark it using a special symbol (s,1) where s is the original symbol at the pebble position. We will now define a string-walking automaton A¢ (without pebble) over SÈ(S×B1) that simulates the walk of A from the pebble position and back. We define A¢ = (QA, DSWA(SÈ(S×B1)), d¢, {p}, {q}). Thus, the states of A¢ are the same as those of A, A¢'s initial state is p, and its final state is q. Furthermore, d¢ consists of the following transitions:

d¢
=
dA Ç(QA ×DSWA(S)×QA) È
{ (s,lab(s,1),t) \mid (s,here,t) Î dA, s Î S} È
{ (s,labs,t) \mid (s,Øhere,t) Î dA, s Î S} È
{ (s,lab(s,1),t) \mid(s,labs,t) Î dA, s Î S}.
The directive ``here'' is simulated by checking whether the symbol at the current position is in S×B1. Similarly, ``Øhere'' is simulated by checking whether the symbol at the current position is in S. Otherwise, the new symbols from S×B1 are treated just like the corresponding symbols from S. Now, for all s,t Î QA, w Î S* and x,y,z Î Vw, it is easy to see that
(s,x,z) \twoheadrightarrowA,w* (t,y,z) iff (s,x) \twoheadrightarrowA¢,w¢* (t,y)
where w¢ = mark(w,z). This means that A¢ on a string marked at position z walking from x to y simulates A on the unmarked string with the pebble at position z. Both automata start in state s and end in state t. Also, from the definition of Rw¢(A¢),
(p,x) \twoheadrightarrowA¢,w¢* (q,y) iff (x,y) Î Rw¢(A¢) .

According to Proposition 5 we can construct a formula f¢pq(x,y) Î MSOL2(SÈ(S×B1)) that simulates the walk of A¢ on a marked string, so that, for all w Î S*, x,y,z Î Vw and w¢ = mark(w,z),

(x,y) Î Rw¢(A¢) iff (w¢,x,y) \models f¢pq(x,y) .
Using Lemma 2 yields a formula fpq(x,y,z) Î MSOL3(S) such that, for all w Î S* and x,y,z Î Vw,
(mark(w,z),x,y) \models f¢pq(x,y) iff (w,x,y,z) \models fpq(x,y,z) .
We now define ypq(x) = fpq(x,x,x) Î MSOL1(S). Then, for all w Î S* and u Î Vw,
(p,x,x) \twoheadrightarrowA,w* (q,x,x) iff (w,x) \models ypq(x) .
\square

Theorem 11 For every string-walking pebble automaton A there exists a string-walking automaton with MSO tests A¢ such that R(A) = R(A¢).

Proof. When A drops its pebble at some position u of string w, it must return there some later time to pick it up, because of the definition of Rw(A). In the mean time, i.e., until A returns to u, A does not move its pebble. By using Lemma 10, the actions of this automaton until its arrival in u can be simulated by a single MSO test. The details are as follows.

Let A = (Q,DSWPA(S),d,I,F) be a string-walking pebble automaton over S. We will define a string-walking automaton with MSO tests A¢. The MSO tests ypq are obtained by applying Lemma 10 to the automaton derived from A by removing all transitions of the form (p,d,q) with d Î {put,lift}. We define A¢ = (Q,D,d¢,I,F) with

D
=
{ ¬, ®, true(x), head(x), tail(x), Øhead(x), Øtail(x) } È
{ labs(x) \mid s Î S} È{ ypq(x) \mid p,q Î Q }
d¢
=
{ (p,d,q) Î d\mid d Î { ¬, ® } } È
{ (p,head(x),q) \mid (p,head,q) Î d} È
{ (p,Øhead(x),q) \mid (p,Øhead,q) Î d} È
{ (p,tail(x),q) \mid (p,tail,q) Î d} È
{ (p,Øtail(x),q) \mid (p,Øtail,q) Î d} È
{ (p,labs(x),q) \mid (p,labs,q) Î d} È
{ (p,true(x),q) \mid (p,Øhere,q) Î d} È
{ (p,yqr(x),s) \mid (p,put,q) Î d and (r,lift,s) Î d}
The directives ¬, ®, head, Øhead, tail, Øtail and labs (for s Î S) are simulated straightforwardly. The MSO test yqr(x) simulates the actions of A between the put and the lift. \square

This theorem leads us to the grand finale of this chapter.

Theorem 12 The following two statements hold:

There is another way to prove that any string-walking pebble automaton can be simulated by an MSO formula with two position variables. Let A be a string-walking pebble automaton over S that recognizes the relation R(A). Then a string-walking pebble automaton A¢ over SÈ(S×B2) can easily be constructed such that (w,u,v) Î R(A) just if mark(w,u,v) Î L(A¢). A¢ walks right until it encounters a symbol (s,1,x) with x Î {0,1}. Then A¢ starts simulating A directly until it encounters a symbol (s,y,1) with y Î {0,1} in a final state. Because SWPA=REG [BH65] and REG=MSO (Proposition 1), there exists an MSO formula y Î MSOL0(SÈ(S×B2)) such that L(A¢) = L(y). Using Corollary 3, we obtain an MSO formula f(x,y) Î MSOL2(S) such that, for all w Î S*, mark(w,u,v) Î L(y) just if (w,u,v) \models f(x,y).

Note that we can also use Theorem 12 to derive that SWPAREG, as follows:

Theorem 13 SWPAREG

Proof. First we will show that any language defined by a string-walking pebble automaton is MSO definable and hence regular. Let A be a string-walking pebble automaton over S. By definition, the language recognized by A is

L(A) = { w Î S* \mid there exists v Î Vw such that (1,v) Î Rw(A) }.
Using the second part of Theorem 12 we obtain a formula f(x,y) Î MSOL2(S) such that, for all w Î S* and u,v Î Vw,
(u,v) Î Rw(A) iff (w,u,v) \models f(x,y) .
We now define
y = "x ( $y (head(x) ® f(x,y) ) )
to obtain that, for all w Î S* and v Î Vw,
$v Î Vw ((1,v) Î Rw(A)) iff $v Î Vw ((w,1,v) \models f(x,y)) iff w \models y
so
L(A) = L(y)
and, since MSO=REG, L(A) is a regular language.

The other way around, we show that any regular language can be described by a string-walking pebble automaton. Let L be a regular language over an alphabet S. Let M = (Q,S,d,I,F) be a finite automaton such that L = L(M). We define a string-walking pebble automaton A = (Q¢,DSWPA(S),d¢,I,F¢) with

Q¢
=
Q È{qfin}
d¢
=
{ (p,(labs,®),q) \mid (p,s,q) Î d} È
{ (p,(labs,tail),qfin) \mid (p,s,q) Î d for some q Î F }
F¢
=
{ qfin}
It is obvious to see that L(A) = L(M) = L. \square

Chapter 2
Extensions to
String-Walking Pebble Automata

In this chapter we will introduce two extensions to the concept of string-walking pebble automata. These extensions will be constructed in such a way that the class of languages recognized by the automata is the same as the class of languages that the (ordinary) string-walking automata recognize, i.e., the class of regular languages.

2.1  String-Walking Multi-Pebble Automata

The first extension to the string-walking automaton is the string-walking multi-pebble automaton. This automaton can use more than one pebble. The number of pebbles available to the automaton is fixed. The pebbles will be taken from a finite set and must be used ``nested''. This means that, at any point in the automaton's walk, only the pebble that was dropped last can be lifted. It can easily be seen that we can, without changing the automaton's power, number the pebbles {1,2,¼,n} and demand that the pebbles must be used in order. In order to see this, suppose the pebbles are taken from a set P with |P| = n. Consider the set M of bijections from P to {1,2,¼,n} (there are n! such mappings). We can now simulate an automaton with pebble set P by an automaton with pebble set [1,n] by coding the pebble mapping into the states: for every state q we use additional states {qm \mid m Î M}. We also modify the transitions with put and lift so that the information on the mapping is maintained. Note that the constraint of the nested use of pebbles is necessary to make sure that the automaton does not recognize non-regular languages. A string-walking 2-pebble automaton without this limitation could recognize the language L = {ancbn \mid n Î \mathbb N} by putting the pebbles at both ends of the string and moving them towards the middle of the string in turn, checking that both pebbles arrive at the middle c after the same number of moves. This kind of use of the pebbles is prohibited by demanding that the pebbles be used nested. In broader terms, it can be shown that unnested multi-pebble automata are equivalent to two-way multi-head automata, which in turn are equivalent to logarithmic space Turing machines.

Let n ³ 1. We use the following set of directives for a string-walking n-pebble automaton over S:

DSWnPA(S)
=
DSWA(S)È{ puti, lifti, herei, Øherei \mid 1 £ i £ n }
For each w Î S* the relations for the new directives are as follows. The relations are on tuples (u,k,p) Î Vw ×[0,n]×([1,n] ® Pw). Here k represents that pebbles 1,¼,k are currently on the string and p maps the pebbles to string positions. We define p0 : [1,n] ® Pw with p0(i) = ^ for all i Î [1,n] as the mapping with no pebbles on the string.

The relations hold for any u Î Vw, k Î [0,n], and pebble placement function p:[1,n] ® Pw.

Rw(puti)
=
{ ((u, i-1, p), (u, i, p(i ® u))) \mid 1 £ i £ n }
Rw(lifti)
=
{ ((u,i,p), (u,i-1,p(i ® ^))) \mid 1 £ i £ n, p(i) = u }
Rw(herei)
=
{ ((u,k,p), (u,k,p)) \mid 1 £ i £ k and p(i) = u }
Rw(Øherei)
=
{ ((u,k,p), (u,k,p)) \mid 1 £ i £ n and (i > k or p(i) ¹ u) }
A string-walking n-pebble automaton over S is a finite automaton A over DSWnPA(S). For a string-walking n-pebble automaton A = (Q,DSWnPA(S),d,I,F) and a string w Î S*, a configuration is a tuple (q,u,k,p) Î Q ×Vw ×[0,n] ×([1,n] ® Pw). The set of all configurations of A and w is names \mathbb CA,w. The step relation \twoheadrightarrowA,w is defined in the obvious way. For all configurations (q,u,k,p), (q¢,u¢,k¢,p¢) Î \mathbb CA,w,
(q,u,k,p) \twoheadrightarrowA,w (q¢,u¢,k¢,p¢) iff $d Î DSWnPA(S):(q,d,q¢) Î d, ((u,k,p),(u¢,k¢,p¢)) Î Rw(d).
For each string w Î S*, A computes the relation
Rw(A) = { (u,v) Î Vw ×Vw \mid (q,u,0,p0) \twoheadrightarrowA,w* (q¢,v,0,p0) for some q Î I and q¢ Î F } .
The position relation R(A) and the language L(A) that A computes are defined in the usual way, as well as the behaviour of transitions of the form (q,s,q¢) with d Î DSWnPA(S)*.

2.2  From String-Walking Multi-Pebble Automata to
MSO Formulas

In this section we show that any position relation that can be computed by a string-walking multi-pebble automaton can also be defined by a binary MSO formula. Note that the other way around is obvious, because string-walking multi-pebble automata are an extension to string-walking pebble automata.

Theorem 14 For every string-walking n-pebble automaton A over an alphabet S there exists a binary MSO formula f(x,y) Î MSOL2(S) such that R(f) = R(A).

Proof. We will prove this theorem using natural induction on the number of pebbles n. Note that the case n = 1 is equal to the second part of Theorem 12. We present a proof similar to that in Section 1.8. Choose a number of pebbles n > 1. In the induction step, it can be assumed that Theorem 14 is valid for any string-walking (n-1)-pebble automaton (the induction hypothesis).

The following Claim (similar to Lemma 10) states that after a string-walking n-pebble automaton A drops its first pebble, its round-trips from the pebble position back to that same position can be simulated by a unary MSO formula. In the proof of this lemma we use the induction hypothesis.

[
Claim] Let A = (Q,DSWnPA(S),d,I,F) be a string-walking n-pebble automaton, with no transitions with directives put1 or lift1. Then, for every pair of states p,q Î Q, there exists a formula ypq(x) Î MSOL1(S) such that, for all w Î S* and x Î Vw,
(p,x,1,p0(1® x)) \twoheadrightarrowA,w* (q,x,1,p0(1 ® x)) iff (w,x) \models ypq(x) .
To prove this Claim, we construct a string-walking (n-1)-pebble automaton A¢ over SÈ(S×B1) that simulates the walk of A from the position of the first pebble and back (similar to the proof of Lemma 10). We define A¢ = (Q,DSW(n-1)PA(SÈ(S×B1)), d¢, {p},{q}) where
d¢
=
dÇ(Q ×DSWA(S)×Q) È
{ (s,lab(s,1), t) \mid (s,here1,t) Î d, s Î S} È
{ (s,labs,t) \mid (s,Øhere1,t) Î d, s Î S} È
{ (s,lab(s,1), t) \mid (s,labs,t) Î d, s Î S} È
{ (s,puti-1,t) \mid (s,puti,t) Î d, 2 £ i £ n } È
{ (s,lifti-1,t) \mid (s,lifti,t) Î d, 2 £ i £ n } È
{ (s,herei-1,t) \mid (s,herei,t) Î d, 2 £ i £ n } È
{ (s,Øherei-1,t) \mid (s,Øherei,t) Î d, 2 £ i £ n }
The states of A¢ are the same as those of A, A¢'s only initial state is p and A¢'s only final state is q. All directives that do not use or test the pebble are simulated directly. Pebble 1 is simulated by the marking, while pebbles 2,¼,n are simulated by pebbles 1,¼,n-1. Now it is easy to see that, for all s,t Î Q, w Î S*, x,y,z Î Vw and w¢ = mark(w,z),
(s,x,1,p0(1 ® z)) \twoheadrightarrowA,w* (t,y,1,p0(1 ® z)) iff (s,x,0,p0) \twoheadrightarrowA¢,w¢* (t,y,0,p0).
This means that A¢ on a string marked at position z, starting in a configuration without pebbles and walking from position x to y simulates A on the unmarked string, in a configuration with pebble 1 at position z. Note that, following the definition of Rw¢(A¢) for a string-walking (n-1)-pebble automaton,
(p,x,0,p0) \twoheadrightarrowA¢,w¢* (q,y,0,p0) iff (x,y) Î Rw¢(A¢).
According to the induction hypothesis, there exists a binary MSO formula
f¢pq(x,y) Î MSOL2(SÈ(S×B1))
that simulates the walk of A¢ on a marked string, such that, for all w Î S*, x,y,z Î Vw and w¢ = mark(w,z),
(x,y) Î Rw¢(A¢) iff (w¢,x,y) \models f¢pq(x,y).
The remainder of the proof of the Claim is again similar to the proof of Lemma 10. Again using Lemma 2 to pull the marking into the free position variables of the MSO formula, we obtain a formula fpq(x,y,z) Î MSOL3(S) such that, for all w Î S*, x,y,z Î Vw and w¢ = mark(w,z),
(w¢,x,y) \models f¢pq(x,y) iff (w,x,y,z) \models fpq(x,y,z) .
We now define the abbreviation ypq(x) = fpq(x,x,x) Î MSOL1(S). Then, for all w Î S* and x Î Vw,
(p,x,1,p0(1 ® x)) \twoheadrightarrowA,w* (q,x,1,p0(1 ® x)) iff (w,x) \models ypq(x) .
This ends the proof of the Claim. We now continue with the proof of Theorem 14. Let A = (Q,DSWnPA(S),d,I,F) be a string-walking n-pebble automaton over S. We will, like in the proof of Theorem 11, define a string-walking automaton with MSO tests A¢ over S such that R(A) = R(A¢). The MSO tests ypq(x) in this automaton are obtained by applying the Claim to the automaton derived from A by removing all transitions of the form (p,d,q) with d Î { put1, lift1 }. We define A¢ = (Q,D,d¢,I,F) exactly the same as in the proof of Theorem 11, but with Øhere1, put1 and lift1 instead of Øhere, put and lift:
D
=
{ ¬, ®, true(x), head(x), tail(x),Øhead(x), Øtail(x) } È
{ labs(x) \mid s Î S} È{ ypq(x) \mid p,q Î Q }
d¢
=
{ (p,d,q) Î d\mid d Î { ¬, ® } } È
{ (p,head(x),q) \mid (p,head,q) Î d} È
{ (p,Øhead(x),q) \mid (p,Øhead,q) Î d} È
{ (p,tail(x),q) \mid (p,tail,q) Î d} È
{ (p,Øtail(x),q) \mid (p,Øtail,q) Î d} È
{ (p,labs(x),q) \mid (p,labs,q) Î d} È
{ (p,true(x),q) \mid (p,Øhere1,q) Î d} È
{ (p,yqr(x),s) \mid (p,put1,q) Î d and (r,lift1,s) Î d}
Again, the directives {¬,®,head,Øhead,tail,Øtail} and labs (for s Î S) are simulated straightforwardly. The directive yqr(x) simulates the actions of A from put1 to the following lift1 (at the same location), including all puti and lifti for 2 £ i £ n. It is easy to see that R(A) = R(A¢). Now, using Proposition 5, there exists a binary MSO formula f(x,y) Î MSOL2(S) such that R(f) = R(A¢) = R(A). \square

2.3  String-Walking Marble Automata

Another way to extend string-walking pebble automata is the introduction of marbles instead of pebbles. They come in several colours , that are labeled with a colour alphabet G. For each colour g Î G, the number of marbles of colour g is not limited. In this paper, we give only the definition of string-walking marble automata and of their behaviour. The idea is that every marble colour can be used for a quantifier of MSO formulas; however, it is not proven here that string-walking marble automata compute exactly the position relations that binary MSO formulas recognize. This may not even be the case.

DSWMA(S,G) = DSWA(S)È{ putg, liftg, hereg, Øhereg \mid g Î G}

For all directives d, Rw(d) is a binary relation over Vw ×(G® 2Vw) ×\mathbb N ×(\mathbb N ® ((Vw ×G) È{^})). An element (u,p,n,s) of this set means that

Note that p can be determined from n and s: for every g Î G,

p(g) = { u Î Vw \mid $i Î [1,n]: s(i) = (u,g) } .
The relations for the directives from DSWA(S) are extended straightforwardly. These directives do not alter p, n or s. For the directives putg, liftg, hereg and Øhereg, the relations are as follows. The definition of Rw(putg) is complicated, because we demand that the marbles are nested as usual and that all marbles of colour g must be put down after each other, without putting down a marble of another colour in between.

Rw(putg)
=
{ ( (u,p,n,s), ( u,p(g® p(g) È{u}), n+1, s(n+1 ® (u,g))) ) \mid u \not Î p(g),
"1 £ i < j £ n ( Ø$v,v¢ Î Vw, g¢ Î G: (g¢ ¹ gÙs(i) = (v,g) Ùs(j) = (v¢,g¢) ) ) }
Rw(liftg)
=
{ ( (u,p,n,s),(u,p(g® p(g) \{u}), n-1, s(n® ^))) \mid s(n) = (u,g) }
Rw(hereg)
=
{ ( (u,p,n,s), (u,p,n,s) ) \midu Î p(g) }
Rw(Øhereg)
=
{ ( (u,p,n,s), (u,p,n,s) ) \midu \not Î p(g) }

We define p0:G® 2Vw with p0(g) = Æ for all g Î G. We define s0:\mathbb N ® ((Vw ×G) È{^}) with s(i) = ^ for all i Î \mathbb N.

The relation computed by A on a string w Î S* is

Rw(A) = { (u,v) Î Vw ×Vw \mid(q,u,p0,0,s0) \twoheadrightarrowA,w* (q¢,v, p0,0,s0) for some q Î I and q¢ Î F }.

Chapter 3
Binary MSO Formulas and Tree-Walking Automata

3.1  Preliminaries

In this section we recall the well-known concepts of trees, finite tree automata, regular tree languages, and monadic second order logic.

Trees

In the usual way, trees are defined as finite, directed graphs with labeled nodes and edges. Let S and G be alphabets of node labels and edge labels, respectively. A graph over (S,G) is a triple (V,E,lab), with V a finite set of nodes, E Í V ×G×V the set of labeled edges, and lab: V ® S the node-labeling function. For a given graph g, its nodes, edges and node-labeling functions are denoted Vg, Eg and labg, respectively.

The trees we consider have their nodes labeled by symbols from a ranked alphabet. The edge from a parent to its i-th child is labeled with the number i. A ranked alphabet S is an alphabet S together with a rank function rk:S® \mathbb N. For any symbol s Î S, rk(s) = k denotes that the rank of s is k. In terms of trees this means that a node labeled with s will have k children. For all k Î \mathbb N, Sk = { s Î S\mid rk(s) = k } is the set of symbols of rank k. The rank interval of the ranked alphabet S is [1,m], where m is the maximal rank of the elements of S; it is denoted rki(S).

A tree over S is an acyclic connected graph g over (S, rki(S)) such that

The set of all trees over S is denoted TS. A subset of TS is also called a tree language.

The root of a tree t (the node with no incoming edges) is denoted roott. For nodes u,v of t, if (u,i,v) Î Et, then u is called the parent of v, and v is called the i-th child of u, denoted by u ·i. We also define u·0 = u. We define the set of ancestors anc(u) of a node u as u itself and the ancestors of its parent. The least common ancestor of two nodes u and v (denoted lca(u,v)) is defined as the node w such that

If u is an ancestor of v, then v is a descendant of u. The set des(u) denotes the set of all descendants of u. For a node u of t, tu denotes the subtree of t with root u, i.e., the subgraph of t induced by the set of all descendants of u.

Let S be a ranked alphabet and let k Î \mathbb N. We define k-ary node relations in analogy to k-ary position relations for strings. A k-ary node relation over S is a subset of {(t, u1, ¼, uk) \mid t Î TS and ui Î Vt for all i Î [1,k]}.

We define marked trees as follows, in analogy to marked strings. Let S be a ranked alphabet and let k ³ 1. The ranked alphabet SÈ(S×Bk) is defined as usual, where we assign to the symbol (s, b1, ¼, bk) the rank rk(s) (for s Î S and bi Î [1,k] for i Î [1,k]). Let t Î TS and let u1, ¼, uk Î Vt. We define mark(t,u1, ¼, uk) as a tree over SÈ(S×Bk), with node-set Vt, edges Et and node-labeling function lab¢: Vt ® SÈ(S×Bk) with lab¢(u) = labt(u) if u ¹ ui for all i Î [1,k], and lab¢(u) = (labt(u), (u = u1), ¼, (u = uk)) otherwise (where (u = ui) = 1 iff u equals ui).

Finite Tree Automata

A finite tree automaton [GS84] traverses a tree bottom-up. It starts in all the leaves of the tree, in parallel, and works its way up to the root. A finite tree automaton has a finite set of states Q and a transition function. The state of the automaton in a node of the tree is determined by the states of the children of the node and by the label of the node. For a symbol s with rk(s) = k, the transition function ds is therefore a function from Qk to Q. A finite tree automaton accepts a tree if it ends at the root of the tree in one of a set of pre-determined final states.

Let S be a ranked alphabet. A (deterministic) finite tree automaton over S is a 4-tuple M = (Q,S,d,F) where Q is a finite set of states, S is the input alphabet, d = { ds \mid s Î S} with, for s Î Sk, ds: Qk ® Q the transition function for s, and F Í Q the set of final states. For every tree t Î TS and node u Î Vt, the state in which M reaches u, denoted stateM,t(u), is defined by bottom-up induction as stateM,t(u) = ds(stateM,t(u ·1), ¼, stateM,t(u ·k)) where labt(u) = s with rk(s) = k. The tree language recognized by M is L(M) = { t Î TS \mid stateM,t(root(t)) Î F }. A tree language L is called a regular tree language if it is recognized by a finite tree automaton. The class of all regular tree languages is named REGT.

For any finite tree automaton M = (Q,S,d,F), tree t Î TS and node u Î Vt, the set of successful states of M at u, denoted by succM,t(u), is defined (by top-down induction) as follows: succM,t(roott) = F and, for a node u Î Vt with labt(u) = s, rk(s) = k > 0, succM,t(u·i) is the set of all states q Î Q such that ds(q1, ¼, qi-1,q,qi+1,¼, qk) Î succM,t(u), where qj = stateM,t(u ·j) for j Î [1,i-1] È[i+1,k]. Intuitively, q Î succM,t(u) means that M will reach the root of t in a final state assuming that it reaches u in state q (instead of stateM,t(u)). Thus, t Î L(M) iff stateM,t(u) Î succM,t(u). This can easily be proven ([BE97], Lemma 1).

Monadic Second Order Logic (on trees)

The monadic second order logic on strings described in Section 1.1 can be extended to trees (see, e.g., [TW68,Don70,BE97]). We will in short describe the differences. Let S be a ranked alphabet.

The class of all MSO definable tree languages is named MSOT.

The following classical result from [Don70,TW68] states that MSOT=REGT, i.e., that formulas from monadic second order logic with no free variables define exactly the regular tree languages:

Proposition 15 A tree language is MSO definable if and only if it is regular.

As stated in Section 1.1, the results of Lemma 2, Corollary 3, and Lemma 4 are also valid for trees (with S a ranked alphabet and t Î TS instead of w Î S*).

3.2  Tree-Walking Automata

A tree-walking automaton (see, e.g., [KS81]) is very similar to a string-walking automaton (Section 1.2). A tree-walking automaton can walk freely on the tree. In this paper we will give a definition of tree-walking automaton that is slightly different from the definition in [KS81]. First of all, we will make use of directives to define the behaviour of the automaton, while [KS81] directly defines a binary step relation on the set of configurations. Secondly, with our definition the automaton is able to check which child of its parent the current node is. In [KS81], the automaton can only move up, without any restriction to child number. This makes our automaton more powerful (i.e., it can recognize more tree languages).

Let S be a ranked alphabet. Like a string-walking automaton, a tree-walking automaton over S is a finite automaton with a special set of directives as input alphabet. For a tree-walking automaton, we define the set of directives as

DTWA(S) = { ­i, ¯i \mid i Î rki(S) }È{ root, Øroot} È{ labs\mid s Î S} .
The meaning of the directives is as follows:

Note that the directive root corresponds to the directive head of the string-walking automaton. A directive ``leaf'' (corresponding to tail) is not necessary since the automaton can test whether the label of the current node has rank 0. Also, Øroot can be simulated by ­i ¯i for all i Î rki(S). To formalize the behaviour of these directives we define for each tree t Î TS and each directive d Î DTWA(S) the following binary relation Rt(d) on Vt:

Rt(­i)
=
{ (u,v) \mid (v,i,u) Î Et }
Rt(¯i)
=
{ (u,v) \mid (u,i,v) Î Et }
Rt(root)
=
{ (u,u) \mid u = roott }
Rt(Øroot)
=
{ (u,u) \mid u ¹ roott }
Rt(labs)
=
{ (u,u) \mid labt(u) = s}
Like the definition of string-walking automata, a tree-walking automaton over S is a finite automaton A over DTWA(S). For a tree-walking automaton A = (Q,DTWA(S),d,I,F) and a tree t Î TS, an element (q,u) of Q ×Vt is a configuration of the automaton. The set of all configurations of A and t is denoted by \mathbb CA,t. A configuration (q,u) denotes that the automaton A is in state q at node u.

Let A = (Q,DTWA(S),d,I,F) be a tree-walking automaton over S and let t Î TS. Like with string-walking automata, we define a step-relation \twoheadrightarrowA,t on the set of configurations as follows. For every (q,u), (q¢,u¢) Î \mathbb CA,t,

(q,u) \twoheadrightarrowA,t (q¢,u¢) iff $d Î DTWA(S): (q,d,q¢) Î d and (u,u¢) Î Rt(d) .
The automaton A computes on each tree t Î TS the binary relation
Rt(A) = { (u,u¢) Î Vt ×Vt \mid(qin,u) \twoheadrightarrowA,t* (qfin,u¢) for some qin Î I and qfin Î F } .
The node relation that A computes is
R(A) = { (t,u,v) \mid t Î TS and (u,v) Î Rt(A) } .
The definitions of these relations (\twoheadrightarrowA,t, Rt(A) and R(A)) are mutatis mutandis equal to the definitions for string-walking automaton in Section 1.2.

The tree language that A recognizes is defined as

L(A) = { t Î TS \mid(roott, v) Î Rt(A) for some v Î Vt } .
Note that we can assume that the walk of the automaton ends as well as begins in roott, since we can easily add extra transitions to make the automaton walk to the root at the end of its walk. The class of all tree languages recognized by any tree-walking automaton is named TWA. It is an open problem if TWA=REGT.

Like in the case of string-walking automata, we will also allow transitions of the form (p,s,q) with s Î DTWA(S)*. If s = d1 ¼dn, we define, for all t Î TS,

Rt(s) = Rt(d1) °¼°Rt(dn).
As with string-walking automata, these extended directives can easily be implemented by adding extra transitions and states.

Determinism of tree-walking automata in defined in analogy with determinism of string-walking automata. A tree-walking automaton A = (Q,DTWA(S),d,I,F) is deterministic if the following conditions hold:

  1. #I = 1.
  2. If (p,d,q) Î d, then p \not Î F.
  3. For all distinct transitions (p,d1,q1), (p,d2,q2) Î d, d1 and d2 are two mutually exclusive directives in DTWA(S).

Two directives d1,d2 Î DTWA(S) are again mutually exclusive if, for all t Î TS, dom(Rt(d1)) Çdom(Rt(d2)) = Æ. The following pairs of directives in DTWA(S) are mutually exclusive:

3.3  Push-Down Tree-Walking Automata

A push-down tree-walking automaton walks on a tree, starting in the root of the tree. It has a push-down store. At each step of the automaton's walk, its possible moves are determined by the automaton's state, by the symbol at the current position of the automaton and by the push-down symbol at the top of the store. If the automaton moves up in the tree, the top symbol is removed from the push-down store; if the automaton moves down, a new symbol is added to the push-down store. There are two main differences between the push-down tree-walking automata as described in this section and those in literature (e.g., [KS81]). Here, directives are used to describe the different possible actions of the automaton. Furthermore, [KS81] lets the automaton accept a tree if it walks up from the root, leaving the tree in a final state; in the description here, the walk can end in any node of the tree in a final state. These differences have no consequences for the power of the automaton. Note that the tree-walking automaton in [KS81] is the push-down tree-walking automaton with only one push-down symbol.

We will present the push-down tree-walking automaton as an extension of the tree-walking automaton. Let S be a ranked alphabet and let G be an alphabet. The directives of a push-down tree-walking automaton are

DPDTWA(S,G)
=
{ ­ } È{ ¯i,g \mid i Î rki(S), g Î G} È
{ labs\mid s Î S} È{ stayg1, g2 \mid g1, g2 Î G} .
The meaning of these directives is as follows:

The push-down tree-walking automaton does not need the directives root and Øroot. By using marked push-down symbols on the bottom of the push-down store, the automaton can check whether the current node is the root of the tree by inspecting the top symbol of the push-down store. If convenient, however, we will use the directives root and Øroot since a push-down tree-walking automaton with these directives can be simulated by a push-down tree-walking automaton without root and Øroot.

To formalize the directives, we define for each tree t Î TS and each directive d Î DPDTWA(S,G) the following binary relation Rt(d) on pairs (u,b) where u Î Vt is the current node, and b = g1 ¼gn Î G* is the current contents of the push-down store with n equal to the number of ancestors of u.

Rt(­)
=
{ ( (u, g1 ¼gn),(v, g1 ¼gn-1) ) \mid v is the parent of u }
Rt(¯i,g)
=
{ ( (u, g1 ¼gn),(v, g1 ¼gn g)) \midv is the ith child of u }
Rt(labs)
=
{ ((u, g1 ¼gn),(u, g1 ¼gn)) \midlabt(u) = s}
Rt(stayg,g¢)
=
{ ((u, g1 ¼gn-1 gn),(u, g1 ¼gn-1 g¢)) \mid gn = g}

Let S be a ranked alphabet (of node labels) and let G be an alphabet (of push-down symbols) with a designated element gin (the initial push-down symbol). A push-down tree-walking automaton over (S,G) is a finite automaton A over DPDTWA(S,G). For a push-down tree-walking automaton A = (Q,DPDTWA(S,G),d,I,F) over (S,G) and a tree t Î TS, the configurations of A are triples (q,u,b), with q Î Q the state of the automaton, u Î Vt the current node, and b = g1 ¼gn Î G* the contents of the push-down store, where n is equal to the number of ancestors of u. The set of all configurations on A and t is denoted by \mathbb CA,t.

Let A = (Q,DPDTWA(S,G),d,I,F) be a push-down tree-walking automaton over (S,G) and let t Î TS. The definition of the step-relation \twoheadrightarrowA,t is almost identical to the definition of the step-relation for tree-walking automata. For every (q,u,b), (q¢,u¢,b¢) Î \mathbb CA,t,

(q,u,b) \twoheadrightarrowA,t (q¢,u¢,b¢) iff $d Î DPDTWA(S,G): (q,d,q¢) Î d and ((u,b),(u¢,b¢)) Î Rt(d) .
The tree language that A recognizes is defined as
L(A) = { t Î TS \mid(qin,roott,gin) \twoheadrightarrowA,t* (qfin,u,b) for some qin Î I, qfin Î F, u Î Vt and b Î G* } .
The automaton starts in the root of a tree with only the initial push-down symbol gin on the push-down store. It recognizes the tree if it reaches a final state in any node and with any contents of the push-down store. The class of all tree languages recognized by any push-down tree-walking automaton is named PDTWA. Note that because of the push-down store there is no reasonable definition for a binary node relation for push-down tree-walking automata.

Transitions of the form (p,s,q) with s Î DTWA(S)* are treated in the same way as with tree-walking automata. Determinism of push-down tree-walking automata is defined in the same way as for tree-walking automata. The following pairs of directives in DPDTWA(S,G) are mutually exclusive:

3.4  Equivalence of Finite Tree Automata and
Push-Down Tree-Walking Automata

In this section we show the known result [KS81] that push-down tree-walking automata recognize exactly the regular tree languages. First we show that for every finite tree automaton there exists a push-down tree-walking automaton that recognizes the same language; then we show that, the other way around, for every push-down tree-walking automaton there exists a finite tree automaton that recognizes the same language.

Lemma 16 For every finite tree automaton M there exists a push-down tree-walking automaton A such that L(A) = L(M).

Proof. Let M = (Q,S,d,F) be a finite tree automaton. We construct a push-down tree-walking automaton A = (Q¢,DPDTWA(S,G),d¢,{qin},{qfin}) over (S,G) such that L(A) = L(M). The automaton A walks on a tree t in a single pass from left to right. It enters each subtree of t with root u in state qin and leaves it in state stateM,t(u). It uses the pushdown store to remember, (a) how many children of u have already been processed and (b) what the resulting states were. Therefore we define Q¢ = Q È{qin,qfin} (with qin,qfin\not Î Q), G = È{ Qi \mid i Î rki(S) } È{l} and gin = l. The pushdown symbol l (the empty string) indicates that no subtrees have been traversed yet. If the automaton is at node u and the pushdown symbol at the top of the pushdown store is q1 ¼qk, this means that the first k subtrees of u have been traversed and that, for every i with 1 £ i £ k, stateM,t(u ·i) = qi.

The transition relation d¢ is as follows:

d¢
=
{ (qin, (Øroot, labs, ­), ds) \mids Î S, rk(s) = 0 } È
{ (qin, (root, labs), qfin) \mids Î S, rk(s) = 0, ds Î F} È
{ (qin, (labs, ¯1,l), qin) \mids Î S, rk(s) > 0 } È
{ (qk, (labs, stayq1 ¼qk-1, q1 ¼qk, ¯k+1,l), qin) \mid
s Î S, "j Î [1,k] (qj Î Q), k < rk(s) } È
{ (qk, (Øroot, labs, stayq1 ¼qk-1, q1 ¼qk-1, ­), ds(q1, ¼, qk)) \mid
s Î S, "j Î [1,k] (qj Î Q), k = rk(s) > 0 } È
{ (qk, (root, labs, stayq1 ¼qk-1, q1 ¼qk-1), qfin) \mid
s Î S, "j Î [1,k] (qj Î Q), k = rk(s) > 0, ds(q1, ¼, qk) Î F }
\square

Lemma 17 For every push-down tree-walking automaton A there exists a finite tree automaton M such that L(M) = L(A).

Proof. Let A = (Q,DPDTWA(S,G),d,I,F) be a push-down tree-walking automaton over (S,G). From this automaton, we first construct another push-down tree walking automaton A¢. A¢ is equal in behaviour to A, except that when A is in a final state, A¢ has extra transitions to move up to the root of the tree and then further up ``out'' of the tree in an extra state qfin. Note that A¢ can never execute the latter instruction; it is added only to make the construction of M easier. We define A¢ = (Q¢,DPDTWA(S,G),d¢,I,{qfin}) with Q¢ = Q È{qfin}, qfin\not Î Q and d¢ = dÈ{(q,­,qfin) \mid q Î F} È{ (qfin, ­, qfin) }. We now construct a finite tree automaton M = (QM,S,dM,FM) such that L(M) = L(A). We use the method of transition tables (see, e.g., [KS81]). In this method, the states of the finite tree automaton are relations R Í (G×Q¢) ×Q¢. When the finite tree automaton computes a relation R as state for a node u with ((g,q),q¢) Î R , this means that the push-down tree-walking automaton A¢ can traverse the subtree with u as root, starting at node u in state q with symbol g on the top of the push-down store, and emerging from the subtree (by an edge from u to its parent) in state q¢, with g removed from the push-down store. At the root, A¢ ``emerging'' from the tree in state qfin means that A arrived in a final state.

Formally, we define

QM
=
2(G×Q¢) ×Q¢
dM
=
{ ds \mid s Î S}, with ds (s Î S) defined below
FM
=
{ R Í (G×Q¢) ×Q¢\mid ((gin,qin),qfin) Î R for some qin Î I }
Here, for every s Î S, the transition function for symbol s is ds: QMrk(s) ® QM with
ds(R1, ¼, Rn)
=
{ ((g, q), q¢¢) \mid g Î G, q,q¢¢ Î Q¢ and (g,q)  R(s, R1, ¼, Rn)*  (g¢, q¢)
for some (g¢, q¢) Î G×Q¢ with (q¢, ­, q¢¢) Î d¢} .
In this definition, R(s, R1, ¼, Rn) is the binary relation on G×Q¢ that contains the results of the possible actions of the push-down tree-walking automaton A¢ while staying on the same node or entering a subtree and emerging again from it. The symbol s is the symbol at the current position and the relations Ri Í (G×Q¢) ×Q¢ are the transition tables of the children of the current node. The relation R is formally defined as
R(s, R1, ¼, Rn)
=
{ ((g,q),(g,q¢¢)) \mid(q, ¯i,g¢, q¢) Î d¢ and ((g¢,q¢),q¢¢) Î Ri
for some i Î [1,rk(s)] } È
{ ((g,q),(g¢,q¢)) \mid(q, stayg,g¢, q¢) Î d¢} È
{ ((g,q), (g,q¢)) \mid(q, labs, q¢) Î d¢} .
Note that these relations can effectively be computed. The computation of R(s, R1, ¼, Rn) is straightforward. Since it is a binary relation on a finite set, its reflexive transition closure can be computed by regarding the relation as a graph and computing all paths in the graph (the nodes of the graph are the elements of G×Q¢ and there is an edge from (g1,q1) to (g2,q2) iff ((g1,q1),(g2,q2)) is in the relation). \square

Combining these two lemmas yields the following theorem.

Theorem 18 PDTWA = REGT

3.5  Tree-Walking Marble Automata

Another way to extend tree-walking automata is with marbles . The concept of marbles is the same as that of Section 2.3 for strings, but they are used here in a different way. Marbles are coloured (with symbols from an alphabet) and there are infinitely many marbles of each colour available. Like pebbles, the automaton can drop a marble at a certain node, check for its presence any later time and lift it again. It is not allowed to put more than one marble of the same colour on the same node. Note that by taking the powerset of the marble alphabet we can even pose that not more than one marble (of any colour) is on any node. In the rest of this paper we will implicitly assume that this is the case. Unlike pebbles, putting down a marble on a certain node restrains the automaton to the subtree of which that node is the root, until the marble is lifted again. This restriction makes sure that the marbles are ``almost'' nested: unnested use of marbles is only possible if the marbles are at the same node. We will show that this restriction makes the tree-walking marble automaton very similar to the push-down tree-walking automaton, while it allows for the definition of the binary relation computed by a tree-walking marble automaton on a tree, in a natural way.

Let S be a ranked alphabet and let G be an alphabet of marble labels. Note that there is no gin necessary now. For a tree-walking marble automaton over (S,G), we define the set of directives as

DTWMA(S,G) = DTWA(S)È{ putg, liftg, hereg, Øhereg \midg Î G} .
The meaning of the extra directives is as follows:

For t Î TS, the position of the marbles is formalized by a function m:G® 2Vt. For this function, m(g) = S Í Vt means that exactly the nodes in S have a marble with label g. The set of all marble position functions is denoted by Mt,G = G® 2Vt. The behaviour of the directives, including the restrictions mentioned above, is formalized by defining for each tree t Î TS and each directive d Î DTWMA(S,G) the following binary relation Rt(d) on Vt ×Mt,G.

Rt(­i)
=
{ ((u,m), (u¢,m)) \midu is the ith child of u¢ andfor all g Î G, u \not Î m(g) }
Rt(¯i)
=
{ ((u,m),(u¢,m)) \midu¢ is the ith child of u }
Rt(root)
=
{ ((roott,m),(roott,m)) }
Rt(Øroot)
=
{ ((u,m),(u,m)) \mid u ¹ roott }
Rt(labs)
=
{ ((u,m),(u,m)) \midlabt(u) = s}
Rt(putg)
=
{ ((u,m),(u,m¢)) \midu \not Î m(g), m¢ = m(g® m(g) È{u}) }
Rt(liftg)
=
{ ((u,m),(u,m¢)) \midu Î m(g), m¢ = m(g® m(g) \{u}) }
Rt(hereg)
=
{ ((u,m),(u,m)) \mid u Î m(g) }
Rt(Øhereg)
=
{ ((u,m),(u,m)) \mid u \not Î m(g) }
The relation for d = ­i makes sure that the marbles are used nested, since it only allows moving up if there are no marbles on the current node.

Let S be a ranked alphabet and let G be an alphabet. A tree-walking marble automaton over (S,G) is a finite automaton A over DTWMA(S,G). For a tree-walking marble automaton A = (Q,DTWMA(S,G),d,I,F) over (S,G) and a tree t Î TS, the configurations of A are triples (q,u,m) Î Q ×Vt ×Mt,G, with q the state of the automaton, u the current node and m the marble position function. The set of all configurations of A and t is denoted by \mathbb CA,t.

Let A = (Q,DTWMA(S,G),d,I,F) be a tree-walking marble automaton over (S,G) and let t Î TS. As usual, we define the binary step-relation \twoheadrightarrowA,t on \mathbb CA,t. For every (q,u,m), (q¢,u¢,m¢) Î \mathbb CA,t,

(q,u,m) \twoheadrightarrowA,t (q¢,u¢,m¢) iff $d Î DTWMA(S,G):(q,d,q¢) Î d and (u,m) Rt(d) (u¢,m¢) .
The automaton A computes on each tree t Î TS the binary relation
Rt(A) = { (u,u¢) Î Vt ×Vt \mid(qin,u,m0) \twoheadrightarrowA,t* (qfin,u¢,m0) for some qin Î Iand qfin Î F } ,
where m0:G® 2Vt with m0(g) = Æ for all g Î G. Note that this definition demands that A removes all marbles from the tree. The node relation that A computes is, as usual,
R(A) = { (t,u,v) \mid t Î TS and (u,v) Î Rt(A) } .
The tree language that A recognizes is also defined in the familiar way:
L(A) = { t Î TS \mid (t,roott,v) Î R(A) for some v Î Vt} .
The class of all tree languages recognized by a tree-walking marble automaton is named TWMA. Transitions of the form (q,s,q¢) with s Î DTWMA(S,G)* are treated in the usual way. Determinism of tree-walking marble automata is defined in the same way as of (simple) tree-walking automata. The following pairs of directives in DTWMA(S,G) are mutually exclusive:

We define the abbreviation ØhereG for Øhereg1, ¼, Øheregk with G = {g1, ¼, gk}. These directives together test that there is no marble on the current node.

Note that root and Øroot are needed for the tree-walking marble automaton because, unlike the push-down tree-walking automaton, a tree-walking marble automaton does not have to start its walk at the root of the tree, so it is not always possible to place a special marble at the root. Likewise, directives ­i for i Î rki(S) are needed because, when the automaton starts in a certain node not equal to the root, the path from the root to that node cannot be coded into marbles.

Theorem 19 TWMA = REGT

Proof. Since PDTWA=REGT (Theorem 18), it is sufficient to show that TWMA=PDTWA. First we prove that any push-down tree walking automaton can be simulated by a tree-walking marble automaton. Let A = (Q,DPDTWA(S,G),d,I,F) be a push-down tree-walking automaton over (S,G). We define a tree-walking marble automaton

A¢ = (QÈ{qin,qfin},DTWMA(S,G),d¢,{qin},{qfin})
with a new initial state qin and a new final state qfin. This automaton simulates A in the following way. First, a marble with colour gin, representing the initial push-down symbol, is put on the current node. Then the directive ¯i,g is simulated by moving down to the ith child of the current node and then putting down a marble of colour g. On the way up, the directive ­ is simulated by picking up any marble (there is always exactly one on the current node) and moving up to the parent of the current node. The directive labs is also available in DTWMA(S,G), so it can be simulated directly. When A reaches a final state, A¢ moves up to the root, picking up all the marbles on the way up, so that it accepts the tree when it picks up the last marble at the root. A configuration of the push-down tree-walking automaton (u,g1 ¼gn) is simulated by a configuration (u,m) of the tree-walking marble automaton, where, for all i Î [1,n], m has a marble gi on the ith node of the path from the root to u, and no other marbles. A¢ has the following transitions:
d¢
=
{ (qin, putgin, q) \mid q Î I } È
{ (q,(liftg, ­i),q¢) \mid (q,­,q¢) Î d, g Î G, i Î rki(S) } È
{ (q,(¯i,putg),q¢) \mid (q,¯i,g,q¢) Î d} È
{ (q,labs,q¢) \mid(q,labs,q¢) Î d} È
{ (q, (liftg1,putg2),q¢) \mid(q,stayg1, g2,q¢) Î d} È
{ (q, liftg, qfin) \mid q Î F, g Î G} È
{ (qfin, (­i, liftg), qfin) \mid i Î rki(S), g Î G}.
Obviously, L(A¢) = L(A).

The other way around, let A = (Q,DTWMA(S,G),d,I,F) be a tree-walking marble automaton over (S,G). As observed before, we may assume that the automaton is constructed in such a way that, at any time, there is at most one marble on any node. Note that, due to the restriction of the automaton to the subtree of the current node whenever a marble is put down, at any moment all marbles lie on the path from the root to the current node. This path can be viewed as a push-down store, except that here ``empty'' places in the store are allowed. To account for these empty places, we add an extra marble colour g0. In order to simulate root, Øroot and the directives ­i (for i Î rki(S)), extra information is put on the push-down store. The new push-down symbols are of the form (g, i, r), where g Î G is the marble colour (or g = g0 if there is no marble on the current node), the current node is the ith child of its father, and r = 1 if the current node is the root and r = 0 if the current node is not the root. The initial pushdown symbol is gin = (g0,1,1). Now the push-down store can be used to simulate the marbles. We define a push-down tree-walking automaton A¢ = (Q¢,DPDTWA(S,G¢),d¢,I,F¢).

Formally, the push-down alphabet and the transitions of A¢ are defined as follows.

G¢
=
(GÈ{g0}) ×rki(S) ×{0,1} with gin = (g0,1,1)
Q¢
=
Q È{up, qfin}
F¢
=
{ qfin}
d¢
=
{(q,(stay(g0,i,r),(g0,i,r), ­),q¢) \mid r Î {0,1}, (q, ­i, q¢) Î d} È
{ (q, ¯i, (g0,i,0),q¢) \mid(q,¯i,q¢) Î d} È
{ (q, stay(g,i,1),(g,i,1), q¢) \midg Î GÈ{g0}, i Î rki(S),(q,root,q¢) Î d} È
{ (q, stay(g,i,0),(g,i,0), q¢) \midg Î GÈ{g0}, i Î rki(S),(q,Øroot,q¢) Î d} È
{ (q, labs, q¢) \mid(q,labs,q¢) Î d} È
{ (q, stay(g0,i,r),(g,i,r), q¢) \midi Î rki(S), r Î {0,1}, (q,putg,q¢) Î d} È
{ (q, stay(g,i,r),(g0,i,r), q¢) \midi Î rki(S), r Î {0,1}, (q,liftg,q¢) Î d} È
{ (q, stay(g,i,r),(g,i,r), q¢) \midi Î rki(S), r Î {0,1}, (q,hereg,q¢) Î d} È
{ (q, stay(g,i,r),(g,i,r), q¢) \midi Î rki(S), r Î {0,1}, g Î (GÈ{g0}),
(q,Øhereg¢,q¢) Î d for some g¢ ¹ g} È
{ (q, stay(g0,i,1), (g0,i,1), qfin) \midq Î F, i Î rki(S) } È
{ (q, (stay(g0,i,0), (g0,i,0), ­), up) \midq Î F, i Î rki(S) } È
{ (up, (stay(g0,i,0), (g0,i,0), ­), up) \midi Î rki(S) } È
{ (up, stay(g0,i,1), (g0,i,1), qfin) \midi Î rki(S) }
Since this automaton simulates A, we have L(A¢) = L(A). \square

One might wonder why it is necessary to restrict a tree-walking automaton, when it puts down a marble on a node, to the subtree of which that node is the root. We will show that just demanding that the marbles be used nested is not enough; this would allow the automaton to recognize non-regular tree languages. As an example, consider the non-regular tree language L = { tn \mid n Î \mathbb N} over the ranked alphabet S = {a,b,c} with rk(a) = rk(b) = 1 and rk(c) = 0, with, for all n Î \mathbb N,

Vtn
=
{xi,yi \mid i Î [1,n]} È{z}
Etn
=
{ (xi,1,xi+1), (yi,1,yi+1) \mid i Î [1,n-1] } È{(xn,1,y1), (yn,1,z) }
labtn
=
{(xi,a), (yi,b) \mid i Î [1,n] } È{(z,c)}
This tree language L is similar to the (non-regular) language { an bn c \mid n Î \mathbb N}. If we only demand the nested use of marbles, L is recognized by the tree-walking marble automaton A over (S,{g}), that puts the first marble on the first ``a'', walks to the bottom of the tree, checks whether there is a ``c'' there and a ``b'' directly above, puts a marble on that ``b'', walks up the tree to the first ``a'' again, puts a marble on the second ``a'' and continues walking up and down until it reaches the middle, where it can check if there are as much ``a'''s as ``b'''s. It then removes the marbles in the reverse order, walking up and down the tree in a similar way.

Note that tree-walking multi-pebble automata, defined in analogy to the string-walking multi-pebble automata of Section 2.1, recognize only regular tree languages (but it is open whether or not they recognize them all). Multi-pebble automata have only a finite number of pebbles available, contrary to the infinite number of marbles of marble automata. This means the above described automaton will not work with a multi-pebble automaton.

3.6  Tree-Walking Marble/Pebble Automata

There is one feature to be added to tree-walking marble automata to make sure that they have the same power as binary MSO formulas with respect to describing binary node relations. A tree-walking marble/pebble automaton has one pebble (next to marbles) which it can use in the usual way. The only restriction is that the use of marbles and pebble together must be nested, e.g., it is not allowed to put down a marble, then the pebble and then pick up the marble before the pebble. Unlike the marbles, the pebble does not restrict the automaton to the subtree induced by all descendants of the node where the pebble is. The pebble will be used to mark one special node, e.g., the starting node of a walk on the tree. Like with tree-walking marble automata, we will always assume that the automaton is constructed in such a way that there is not more than one marble on any node, at any time.

Let S be a ranked alphabet and let G be an alphabet of marble colours. We define the set of directives for a tree-walking marble/pebble automaton as

DTWMPA(S,G)
=
DTWMA(S,G)È{ put,lift,here,Øhere}
=
{ ­i, ¯i \mid i Î rki(S) } È{ root, Øroot} È{ labs\mid s Î S} È
{ putg, liftg, hereg, Øhereg \midg Î G} È{ put,lift,here,Øhere}
Here put, lift, here and Øhere (without subscripts) are the directives for the pebble, while putg, liftg, hereg and Øhereg are the directives for a marble of colour g. To enforce nesting, we now define the binary relations for the directives on triples (u,k,p) Î Vt ×\mathbb N ×(\mathbb N ® MPt) with MPt = (Vt ×G) ÈVt È{^}. Here an element m of MPt represents the action of putting down a marble of colour g on node u (m = (u,g)) or of putting down the pebble on node u (m = u), or no action (m = ^). Of a triple (u,k,p), u is the current node of the automaton, k is the number of marbles and pebbles currently on the tree and p is the marble/pebble placement function. Intuitively, p(1), ¼, p(k) are the actions executed in the past (in that order) of putting down marbles or the pebble. The function p is used as a kind of push-down stack, putting another item on the stack if a marble or pebble is put down and removing the topmost item from the stack if a marble or pebble is lifted. At all times, p(n) = ^ if n > k.

For each tree t Î TS and each directive d Î DTWMPA(S,G) we define the following binary relations:

Rt(­i)
=
{ ((u,k,p), (u¢,k,p)) \midu is the ith child of u¢ and
p(j) ¹ (u,g) for all j Î [1,k], g Î G}
Rt(¯i)
=
{ ((u,k,p),(u¢,k,p)) \midu¢ is the ith child of u }
Rt(root)
=
{ ((roott,k,p),(roott,k,p)) }
Rt(Øroot)
=
{ ((u,k,p),(u,k,p)) \mid u ¹ roott }
Rt(labs)
=
{ ((u,k,p),(u,k,p)) \midlabt(u) = s}
Rt(putg)
=
{ ((u,k,p), (u,k+1,p¢)) \midp¢ = p(k+1 ® (u,g)),
p(i) ¹ (u,g) for all i Î [1,k] }
Rt(liftg)
=
{ ((u,k,p), (u,k-1,p¢)) \midk ³ 1, p(k) = (u,g), p¢ = p(k® ^) }
Rt(hereg)
=
{ ((u,k,p), (u,k,p)) \midp(i) = (u,g) for some i Î [1,k] }
Rt(Øhereg)
=
{ ((u,k,p), (u,k,p)) \midp(i) ¹ (u,g) for all i Î [1,k] }
Rt(put)
=
{ ((u,k,p), (u,k+1,p¢)) \midk ³ 0, p¢ = p(k+1 ® u),
p(i) Î MPt \Vt for all i Î [1,k] }
Rt(lift)
=
{ ((u,k,p), (u,k-1,p¢)) \midk ³ 1, p(k) = u, p¢ = p(k ® ^) }
Rt(here)
=
{ ((u,k,p), (u,k,p)) \midp(i) = u for some i Î [1,k] }
Rt(Øhere)
=
{ ((u,k,p), (u,k,p)) \midp(i) ¹ u for all i Î [1,k] }
Let S be a ranked alphabet and let G be an alphabet. A tree-walking marble/pebble automaton over (S,G) is a finite automaton over DTWMPA(S,G). We continue with the usual definitions. For a tree-walking marble/pebble automaton A = (Q,DTWMPA(S,G),d,I,F) over (S,G) and a tree t Î TS, the configurations of A are 4-tuples (q,u,k,p) Î Q ×Vt ×\mathbb N ×(\mathbb N ® MPt), with q the state, u the current node, k the number of marbles and pebbles currently on the tree and p the marble/pebble placement function. The set of all configurations of A and t is denoted by \mathbb CA,t. We define the binary step-relation \twoheadrightarrowA,t on \mathbb CA,t in the usual way. For every (q,u,k,p),(q¢,u¢,k¢,p¢) Î \mathbb CA,t,
(q,u,k,p) \twoheadrightarrowA,t (q¢,u¢,k¢,p¢) iff $d Î DTWMPA(S,G): (q,d,q¢) Î d and (u,k,p)  Rt(d)  (u¢,k¢,p¢) .
The automaton A computes on each tree t Î TS the binary relation
Rt(A) = { (u,u¢) Î Vt ×Vt \mid (qin,u,0,p0)  \twoheadrightarrowA,t*  (qfin,u¢,0,p0) for some qin Î I, qfin Î F } ,
where p0 is the empty marble/pebble placement function, with p0(i) = ^ for all i Î \mathbb N. The node relation that A computes is, as usual,
R(A) = { (t,u,v) \mid t Î TS and (u,v) Î Rt(A) } .
The tree language that A recognizes is, also as usual,
L(A) = { t Î TS \mid (t,roott,v) Î R(A) for some v Î Vt } .
The class of all languages that are recognized by a tree-walking marble/pebble automaton is named TWMPA. Transitions of the form (q,s,q¢) with s Î DTWMPA(S,G)* are treated in the usual way. The definition of deterministic tree-walking marble/pebble automata is equal to the definition of deterministic string-walking automata. The following pairs of directives in DTWMPA(S,G) are mutually exclusive, next to those pairs already mutually exclusive for tree-walking marble automata:

Theorem 20 TWMPA = REGT

Proof. Because TWMA=REGT (Theorem 19) and because each tree-walking marble automaton is also a tree-walking marble/pebble automaton, it is sufficient to show that, for each tree-walking marble/pebble automaton A, there exists a tree-walking marble automaton A¢ such that L(A¢) = L(A).

Let A = (Q,DTWMPA(S,G),d,I,F) be a tree-walking marble/pebble automaton over (S,G). We may again assume that A is constructed in such a way that, at any time, there is at most one marble on any node. We also assume that A nevers puts the pebble on an empty tree, i.e., there is always at least one marble on the tree when A puts down the pebble. This can easily be accomplished by adding an extra marble colour and having A put one on the root before starting. This also allows us to assume that A does not use the directives root and Øroot.

We first formulate and prove the following Claim. This Claim states that for a situation where A just put down its pebble on the current node u and the previous marble (of colour g) was put on an ancestor v of u, there exists a finite tree automaton M such that A can make a round-trip, starting and finishing at u, starting in state q and finishing in state q¢, with the same marbles on the tree at the beginning and at the end of the round-trip (and without lifting the pebble at u), if and only if M accepts tv (the subtree of t with root v), marked at node u. Due to the nesting constraints, the round-trip of A takes place only on tv (A cannot move higher up the tree than v before lifting the marble at u).

[
Claim] Let A = (Q,DTWMPA(S,G),d,I,F) be a tree-walking marble/pebble automaton over (S,G) that does not make use of the directives put, lift, root, and Øroot. Then there exists for every g Î G and q,q¢ Î Q a finite tree automaton M over SÈ(S×B1) such that, for all t Î TS, u,v Î Vt, k ³ 2, p Î \mathbb N ® MPt with p(k) = u and p(k-1) = (v,g), with u Î des(v) and the only marble on tv is at v (i.e., "i < k: if p(i) = (v¢,g¢) then v¢\not Î des(v)),
(q,u,k,p) \twoheadrightarrowA,t* (q¢,u,k,p) iff mark(tv,u) Î L(M).
To prove this Claim, let A = (Q,DTWMPA(S,G),d,I,F) be a tree-walking marble/pebble automaton over (S,G) that does not make use of the directives put, lift, root, and Øroot, and let gr Î G (the g in the Claim) and q1,q2 Î Q. We first define a tree-walking marble automaton A¢ = (Q¢,DTWMA(SÈ(S×B1),G),d¢,I¢,F¢) over (SÈ(S×B1),G) such that the Claim holds for A¢ rather than M. The automaton A¢ uses the mark to know the position of the pebble. It walks on tv; there is no marble of colour gr on v. A¢ starts in the root of tv, i.e., v. It first searches for the pebble, then simulates A, starting in state q1. When it gets back to the pebble in state q2, A¢'s mission is accomplished.
Q¢
=
Q È{qin, qfin}
I¢
=
{ qin}
F¢
=
{ qfin}
d¢
=
{ (qin, ¯i, qin) \mid i Î rki(S) } È{ (qin, lab(s,1), q1) \mid s Î S} È
{ (p,d,q) \mid (p,d,q) Î d, d \not Î {here, Øhere, Øheregr, putgr }} È
{ (p,lab(s,1),q) \mid (p,labs,q) Î d} È
{ (p,lab(s,1),q) \mid (p,here,q) Î d,s Î S} } È
{ (p,labs,q) \mid (p,Øhere,q) Î d,s Î S} È
{ (p,root,q) \mid (p,heregr,q) Î d,s Î S} È
{ (p,(Øroot, Øheregr),q) \mid (p,Øheregr,q) Î d} È
{ (p,(Øroot, putgr),q) \mid (p,putgr,q) Î d} È
{ (q2, lab(s,1), qfin) \mids Î S}
With this definition of A¢, we have, for all t Î TS, u,v Î Vt, k ³ 2, p Î \mathbb N ® MPt with p(k) = u and p(k-1) = (v,g), with u Î des(v) and the only marble on tv at v,
(q1,u,k,p) \twoheadrightarrowA,t* (q2,u,k,p) iff mark(tv,u) Î L(A¢).
Because TWMA=REGT (Theorem 19), there exists a finite tree automaton M over SÈ(S×B1) with L(M) = L(A¢). The automaton M complies with the demands of the Claim. This ends the proof of the Claim.

We continue with the proof of Theorem 20. We construct a tree-walking marble automaton A¢ that recognizes the same tree language as the tree-walking marble/pebble automaton A. The automaton A¢ = (Q¢,DTWMA(S,G¢),d¢,I¢,F¢) simulates A, but at each step of A, A¢ is prepared to simulate the use of the pebble of A, by knowing, if it were to put the pebble at the current node u, the states in which A could return at the same node, ready to pick up the pebble again (cf. the proof of Theorem 11). To accomplish this, A¢ uses a special set of marbles in addition to the marbles of A. For each q,q¢ Î Q and g Î G, Mq q¢g is the automaton that can be constructed according to the Claim, after removing all transitions with directives put and lift from A. The meaning of the additional marble colours is as follows.

A¢ uses a method similar to the one used in Lemma 16 to compute the states of the Mq q¢g and thus the marble colours for the marbles from the set ``states''. The successful states of the Mq q¢g for the children of a node u are computed by A¢ from the successful states of u itself and from the states that Mq q¢g reaches in the children of u. When A puts down its pebble (at node u), A¢ checks whether Mq q¢g accepts mark(tv,u), where q is the state of A after it put down its pebble, q¢ is a state from which A can pick it up again, g is the colour of the last marble that A put down before the pebble and v is the position of that marble. The colour g is stored in the marble [^(g)] at the current node u. During the direct simulation of the steps of A, there is always exactly one marble from the set {[^(g)] \mid g Î G} È{e} at the current node. When there is at least one marble of A on the simulated tree, there is also exactly one marble from each of the sets states and succ at the current node. The colours of these three marbles determine whether or not the Mq q¢g accept mark(tv,u). We define the abbreviation ØhereQ to be Øhereq1, ¼, Øhereqn with {q1, ¼, qn} = Q.

Formally, the states, marble alphabet, initial and final states, and transitions of A¢ are defined as follows.

Q¢
=
Q È{ qin, qfin} È{ findstatesg \mid g Î G} È{ (f,g) \mid f:(Q ×Q) ® Q, g Î G}
G¢
=
GÈQ È{ ^
g
 
\mid g Î G} È{e}ÈstatesÈsucc
states
=
{ l} È
È
i Î rki(S) 
((Q ×Q) ® Q)i
succ
=
(Q ×Q) ® 2Q
I¢
=
{ qin}
F¢
=
{ qfin}
d¢
=
din Èdsim Èdstates Èdfin
The set of transitions d¢ of A¢ is divided into four parts. The initialization of A¢ is done in the transitions in din. The actual simulation of the steps of A is done by the transitions in dsim. The set dstates contains a ``subroutine'' for calculating the states of Mq1 q2 g for the children of the current node in a marble from the set states. Finally, the cleaning up is done by the transitions in dfin.

We start with describing din. From the initial state qin of A¢, a marble e is laid on the root. This marble indicates that there is no marble on the (simulated) tree yet. This transition prepares the automaton A¢ for the simulation of A.

din = { (qin, pute, q) \mid q Î I }
In dsim we place the transitions for the actual simulation of the steps of A. This is quite technical. After simulating any of the directives {¯i, putg, liftg} the marble from states must be re-computed. The marbles from succ are computed with the use of a function ``su'' that is defined after dsim. In the last set of transitions, the putting down and lifting of the pebble is simulated, by making use of the information in the marbles from states and succ.
dsim
=
{ (p, (heree,¯i,pute, q) \mid(p, ¯i, q) Î d} È
{ (p, (labs, here[^(g)], heref1¼fk, hereg, ¯i, put[^(g)], putg¢, putq, putl), findstatesg) \mid (p, ¯i, q) Î d,
s Î S, k = rk(s), g Î G, i Î [1,k], fj:Q×Q ® Q  (j Î [1,k]),
g:Q×Q ® 2Q,g¢ = su(g, s, g, i, f1, ¼, fk) } È
{ (p, (lifte, ­i), q) \mid(p, ­i, q) Î d} È
{ (p, (ØhereG, lift[^(g)], liftg, liftf1¼fk, ­i), q) \mid (p, ­i, q) Î d,
g Î G, g:Q×Q ® 2Q, k ³ 0,fj:Q×Q ® Q  (j Î [1,k]) } È
{ (p, labs, q) \mid (p, labs, q) Î d} È
{ (p, (putg, lifte, put[^(g)],putg, putq, putl), findstatesg) \mid (p, putg, q) Î d,
g:Q×Q ® 2Q with g(q1,q2) = FMq1 q2 g } È
{ (p, (putg, lift[^(g)]¢, put[^(g)],liftf1 ¼fk, liftg,putg¢, putq, putl), findstatesg) \mid (p, putg, q) Î d,
g¢ Î G, k ³ 0, fj:Q×Q ® Q  (j Î [1,k]),g:Q×Q ® 2Q,
g¢:Q×Q ® 2Q with g¢(q1,q2) = FMq1 q2 g } È
{ (p, (liftg, lift[^(g)], liftf1 ¼fk,liftg, ­i, heree, ¯i,pute), q)\mid (p, liftg, q) Î d,
k ³ 0, fj:Q×Q ® Q  (j Î [1,k]),g:Q×Q ® 2Q, i Î rki(S) } È
{ (p, (liftg, lift[^(g)], liftf1 ¼fk,liftg, ­i, here[^(g)]¢, labs, heref¢¢1 ¼f¢¢k¢¢, hereg¢¢,¯i,
put[^(g)]¢, putg¢, putq, putl), findstatesg¢) \mid (p, liftg, q) Î d, k ³ 0,
fj:Q×Q ® Q  (j Î [1,k]),g:Q×Q ® 2Q, i Î rki(S), g¢ Î G,s Î S,
k¢¢ = rk(s),f¢¢j:Q×Q ® Q  (j Î [1,k¢¢]), g¢¢:Q×Q ® 2Q,
g¢:Q×Q ® 2Q, g¢ = su(g¢¢,s,g¢,i,f¢¢1, ¼, f¢¢k¢¢) } È
{ (p, (root, liftg, lift[^(g)], liftf1 ¼fk,liftg, pute), q) \mid (p, liftg, q) Î d,
k ³ 0, fj:Q×Q ® Q  (j Î [1,k]),g:Q ×Q ® 2Q } È
{ (p, hereg, q) \mid (p, hereg, q) Î d} È
{ (p, Øhereg, q) \mid (p, Øhereg, q) Î d} È
{ (p, (labs, heref1 ¼fk, hereg, here[^(g)]), q) \mid (p, put, p¢) Î d, (q¢, lift, q) Î d,
s Î S, k = rk(s), g Î G,
g:Q×Q ® 2Q,fj:Q×Q ® Q  (j Î [1,k]),
(d(s,1))Mp¢q¢g(f1(p¢, q¢), ¼, fk(p¢, q¢)) Î g(p¢, q¢) }
Here su(g, s, g, i, f1, ¼, fk) computes the next set of successful states. For all states q1, q2 Î Q, su(g, s, g, i, f1, ¼, fk)(q1,q2) is the set of all states q such that Mq1 q2 g is successful, assuming it reaches the ith child of the current node in state q, and the jth (j ¹ i) child of the current node in state fj(q1, q2), or
su(g, s, g, i, f1, ¼, fk)(q1,q2)
=
{q Î QMq1 q2 g \mid (ds)Mq1 q2 g(f1(q1, q2), ¼, fi-1(q1, q2), q,
fi+1(q1, q2), ¼, fk(q1, q2)) Î g(q1, q2) } .

The ``subroutine'' to construct the marble from states is as follows: dstates = Èg Î G dg with

dg
=
{ (findstatesg, (ØhereQ, labs, liftl, ­i), (f,g)) \midi Î rki(S), rk(s) = 0,
f:(Q ×Q) ® Q with f(q1,q2) = (ds)Mq1 q2 g } È
{ (findstatesg, (liftq, labs), q) \midrk(s) = 0, q Î Q }
{ (findstatesg, (labs, ¯1, putl), findstatesg) \midrk(s) > 0 } È
{ ((fk,g), (labs, liftf1¼fk-1, putf1¼fk, ¯k+1, putl), findstatesg) \mid1 £ k < rk(s) } È
{ ((fk,g), (ØhereQ, labs, liftf1¼fk-1, ­i), (f,g)) \mid k = rk(s), i Î rki(S),
f(q1, q2) = (ds)Mq1 q2 g(f1(q1, q2), ¼,fk(q1, q2)) } È
{ ((fk,g), (liftq, labs, liftf1¼fk-1, putf1¼fk), q) \midq Î Q, k = rk(s) ³ 1}
Finally, to allow A¢ to accept the tree, A¢ has to remove its auxiliary pebbles from the tree. Since at a moment that A would accept the tree there are no marbles on it, A¢ only has to remove the marbles e.
dfin = { (q, lifte, qfin) \mid q Î F } È{ (qfin, (­i, lifte), qfin) \mid i Î rki(S) }
\square

3.7  Tree-Walking Automata with MSO Tests

A completely different extension (i.e., not involving marbles or pebbles) of tree-walking automata is to add the capability to perform MSO tests (see [BE97], Section 1.4). Let S be a ranked alphabet. The set of directives for a tree-walking automaton with MSO tests is
DTWA+M(S) = { ­i, ¯i \mid i Î rki(S) } ÈMSOL1(S) .
As this is an infinite set, we have to choose a finite subset of DTWA+M(S) for each tree-walking automaton with MSO tests. For each tree t Î TS and each directive d Î DTWA+M(S) we define the following binary relation on Vt:
Rt(­i)
=
{ (u,v) \mid (v,i,u) Î Et }
Rt(¯i)
=
{ (u,v) \mid (u,i,v) Î Et }
Rt(y(x))
=
{ (u,u) \mid (t,u) \models y(x) }
A tree-walking automaton with MSO tests over S is a finite automaton A over a finite subset of DTWA+M(S). The definitions of configurations, \twoheadrightarrowA,t, Rt(A), R(A) and L(A) are unchanged. The definition of deterministic tree-walking automata with MSO tests is also unchanged with respect to the definition of deterministic tree-walking automata, although it is no longer possible to list all pairs of mutually exclusive directives. However, it is decidable whether a given pair of directives is mutually exclusive. Directives of the form (q,s,q¢) with s Î DTWA+M(S)* are treated in the usual way. The class of all tree languages L such that L = L(A) for some tree-walking automaton with MSO tests A is named TWA+M. As stated in Section 1.4, in [BE97] the following is proven.

Proposition 21 The following three statements hold:

3.8  Equivalence of Binary MSO Formulas and
Tree-Walking Marble/Pebble Automata

In this section we will show that binary MSO formulas, i.e., formulas f(x,y) Î MSOL2(S) for some ranked alphabet S, define the same node relations as those computed by tree-walking marble/pebble automata. First we will show that for each binary formula f(x,y) Î MSOL2(S) there exists a tree-walking marble/pebble automaton A over (S, G) (for some alphabet G) such that they compute the same relation, that is, R(f) = R(A). We will make use of the second part of Proposition 21 to construct our tree-walking marble/pebble automaton, similar to the alternate proof of Theorem 7 (Section 1.6). Note that the original proof of Theorem 7 is not valid for trees, because, if this proof were translated to trees, the pebble and marbles would not be used nested. We need the following lemma. This lemma states that it is possible to simulate a unary MSO formula with a tree-walking marble/pebble automaton that makes a round-trip, checking the absence of all marbles and pebbles before entering a final state.

Lemma 22 Let S be a ranked alphabet. For each unary MSO formula y(x) Î MSOL1(S) there exists a tree-walking marble/pebble automaton A = (Q,DTWMPA(S,G),d,I,F) over (S, G), for some alphabet G, such that, for all t Î TS, Rt(A) = { (u,u) \mid u Î Vt, (t,u) \models y(x) } and, if (qin,u,0,p0) \twoheadrightarrowA,t* (qfin,u¢,k,p) for some u,u¢ Î Vt, qin Î I and qfin Î F, then k = 0 and p = p0.

Proof. Let S be a ranked alphabet and let y(x) Î MSOL1(S) be a unary MSO formula. According to Corollary 3, there exists a closed MSO formula y¢ Î MSOL0(SÈ(S×B1)) such that, for all t Î TS and u Î Vt, (t,u) \models y(x) iff mark(t,u) \models y¢. Since MSOT=REGT (Proposition 15) and TWMA=REGT (Theorem 19), there exists a tree-walking marble automaton

A¢ = (Q¢,DTWMA(SÈ(S×B1),G),d¢,I¢,F¢)
over (SÈ(S×B1), G) (for some alphabet G) such that L(A¢) = L(y¢). This implies that for all t Î TS, u Î Vt and t¢ = mark(t,u),
t¢\models y¢ iff (qin,roott¢,m0) \twoheadrightarrowA¢,t¢* (qfin,v,m0) for some qin Î I¢, qfin Î F¢ and v Î Vt .
We now define the tree-walking marble/pebble automaton A = (Q,DTWMPA(S,G),d,I,F). This automaton first drops its pebble in order to remember the start of the walk. Then it walks to the root of the tree and starts simulating A¢. It uses the pebble to determine when to interpret a node label s Î S as s (when there is no pebble at the current node) and when to interpret s as (s,1) (when there is a pebble at the current node). When it reaches a final state, it walks to the root of the tree, checking on the way that there are no marbles (this can be done since the marbles always lie on the path from the root to the current node), so that A¢ really accepts the marked tree. Then A searches for the pebble, picks it up and finishes.
Q
=
Q¢È{ qin, toroot, ready, findpebble, qfin}
I
=
{ qin}
F
=
{ qfin}
d
=
{ (qin,put,toroot) } È{ (toroot, ­i, toroot) \mid i Î rki(S) } È
{ (toroot, root, q) \mid q Î I¢} È
{ (q,d,q¢) \mid (q,d,q¢) Î d¢, d ¹ labs for all s Î S } È
{ (q,(Øhere,labs),q¢) \mid (q,labs,q¢) Î d¢} È
{ (q,(here,labs),q¢) \mid(q,lab(s,1),q¢) Î d¢} È
{ (q, ØhereG, ready) \mid q Î F¢} È
{ (ready, (­i, ØhereG), ready) \mid i Î rki(S) } È
{ (ready, (root, ØhereG), findpebble) } È
{ (findpebble, (Øhere, ¯i), findpebble) \mid i Î rki(S) } È
{ (findpebble, (here, lift), qfin) }
Note that the construction of A ensures that the marbles and pebble are used nested, because first the pebble is put down, then the marbles are used in the same way as A¢ uses them (i.e., nested), then A makes sure there are no marbles left on the tree and finally A picks up the pebble.

From the construction of A it follows that, for all t Î TS, u Î Vt and t¢ = mark(t,u),

(q,roott¢,m0) \twoheadrightarrowA¢,t¢* (q¢,v,m0) for some q Î I¢, q¢ Î F¢ and v Î Vt iff
(qin,u,0,p0) \twoheadrightarrowA,t* (qfin,u,0,p0).
This, and the details of the construction of A, implies Rt(A) = { (u,u) \mid u Î Vt, (t,u) \models y(x) } and the other details of the lemma. \square

Lemma 23 Let S be a ranked alphabet and let f(x,y) Î MSOL2(S) be a binary MSO formula. Then there exists a tree-walking marble/pebble automaton A such that R(A) = R(f).

Proof. According to Proposition 21, there exists a tree-walking automaton with MSO tests A¢ = (Q¢,D, d¢,I¢,F¢) such that R(A¢) = R(f). We define a tree-walking marble/pebble automaton A over (S,G) that simulates A¢. The automaton A can directly simulate the directives {­i, ¯i \mid i Î rki(S)}. The directives y(x) Î MSOL1(S) are handled by constructing a tree-walking marble/pebble automaton according to Lemma 22 and using this automaton as a ``subroutine'' in A. We define T = {(q,d,q¢) Î d¢\mid d Î MSOL1(S)}, the set of all transitions with MSO tests in A¢. For each transition t = (q,y(x),q¢) Î T, we use Lemma 22 to construct the tree-walking marble/pebble automaton At over (S, Gt) that simulates t. We will assume that the states of any automaton At do not overlap with the states of any other At or with the states of A¢. We also assume that the marble alphabets Gt are disjoint, or Gt ÇGt¢ = Æ for all t,t¢ Î T, t ¹ t¢. The marble alphabet of A is G = Èt Î T Gt.

We define A = (Q,DTWMPA(S,G),d,I,F) as follows:

Q
=
Q¢È
È
t Î T 
QAt
I
=
I¢
F
=
F¢
d
=
{ t Î d¢\mid t\not Î T } È
{ (q, (put,lift), q¢), (q¢¢, (put,lift), q¢¢¢) \midq¢ Î IAt, q¢¢ Î FAt,t = (q,y(x),q¢¢¢) Î T } È

È
t Î T 
dAt
Here the (double) directive (put,lift) acts as a ``nop'' directive. It can easily be seen that A complies with the demands of nesting and that R(A) = R(A¢). \square

We now show the other way around.

Lemma 24 Let S be a ranked alphabet and let A be a tree-walking marble/pebble automaton over (S,G). Then there exists a binary MSO formula f(x,y) Î MSOL2(S) such that R(f) = R(A).

Proof. From the automaton A we first construct another tree-walking marble/pebble automaton A¢ over (SÈ(S×B2), G) such that, for all t Î TS and u,v Î Vt,

mark(t,u,v) Î L(A¢) iff (t,u,v) Î R(A) .
A¢ first finds the node with label (s,1,b), simulates A and accepts the tree if A has a final state in a node with label (s,b,1). Because TWMPA=REGT (Theorem 20) and REGT=MSOT (Proposition 15), there exists a MSO formula y Î MSOL0(SÈ(S×B2)) such that, for all t¢ Î TSÈ(S×B2),
t¢ Î L(y) iff t¢ Î L(A¢) .
Now, using Corollary 3, we obtain a binary MSO formula f(x,y) Î MSOL2(S) such that, for all t Î TS and u,v Î Vt,
(t,u,v) Î R(f) iff mark(t,u,v) Î L(y)
and we have, for all t Î TS and u,v Î Vt,
(t,u,v) Î R(f) iff (t,u,v) Î R(A) .
\square

These two lemmas together prove the grand finale of this chapter.

Theorem 25 The following two statements hold:

This theorem states that binary MSO formulas define the same node relations that tree-walking marble/pebble automata compute. Since the proof of this theorem, and all underlying theorems and lemmas, is constructive, we can actually construct an operational automaton with only local operations to compute the node relation that is defined by a given, descriptive, binary MSO formula, and vice versa.

Conclusion and Recommendations

Conclusion

In this paper we have described a number of string-walking and tree-walking automata and we have proved some of their properties, especially with regard to the binary node relations these automata compute. This study has been done in order to find an answer to the following question:

Is it possible to define a type of tree-walking automaton, with only local operations, that computes the same binary node relations that binary MSO formulas recognize?

In Chapter 1 we have investigated the possibilities to define a string-walking automaton that computes the same binary relations on the positions of strings as binary MSO formulas. It was found that string-walking pebble automata have this property. It is even possible, for every binary MSO formula, to construct a string-walking pebble automaton that computes the same binary relation. The reverse is also true: for each string-walking pebble automaton, it is possible to construct a binary MSO formula that defines the same binary relation. It was also shown that the pebble is necessary. There is a binary MSO formula for which there is no string-walking automaton (without pebble) that computes the same binary relation. Since the pebble operations are local (the pebble can only be handled on the current position of the automaton), this answers our central question for strings.

To answer our central question for trees, we have defined in Chapter 3, among others, tree-walking pebble/marble automata. It has been shown that these automata compute exactly the binary node relations that can be defined by binary MSO formulas. Like with string-walking pebble automata, the appropriate automaton for each binary MSO formula can be constructed, and vice versa. Tree-walking marble/pebble automata have also only local operations. All pebble-handling and marble-handling occurs only on the current node. A note must be made that it may seem a bit complicated to have both marbles and a pebble available to the automaton. We have made no study to check whether it is possible to leave out, e.g., the pebble and still compute the same binary node relations. Also one might argue that the tree-walking marble/pebble automaton is not strictly local, since the automaton ``remembers'' in some way that it has dropped the pebble, further down the tree, when it is about to pick up the marble that was put down last before the pebble.

Recommendations

There are a number of questions that have not been answered in this paper, but may be interesting for future study.

Bibliography

[BE97]
R. Bloem and J. Engelfriet. Characterization of properties and relations defined in monadic second order logic on the nodes of trees. Technical Report 97-04, Leiden University, 1997.

[BH65]
M. Blum and C. Hewitt. Automata on a 2-dimensional tape. In Proc. 8th IEEE Symp. on Switching and Automata Theory, pages 155-160, 1965.

[Büc60]
J. Büchi. Weak second-order arithmetic and finite automata. Z. Math. Logik Grundl. Math., 6:66-92, 1960.

[Don70]
J. Doner. Tree acceptors and some of their applications. J. of Comp. Syst. Sci., 4:406-451, 1970.

[GS84]
F. Gécseg and M. Steinby. Tree Automata. Akadémiai Kiadó, Budapest, 1984.

[HU79]
J.E. Hopcroft and J.D. Ullman. Introduction to Automata Theory, Lamnguages, and Computation. Addison-Wesley, Reading, Mass., 1979.

[KS81]
T. Kamimura and G. Slutzki. Parallel and two-way automata on directed ordered acyclic graphs. Inform. Contr., 49:10-51, 1981.

[RHE91]
G. Rozenberg, H.J. Hoogeboom, and J. Engelfriet. Formele talen en automaten 1. Afdeling Wiskunde en Informatica, Rijksuniversiteit Leiden, 1991. In Dutch.

[TW68]
J.W. Thatcher and J.B. Wright. Generalized finite automata theory with an application to a decision problem of second-order logic. Math. Systems Theory, 2:57-81, 1968.


File translated from TEX by TTH, version 2.32.
On 9 Jul 1999, 16:00.

Back to publications index