Tree-Walking Automata and Monadic Second Order Logic

Jan-Pascal van Best

Preface

This report describes the results of a study in the field of theoretical computer science. It was written as my ``doctoraalscriptie'' (Master's thesis) for my study of Computer Science at Leiden University. The graduation tutor from the Department of Computer Science was J. Engelfriet, PhD.

This report was written using the L^AT_EX typesetting software on a computer running Linux and the KDE desktop environment.

I would like to thank Joost Engelfriet for his constant advice and his help in conducting this study. My parents have always supported me in many ways. And, last but certainly not least, thanks to Ellen for the title page she designed and for her love and support.

Jan-Pascal

Delft, July 1998

Preface
Introduction
1 Binary MSO Formulas and String-Walking Pebble Automata
    1.1 Preliminaries
    1.2 String-Walking Automata
    1.3 String-Walking Pebble Automata
    1.4 String-Walking Automata with MSO Tests
    1.5 From MSO Formulas to String-Walking Pebble Automata
    1.6 Another proof
    1.7 Pebble Necessity
    1.8 From String-Walking Pebble Automata to MSO Formulas
2 Extensions to String-Walking Pebble Automata
    2.1 String-Walking Multi-Pebble Automata
    2.2 From String-Walking Multi-Pebble Automata to MSO Formulas
    2.3 String-Walking Marble Automata
3 Binary MSO Formulas and Tree-Walking Automata
    3.1 Preliminaries
    3.2 Tree-Walking Automata
    3.3 Push-Down Tree-Walking Automata
    3.4 Equivalence of Finite Tree Automata and Push-Down Tree-Walking Automata
    3.5 Tree-Walking Marble Automata
    3.6 Tree-Walking Marble/Pebble Automata
    3.7 Tree-Walking Automata with MSO Tests
    3.8 Equivalence of Binary MSO Formulas and Tree-Walking Marble/Pebble Automata
Conclusion and Recommendations

Introduction

Monadic second order logic (MSO logic, [Büc60,,TW68]) is a way to describe properties of graphs. In this paper we will focus on strings and trees as special cases. Monadic second order logic combines great strength with great ease of expression. Closed MSO formulas define sets of trees, while MSO formulas with one or more unbound variables define node relations on trees. One perhaps less desirable property of MSO logic is that its formulas are global in a way that they define properties of trees as a whole. Another way to define node relations is by tree-walking automata. A tree-walking automaton can compute a binary node relation by starting on one node of the tree and finishing on another node. It has already been shown that tree-walking automata that can test unary MSO formulas at each node they visit, compute the same node relations that binary MSO formulas define [BE97]. However, these unary MSO formulas define in a way also global properties of the tree, in that they can test, e.g., the labels of several nodes of the tree they walk on in one formula. It can also be shown that ordinary tree-walking automata do not compute the same node relations that binary MSO formulas recognize. Therefore, in order to compute the node relations defined by binary MSO formulas, we need to extend tree-walking automata with features that make them more powerful. We need to make sure that these features result in only local operations, i.e., the resulting automata can, during their walk on a tree, only test properties of the current node. Also, we would like these operations to be more operational and less descriptive than unary MSO formulas. These considerations lead us to the central question of this paper.

Is it possible to define a type of tree-walking automaton, with only local operations, that computes the same binary node relations that binary MSO formulas recognize?

In this paper, we try to find an answer to this question. We start by attempting to find an answer for strings in Chapter 1. In this chapter, we introduce the concept of pebbles . During the walk of a string- or tree-walking automaton, the automaton can place a pebble on the current node. Later during its walk, the automaton can check for the presence of a pebble and pick it up. Variations are possible in the allowed number of pebbles, coloured pebbles and restriction in the use of pebbles. It is shown that string-walking pebble automata (with just one pebble) compute exactly the binary relations that binary MSO formulas define on strings. In the next chapter, we define some more powerful string-walking automata and show some of their properties. In Chapter 3, we move on to trees. The task of finding a proper extension for tree-walking automata proves to be a bit harder than for string-walking automata. Because it is already known [KS81] that push-down tree-walking automata recognize exactly the regular tree languages (which are the tree languages defined by closed MSO formulas), we develop tree-walking marble automata, similar in concept to push-down tree-walking automata, that also recognize exactly the regular tree languages but, unlike push-down tree-walking automata, allow for a natural definition of the computation of node relations. By adding one pebble, we obtain tree-walking marble/pebble automata. We show that these compute exactly the binary node relations that binary MSO formulas define on trees. In the last chapter, we conclude this paper and give some recommendations for future research.

Chapter 1
Binary MSO Formulas and String-Walking Pebble Automata

1.1 Preliminaries

In this section we recall some well-known concepts from formal language theory [HU79,], more specifically concerning strings, finite automata and monadic second order logic (on strings). Before doing so, we recall some standard terminology from set theory.

The set of natural numbers is \mathbb N = {0,1,2,¼} and, for m,n Î \mathbb N, [m,n] = { i Î \mathbb N \mid m £ i £ n}. For a set S, 2^S is its powerset, i.e., the set of all subsets of S. Let R Í A ×B be a binary relation. The transitive reflexive closure of R is denoted R^*. The inverse of R is R^-1 = { (y,x) \mid (x,y) Î R }. For two binary relations R₁ and R₂, their composition is R₁ °R₂ = {(x,z) \mid $y: (x,y) Î R₁ and (y,z) Î R₂ }. Note that the order of R₁ and R₂ is nonstandard. For each a Î A, R(a) = { b Î B \mid (a,b) Î R } and for each S Í A, R(S) = È{ R(a) \mid a Î S }. The domain of R is dom(R) = R^-1(B). A binary relation R is functional if it is a partial function, i.e., (x,y),(x,z) Î R implies x = z.

Let f:A® B be a function; for any a and b, f(a® b) denotes the ``perturbed'' function f¢:A È{a} ® B È{b} with f¢(a) = b and f¢(a¢) = f(a¢) for every a¢ Î A with a¢ ¹ a.

Strings

An alphabet is a finite set of symbols. Let S be an alphabet. A string over S is a finite string of symbols from S. The empty string is denoted l. The set of all strings over S is denoted S^*. A subset of S^* is called a language. If w is a string over S, |w| denotes the length of w. The set of positions in w is denoted by V_w = [1,|w|]. For a string w Î S^* and i Î V_w, we write lab_w(i) = s if the ith symbol of w is s. For k Î \mathbb N, a k-ary position relation over S is a subset of {(w,u₁,¼,u_k) \mid w Î S^* and u_i Î V_w for all i Î [1,k] }. A k-ary position relation associates with each string w a k-ary relation on the positions of w.

We define marked strings as follows. Let S be an alphabet, and k ³ 1. We define B_k = {0,1}^k \{0}^k. The alphabet SÈ(S×B_k) contains all symbols s Î S and the symbols (s, b₁, ¼, b_k) with b_i Î {0,1} for all i Î [1,k], but without (s, 0, ¼, 0). The role of the latter symbol is played by s itself. This alphabet is used to attach k different marks to the positions in a string. Let w Î S^* be a string over S, and let u₁, ¼, u_k Î V_w. The marked string w¢ = mark(w, u₁, ¼, u_k) Î (SÈ(S×B_k))^* is the string with lab_w¢(u) = lab_w(u) if u ¹ u_i for all i Î [1,k], and lab_w¢(u) = (lab_w(u), (u = u₁), ¼, (u = u_k)) otherwise (where (u = u_i) = 1 iff u equals u_i).

Finite Automata

Let S be an alphabet. A finite automaton over S is a quintuple M = (Q,S,d,I,F), where Q is a finite set of states, S is the input alphabet, d Í Q ×S×Q is the transition relation, I Í Q is the set of initial states and F Í Q is the set of final states. The elements of d are called transitions. For every string w Î S^*, the state transition relation R_M(w) Í Q ×Q is defined as follows. For s Î S, R_M(s) = { (p,q) \mid (p,s,q) Î d}. For s₁,¼,s_n Î S, R_M(s₁¼s_n) = R_M(s₁)°¼°R_M(s_n). The language recognized by M is L(M) = { w Î S^* \mid (q_in,q_fin) Î R_M(w) for some q_in Î I, q_fin Î F }. A language is regular if it is recognized by a finite automaton. The class of all regular languages is named REG.

Monadic Second Order Logic (on strings)

Monadic second order logic can be used to describe properties of strings [Büc60]. For an alphabet S, we define the language MSOL(S) of MSO formulas over S. This language has position variables x,y,¼ and position-set variables X,Y,¼. For a given string w Î S^*, position variables range over V_w = [1,|w|] and position-set variables range over the subsets of V_w.

There are three types of atomic formulas in MSOL(S): lab_s(x), for every s Î S, signifies that at position x there is a symbol s. When MSO logic is expanded to describe properties of trees and graphs, lab_s(x) means that node x is labeled s, hence the name lab_s. The atomic formula pre(x,y) signifies that position x comes directly before y, i.e. y = x+1; and x Î X signifies that x is an element of X. The connectives used to build formulas from these atomic formulas are Ø, Ù, Ú, ®, «, with the usual meaning. Both position variables and position-set variables can be quantified with $ and ". We will make use of the following abbreviations of MSO formulas:

head(x)

"y(Øpre(y,x))

tail(x)

"y(Øpre(x,y))

true(x)

$X(x Î X)

false(x)

Øtrue(x)

If an MSO formula f has free variables, say x, y, X and no others, we write f(x,y,X) to indicate the free variables of f. For every k Î \mathbb N, the set of MSO formulas over S with k free position variables and no free position-set variables is denoted MSOL_k(S), or the set of k-ary MSO formulas. For a closed formula f Î MSOL₀(S) and a string w Î S^*, we write w \models f if w satisfies f. The language defined by f is L(f) = { w Î S^* \mid w \models f}. L(f) is called an MSO definable language . The class of all MSO definable languages is named MSO.

Given a string w, a valuation function n is a function that maps each position variable to a position u Î V_w and each position-set variable to a subset U of V_w. Let f be an MSO formula. We write (w,n) \models f, if f holds in w, where the free variables of f are assigned values according to the valuation function n. If f has free variables x,y,X, we write (w,u,v,U) \models f(x,y,X) for (w,n) \models f, where n(x) = u, n(y) = v and n(X) = U.

Let f(x₁, ¼, x_k) Î MSOL_k(S) be an MSO formula with k free positions variables (and no free position-set variables). For each string w Î S^*, the k-ary relation that f defines on the positions of w is

R_w(f) = { (u₁, ¼, u_k) Î V_w^k \mid (w, u₁, ¼, u_k) \models f(x₁, ¼, x_k) } .

The MSO formula f(x₁, ¼, x_k) defines the position relation

R(f) = { (w, u₁, ¼, u_k) \mid w Î S^*, (u₁, ¼, u_k) Î R_w(f) } .

Büchi [Büc60] proved the following classical result.

Proposition 1 A language is MSO definable if and only if it is regular.

In [BE97], the following (well-known) results are proven for MSO formulas on trees. They also hold for the special case of strings. Corollary 3 is the result of Lemma 2 in the case j = k.

Lemma 2 Let S be an alphabet, k ³ 1, and j Î [1,k]. For every formula f(x₁, ¼, x_k) Î MSOL_k(S) there is a formula y(x_j+1, ¼, x_k) Î MSOL_k-j(SÈ(S×B_j)) such that, for all w Î S^* and u₁, ¼, u_k Î V_w,

(w, u₁, ¼, u_k) \models f(x₁, ¼, x_k) iff (mark(w, u₁, ¼, u_j), u_j+1, ¼, u_k) \modelsy(x_j+1, ¼, x_k),

and vice versa, i.e., for every formula y(x_j+1, ¼, x_k) Î MSOL_k-j(SÈ(S×B_j)) there is a formula f(x₁, ¼, x_k) Î MSOL_k(S) such that the above equivalence holds for all w Î S^* and u₁, ¼, u_k Î V_w.

Corollary 3 Let S be an alphabet and k ³ 1. For every formula f(x₁, ¼, x_k) Î MSOL_k(S) there is a formula y Î MSOL₀(SÈ(S×B_k)) such that, for all w Î S^* and u₁, ¼, u_k Î V_w,

(w, u₁, ¼, u_k) \models f(x₁, ¼, x_k) iff mark(w, u₁, ¼, u_k) \models y,

and vice versa, i.e., for every MSO formula y Î MSOL₀(SÈ(S×B_k)) there is a formula f(x₁, ¼, x_k) Î MSOL_k(S) such that the above equivalence holds for all w Î S^* and u₁, ¼, u_k Î V_w.

From Corollary 3 and Proposition 1 follows the following lemma.

Lemma 4 Let S be an alphabet and let k ³ 1. For every MSO formula f(x₁, ¼, x_k) Î MSOL_k(S) there exists a finite automaton M over SÈ(S×B_k) such that, for all w Î S^* and u₁, ¼, u_k Î V_w,

(w, u₁, ¼, u_k) \models f(x₁, ¼, x_k) iff mark(w, u₁, ¼, u_k) Î L(M) .

1.2 String-Walking Automata

String-walking automata are known in the literature as two-way finite automata (see [HU79]). We will call them string-walking automata to stress both the differences and the similarities with tree-walking automata. String-walking automata are like the well-known finite automata, except that they can walk both forwards and backwards along the string they are examining (hence the name two-way finite automata). At any moment in the execution of a string-walking automaton, the automaton is in one state and at one position of the string. Let S be an alphabet. Syntactically, a string-walking automaton over S is a finite automaton with a special set of directives as input alphabet. We define the set of directives as

D_SWA(S) = { ¬, ®, head, Øhead, tail, Øtail} È{ lab_s\mid s Î S} .

The meaning of the directives is as follows:

¬ means go back to the previous position in the input string.
® means go to the next position in the input string.
head checks whether the current position is the first position in the string.
tail checks whether the current position is the last position in the string.
Øhead and Øtail are the negations of head and tail respectively.
lab_s checks whether the current position is labeled with s.

Formally, we define for each string w Î S^* and each directive d Î D_SWA(S) the following binary relation R_w(d) on V_w:

R_w(¬)

{ (i, i-1) \mid 1 < i £ |w| }

R_w(®)

{ (i, i+1) \mid 1 £ i < |w| }

R_w(lab_s)

{ (i, i) \mid 1 £ i £ |w| and the ith symbol of w is s }

R_w(head)

{ (1, 1) }

R_w(Øhead)

{ (i, i) \mid 1 < i £ |w| }

R_w(tail)

{ (n, n) \mid n = |w| }

R_w(Øtail)

{ (i, i) \mid1 £ i < |w| }

A string-walking automaton over S is a finite automaton A over D_SWA(S). For a string-walking automaton A = (Q,D_SWA(S), d, I, F) and a string w Î S^*, an element (q,u) of Q ×V_w is a configuration of the automaton. The configuration (q,u) Î Q ×V_w signifies that A is in state q at position u. We say that the current position of the automaton on w is u. The set of all configurations of A and w is denoted by \mathbb C_A,w.

Let w Î S^*. One step of A = (Q,D_SWA(S),d,I,F) on w is defined by the binary relation \twoheadrightarrow_A,w on the set of configurations, as follows. For every (q,u), (q¢,u¢) Î \mathbb C_A,w,

(q,u) \twoheadrightarrow_A,w (q¢,u¢) iff $d Î D_SWA(S): (q,d,q¢) Î d and (u,u¢) Î R_w(d) .

For each string w Î S^*, A computes the binary relation

R_w(A) = { (u,v) Î V_w ×V_w \mid (q,u) \twoheadrightarrow_A,w^*(q¢,v) for some q Î I and q¢ Î F } .

Thus, A computes the position relation

R(A) = { (w,u,v) \mid w Î S^*, (u,v) Î R_w(A) } .

We define the language that A recognizes as all strings w with the property that there is a walk of A on w starting at the beginning of the string in an initial state and ending in a final state, or

L(A) = { w Î S^* \mid there exists v Î V_w such that (1,v) Î R_w(A)} .

The class of all languages recognized by a string-walking automaton is named SWA.

In the following we will also allow transitions of the form (p,s,q) with s Î D_SWA(S)^*. If s = d₁ d₂ ¼d_n, we define, for all w Î S^*,

R_w(s) = R_w(d₁) °¼°R_w(d_n).

These extended directives can easily be implemented by adding extra transitions and states.

A string-walking automaton A = (Q,D_SWA(S),d,I,F) over S is deterministic if the following three conditions hold:

#I = 1.
If (p,d,q) Î d, then p \not Î F.
For all distinct transitions (p,d₁,q₁), (p,d₂,q₂) Î d, d₁ and d₂ are two mutually exclusive directives in D_SWA(S).

Two directives d₁,d₂ Î D_SWA(S) are mutually exclusive exactly if, for all w Î S^* and u Î V_w, dom(R_w(d₁)) Çdom(R_w(d₂)) = Æ. The following pairs of directives in D_SWA(S) are mutually exclusive:

{®,tail},{¬,head}
{head,Øhead},{tail,Øtail}
{lab_s₁,lab_s₂} with s₁ ¹ s₂

1.3 String-Walking Pebble Automata

We will extend the concept of string-walking automata with a pebble [BH65]. A string-walking pebble automaton is able to drop its pebble at its current position in the string, pick it up at any (later) time, if it is at the same position again, and check whether or not the pebble is present at the automaton's current position. Let S be an alphabet. The directives for a string-walking pebble automaton are

D_SWPA(S)

D_SWA(S)È{ put, lift, here, Øhere}

{ ¬, ®, head, Øhead, tail, Øtail, put, lift, here, Øhere} È{ lab_s\mid s Î S}

The meaning of the new directives is as follows:

put means drop the pebble at the current position in the string.
lift means lift the pebble from the current position in the string (only if it is there now).
here checks whether the pebble is at the current position.
Øhere is the negation of here.

For all strings w Î S^*, we will denote the pebble positions with elements of P_w = V_w È{^}. Here ^ denotes that the pebble is not placed at any position of the string, i.e., the automaton ``carries'' the pebble. We now need to extend the binary relations R_w(d) (for d Î D_SWA(S)) to be on pairs (u,p) Î V_w ×P_w to take pebble-handling into account. Let, for all strings w Î S^* and directives d Î D_SWA(S), [R\tilde]_w(d) denote the binary relation R_w(d) as defined for string-walking automata in the previous section. We now define R_w(d) for string-walking pebble automata as follows:

R_w(d) = { ((i,p),(i¢,p)) \mid (i,i¢) Î

~
R

(d) and p Î P_w } .

This means that the directives from D_SWA(S) do not alter the pebble position.

The relations for the new directives are as follows.

R_w(put)

{ ((i,^), (i, i)) \mid 1 £ i £ |w|}

R_w(lift)

{ ((i, i), (i, ^)) \mid 1 £ i £ |w|}

R_w(here)

{ ((i, i), (i, i)) \mid 1 £ i £ |w|}

R_w(Øhere)

{ ((i,p), (i,p)) \mid1 £ i £ |w| and p ¹ i }

Let S be an alphabet. A string-walking pebble automaton is a finite automaton A over D_SWPA(S). For a string-walking pebble automaton A = (Q,D_SWPA(S), d, I, F) and a string w Î S^*, an element (q,u,p) Î Q ×V_w ×P_w is a configuration of the automaton. The set of all configurations of A and w is denoted by \mathbb C_A,w. A configuration (q,u,p) signifies that A is in state q at position u, while the pebble is at position p.

The step relation \twoheadrightarrow_A,w is extended in the obvious way. For a string-walking pebble automaton A, a string w Î S^* and configurations (q,u,p),(q¢,u¢,p¢) Î \mathbb C_A,w,

(q,u,p) \twoheadrightarrow_A,w^* (q¢,u¢,p¢) iff $d Î D_SWPA(S): (q,d,q¢) Î d and ( (u,p),(u¢,p¢) ) Î R_w(d).

For the relation that A computes, we now demand that A begins and ends with the pebble not on the string, or ``in its pocket''. For w Î S^*,

R_w(A) = { (u,v) Î V_w ×V_w \mid (q,u,^) \twoheadrightarrow_A,w^*(q¢,v,^) for some q Î I and q¢ Î F }.

The definitions of R(A) and L(A) remain the same. The class of all languages recognized by a string-walking pebble automaton is named SWPA.

Transitions of the form (p,s,q) with s Î D_SWPA(S)^* are treated as with string-walking automata. The definition of a deterministic string-walking pebble automaton is similar to that of a deterministic string-walking automaton. The only difference is that the directives are now taken from D_SWPA(S). The following pairs of directives in D_SWPA(S) are mutually exclusive:

{®,tail},{¬,head}
{here,put},{lift,put},{Øhere,lift}
{head,Øhead},{tail,Øtail},{here,Øhere}
{lab_s₁,lab_s₂} with s₁ ¹ s₂

1.4 String-Walking Automata with MSO Tests

Another way to extend string-walking automata (as done in [BE97] for tree-walking automata, see also Section 3.7) is to allow the automaton to test any MSO definable property of the current position of the string, using MSO formulas with one free position variable. This extension also requires the introduction of new directives. Let S be an alphabet. The set of directives over S is

D_SWA+M(S) = { ¬, ®} ÈMSOL₁(S) .

Note that head, tail, lab_s, etc. are MSO definable properties so they need not be explicitely stated in the definition of D_SWA+M(S). Note also that D_SWA+M(S) is an infinite set. Therefore we need to define for each string-walking automaton with MSO tests a finite subset of D_SWA+M(S) as the directives used by the automaton. For the directive d = y(x) with y Î MSOL₁(S), we define the following binary relation on V_w (w Î S^*):

R_w(y(x)) = {(i,i) \mid (w,i) \models y(x) } .

A string-walking automaton with MSO tests over S is a finite automaton A over a finite subset of D_SWA+M(S). The definitions of configurations, \twoheadrightarrow_A,w, R_w(A), etc. are unchanged with respect to the definitions for (ordinary) string-walking automata. The definitions of deterministic string-walking automata and mutually exclusive directives are also unchanged, although it is no longer possible to list all mutually exclusive pairs of directives. Note that it is decidable whether a given pair of directives in MSOL₁(S) is mutually exclusive.

The class of all languages recognized by a string-walking automaton with MSO tests is named SWA+M.

In [BE97] the following results are proven for trees. They also hold for the special case of strings.

Proposition 5 The following three statements hold:

SWA+M=REG
For every alphabet S and every binary MSO formula f(x,y) Î MSOL₂(S), there exists a string-walking automaton with MSO tests A such that R(A) = R(f).
For every alphabet S and every string-walking automaton with MSO tests A over S, there exists a binary MSO formula f(x,y) Î MSOL₂(S) such that R(f) = R(A).

1.5 From MSO Formulas to
String-Walking Pebble Automata

Both MSO formulas with two position variables and string-walking pebble automata compute binary position relations. In this section, we show that every binary position relation that can be defined by an MSO formula, can also be computed by a string-walking pebble automaton. In Section 1.8 we will show that the reverse is also true, i.e., every binary position relation that can be computed by a string-walking pebble automaton can also be defined by a binary MSO formula.

Let S be an alphabet and let f(x,y) be a binary MSO formula over S. We have seen (Lemma 4) that there exists a finite automaton M over SÈ(S×B₂) such that, for all w Î S^* and u,v Î V_w,

(w,u,v) \models f(x,y) iff mark(w,u,v) Î L(M) .

The following lemma states that there is a string-walking pebble automaton that computes the same relation as f(x,y) defines.

Lemma 6 For every finite automaton M over SÈ(S×B₂) there exists a string-walking pebble automaton A over S such that, for all w Î S^* and u,v Î V_w,

mark(w,u,v) Î L(M) iff (u,v) Î R_w(A).

Proof. Let M = (Q_M,SÈ(S×B₂), d_M, I_M, F_M) be a finite automaton over SÈ(S×B₂). We will define a string-walking pebble automaton over S, A, that simulates M, using the pebble to simulate the marked positions in M. A is defined as the (intuitive) union of three automata A₁, A₂, A₃. We define A as (Q₁ ÈQ₂ ÈQ₃, D_SWPA(S), d₁ Èd₂ Èd₃, I₁ ÈI₂ ÈI₃, F₁ ÈF₂ ÈF₃). A₁ deals with the case that u comes ``before'' v in w, A₂ deals with the case that v comes first and A₃ deals with the case u = v. For 1 £ i £ 3, A_i = (Q_i, D_SWPA(S), d_i, I_i, F_i).

A₁ simulates M, assuming that u comes before v in w. A₁ first drops its pebble at the start position u. This makes it possible to determine later where the automaton started. Then it walks to the head of the string. Next it simulates M in three stages. In the first stage of the simulation, it simulates M (only taking into account transitions of the form (p,s,q) with s Î S) until it finds the pebble. It picks up the pebble and treats the symbol s on this position as (s,1,0) in M. The automaton now continues simulating M (the second stage of the simulation), again without transitions with labels in S×B₂. Then, nondeterministically, A₁ drops the pebble at some position v and treats the symbol s at this position as (s,0,1) in M. A continues simulating M (the third stage) until it reaches the end of the string. If it reaches the end of the string in a final state of M, the choice for v was a good choice. The automaton then backs up until it finds the pebble at position v. This ends A₁.

We use the following states in A₁.

in₁ is the initial state.
tobegin₁ is used to walk to the head of the string.
q_1,i (for q Î Q_M) is used in the ith stage of the simulation.
backup₁ is used to back up to v after the three stages of simulation.
final₁ is the final state.

Formally, A₁ = (Q₁, D_SWPA(S), d₁, I₁, F₁) is constructed as follows:

Q₁

{ in₁, tobegin₁, backup₁, final₁ } È

{ q_1,i \mid q Î Q_M, 1 £ i £ 3 }

d₁

{ (in₁, put, tobegin₁),(tobegin₁, ¬, tobegin₁) } È

{ (tobegin₁, head, q_1,1) \mid q Î I_M } È

{ (p_1,1, (lab_s, ®), q_1,1)\mid (p,s,q) Î d_M } È

{ (p_1,1, (lift, lab_s, ®), q_1,2) \mid(p,(s,1,0),q) Î d_M } È

{ (p_1,2, (lab_s, ®), q_1,2)\mid (p,s,q) Î d_M } È

{ (p_1,2, (put, lab_s, ®), q_1,3) \mid(p,(s,0,1),q) Î d_M } È

{ (p_1,2, (put, lab_s, tail), backup₁) \mid(p,(s,0,1),q_fin) Î d_M for some q_fin Î F_M } È

{ (p_1,3, (lab_s, ®), q_1,3)\mid (p,s,q) Î d_M } È

{ (p_1,3, (lab_s, tail), backup₁)\mid (p,s,q_fin) Î d_M for some q_fin Î F_M } È

{ (backup₁, ¬, backup₁), (backup₁, lift, final₁) }

I₁

{in₁}

F₁

{ final₁ }

A₂ is similar to A₁, only here the automaton, after dropping its pebble on u, first walks to the right until the end of the string, then simulates M (inverted), starting in a state corresponding to a state in F_M, until it finds the pebble at u. It picks up the pebble and treats the symbol s at this position as (s,0,1) in M. A₂ continues with the second stage of the simulation. Then, nondeterministically, A₂ drops the pebble at some position v and continues with the third stage of the simulation until the beginning of the string. If it reaches the beginning of the string in an initial state of M, A₂ walks back to the pebble at v and the simulation is finished.

Formally, A₂ = (Q₂, D_SWPA(S), d₂, I₂, F₂) is constructed as follows:

Q₂

{ in₂, toend₂, backup₂, final₂ } È

{ q_2,i \mid q Î Q_M, 1 £ i £ 3 }

d₂

{ (in₂, put, toend₂),(toend₂, ®, toend₂) } È

{ (toend₂, tail, q_2,1) \mid q Î F_M } È

{ (q_2,1, (lab_s, ¬), p_2,1)\mid (p,s,q) Î d_M } È

{ (q_2,1, (lift, lab_s, ¬), p_2,2) \mid(p,(s,1,0),q) Î d_M } È

{ (q_2,2, (lab_s, ¬), p_2,2)\mid (p,s,q) Î d_M } È

{ (q_2,2, (put, lab_s, ¬), p_2,3) \mid(p,(s,0,1),q) Î d_M } È

{ (q_2,2, (put, lab_s, head), backup₂) \mid(p,(s,0,1),q_in) Î d_M for some q_in Î I_M } È

{ (q_2,3, (lab_s, ¬), p_2,3)\mid (p,s,q) Î d_M } È

{ (q_2,3, (lab_s, head), backup₂)\mid (p,s,q_in) Î d_M for some q_in Î I_M } È

{ (backup₂, ®, backup₂), (backup₂, lift, final₂) }

I₂

{in₂}

F₂

{ final₂ }

A₃ looks again like the previous two parts of A. A₃ however assumes that u = v, which implies that the only useful transitions with labels in S×B₂ are the ones of the form (p,(s,1,1),q) Î d_M. So A₃ first drops its pebble at position u ( = v), then (like A₁) walks to the head of the string and simulates M until it gets to the starting position (the pebble) again. It does not lift the pebble, because it needs to know the position of u = v. A₃ treats the symbol s at this position as (s,1,1) in M. Then it continues with the second stage of the simulation of M until the end of the string. If it arrives there in a final state of M, A₃ backs up to the pebble which ends the simulation of M.

Formally, A₃ = (Q₃, D_SWPA(S), d₃, I_A, F_A) is constructed as follows:

Q₃

{ in₃, tobegin₃, backup₃, final₃ } È

{ q_3,i \mid q Î Q_M, 1 £ i £ 2 }

d₃

{ (in₃, put, tobegin₃),(tobegin₃, ¬, tobegin₃) } È

{ (tobegin₃, head, q_3,1) \mid q Î I_M } È

{ (p_3,1, (lab_s, ®), q_3,1)\mid (p,s,q) Î d_M } È

{ (p_3,1, (here, lab_s, ®), q_3,2) \mid(p,(s,1,1),q) Î d_M } È

{ (p_3,1, (here, lab_s, tail), backup₃) \mid(p,(s,1,1),q_fin) Î d_M for some q_fin Î F_M } È

{ (p_3,2, (lab_s, ®), q_3,2)\mid (p,s,q) Î d_M } È

{ (p_3,2, (lab_s, tail), backup₃)\mid (p,s,q_fin) Î d_M for some q_fin Î F_M} È

{ (backup₃, ¬, backup₃), (backup₃, lift, final₃) }

I₃

{in₃}

F₃

{ final₃ }

\square

Using this lemma the following theorem becomes trivial to prove.

Theorem 7 For each formula f(x,y) Î MSOL₂(S), there exists a string-walking pebble automaton A over S such that R(f) = R(A).

1.6 Another proof

There is another way to prove Theorem 7. This method makes use of Proposition 5 to simulate the binary MSO formula by a string-walking automaton with MSO tests. The MSO tests are then simulated by making use of a pebble. Let S be an alphabet, and let f(x,y) Î MSOL₂(S) be an MSO formula with two free position variables. Then there exists a string-walking automaton with MSO tests A = (Q_A,D_A,d_A,I_A,F_A) with D_A a finite subset of D_SWA+M(S) such that R(f) = R(A) (Proposition 5). We will construct a string-walking pebble automaton A¢ = (Q¢,D_SWPA(S),d¢,I¢,F¢) such that R(A¢) = R(A) = R(f). We need the following result, which is Lemma 4 for the case k = 1.

Lemma 8 Let S be an alphabet. For every MSO test y(x) Î MSOL₁(S), there exists a finite automaton M over SÈ(S×B₁) such that, for all w Î S^* and u Î V_w,

(w,u) \models y(x) iff mark(w,u) Î L(M) .

We now construct A¢. In first approximation, A¢ equals A for transitions (p,d,q) with d Î { ¬, ® }. We define T = {(p,d,q) Î d_A \mid d Î MSOL₁(S)}, the collection of all transitions with MSO tests in A. For each transition t = (p,y(x),q) Î T, we use Lemma 8 to construct the finite automaton M_t that calculates the MSO test y(x). We will use this automaton to construct a ``subroutine'' in A¢ that simulates t. The idea is as follows, similar to the construction of A₃ in the proof of Lemma 6. A¢ first drops its pebble. Then it walks to the head of the string. From there, it checks the MSO test by following M_t. Any transitions in d_{M_t} with labels not in S are handled by using the pebble. The only possible form of such a transition is (p,(s,1),q). In place of this transition, we add transitions to check whether the position in the string where the automaton is now is marked, i.e., the pebble is there. Then the symbol s is checked and the automaton moves on to the next position in the string in state q. If M_t reaches a final state at the end of the string, A¢ moves back to the position where it dropped the pebble and lifts it. At this moment the MSO test y(x) is successfully verified by the automaton.

The following states are used for A¢. First, all states from A, Q_A. Then we need extra states for each MSO test. For all transitions t Î T we use, next to the states of M_t, tobegin_t to move to the beginning of the string and backup_t to back up to the pebble, the place where the simulation of M_t started. We assume that all the M_t for all t Î T have unique states that do not overlap with either Q_A or states of other M_t's.

Formally, A¢ = (Q¢,D_SWPA(S),d¢,I¢,F¢) is constructed as follows:

Q¢

Q_A È{ tobegin_t, backup_t \mid t Î T }È

È
t Î T

Q_{M_t}

I¢

I_A

F¢

F_A

d¢

{ (p,d,q) Î d_A \midd Î {¬,®} } È

{ d_t \mid t Î T }

Here the transitions needed to simulate t = (p,y(x),q) are as follows:

d_t

{ (p, put, tobegin_t),(tobegin_t, ¬, tobegin_t)} È

{ (tobegin_t, head, s_in)\mid s_in Î I_{M_t} } È

{ s,(Øtail, Øhere, lab_s,®),t) \mid (s,s,t) Î d_{M_t} } È

{ s,(Øtail,here,lab_s,®),t) \mid (s,(s,1),t) Î d_{M_t} } È

{ s,(tail,Øhere,lab_s),backup_t) \mid (s,s,t_fin) Î d_{M_t} for some t_fin Î F_{M_t} } È

{ s,(tail, here,lab_s),backup_t) \mid (s,(s,1),t_fin) Î d_{M_t} for some t_fin Î F_{M_t} } È

{ (backup_t, (Øhere, ¬), backup_t),(backup_t, lift, q) }

1.7 Pebble Necessity

The following theorem states that we really need the pebble in Theorem 7. The proof of this theorem is a simpler form of the proof for tree-walking automata (Theorem 15 in [BE97]). It is included here for completeness.

Theorem 9 There exist an alphabet S and a formula f(x,y) Î MSOL₂(S) such that there is no string-walking automaton A with R(A) = R(f).

Proof. We construct an MSO formula f(x,y) Î MSOL₂(S) with S = {b,r}, denoting black and red. For a string w over S, the binary relation R(f) connects certain positions of w. If w has exactly one position with symbol r, say v_red, then R(f) will connect any position to v_red. Otherwise, i.e., if there is no red symbol or more than one, R(f) will connect any position to the next in left-to-right circular order. We use the following abbreviations to define f(x,y):

ors

$y ( lab_r(y) Ù"z(lab_r(z) ® z = y))

succ(x,y)

pre(x,y) Ú( tail(x) Ùhead(y) )

f(x,y)

( orsÙlab_r(y) ) Ú( ØorsÙsucc(x,y) )

Here ors is a closed MSO formula which is true if there is exactly one red symbol in w, and succ(x,y) is an MSO formula which is true if y follows x in left-to-right circular order. Clearly, f(x,y) is a binary MSO formula with R(f) as indicated above.

Now we will show that there is no string-walking automaton A = (Q,D_SWA(S),d,I,F) with R(A) = R(f). Suppose that such an A exists. We may assume that I = {q_in}. The idea is that when A starts at any position u of a string with only b's, it first has to visit all positionsc of the string to check there is no r in the string. Because there is no way for A to remember its starting point u, A also cannot find the position after u in left-to-right circular order.

Let w be b^#Q+1, a string of #Q+1 black symbols. Let w¢ be rb^#Q. Thus, w¢ is w with the first symbol changed to red. We will use the following function:

succ(k) =

ì
í
î

k+1

if k £ #Q

otherwise (k = #Q+1)

which gives the successor to position k in left-to-right circular order. Because w does not contain a symbol r, there exists for every k Î [1,#Q+1] a state f_k Î F such that

(q_in,k) \twoheadrightarrow_A,w^* (f_k, succ(k)) .

On the other hand, because w¢ contains exactly one symbol r, for all k Î [1,#Q], there is no state f Î F such that

(q_in,k) \twoheadrightarrow_A,w¢^* (f, succ(k)) .

This difference implies that for all k Î [1, #Q] the walk of A on w from k to succ(k) must visit position 1, because the only different symbol is the first, which is black for w and red for w¢. Because the walk of A on w from #Q+1 to 1 also visits position 1, there is for all k Î [1,#Q+1] a q_k Î Q such that

(q_in,k) \twoheadrightarrow_A,w^* (q_k,1) \twoheadrightarrow_A,w^* (f_k,succ(k)) .

There are #Q+1 possibilities for k and only #Q states, which means there are k,k¢ Î [1,#Q+1] with k ¹ k¢ and q_k = q_k¢. Then

(q_in,k) \twoheadrightarrow_A,w^* (q_k,1) = (q_k¢,1) \twoheadrightarrow_A,w^* (f_k¢,succ(k¢)) .

This implies that A walks from k both to succ(k) and to succ(k¢) with k ¹ k¢, a contradiction to the fact that R(A) = R(f). \square

1.8 From String-Walking Pebble Automata to
MSO Formulas

In this section we will prove that a string-walking pebble automaton can be simulated by a string-walking automaton with MSO tests. Because it is already known that these automata can be simulated by binary MSO formulas (Proposition 5), and because we have shown in previous sections that a binary MSO formula can be simulated by a string-walking pebble automaton, we obtain the equivalence of binary MSO formulas and string-walking pebble automata.

To prove that it is possible to simulate the use of a pebble with MSO tests, we will need the following lemma. In the lemma, A is a string-walking pebble automaton that is not allowed to move its pebble, but it is allowed to check the presence of the pebble at the current position. The lemma states that the round-trip of A from the position where the pebble is, back to that position can be simulated by a unary MSO formula.

Lemma 10 Let A be a string-walking pebble automaton, with no transitions with directives put or lift, over S. Then, for every pair of states p,q Î Q_A, there exists a formula y_pq(x) Î MSOL₁(S) such that, for all w Î S^* and x Î V_w,

(p,x,x) \twoheadrightarrow_A,w^* (q,x,x) iff (w,x) \models y_pq(x) .

Proof. Let A be a string-walking pebble automaton over S without put and lift. Let p,q Î Q_A. First note that the pebble does not move during A's walk. We can therefore consider the pebble position a constant and mark it using a special symbol (s,1) where s is the original symbol at the pebble position. We will now define a string-walking automaton A¢ (without pebble) over SÈ(S×B₁) that simulates the walk of A from the pebble position and back. We define A¢ = (Q_A, D_SWA(SÈ(S×B₁)), d¢, {p}, {q}). Thus, the states of A¢ are the same as those of A, A¢'s initial state is p, and its final state is q. Furthermore, d¢ consists of the following transitions:

d¢

d_A Ç(Q_A ×D_SWA(S)×Q_A) È

{ (s,lab_(s,1),t) \mid (s,here,t) Î d_A, s Î S} È

{ (s,lab_s,t) \mid (s,Øhere,t) Î d_A, s Î S} È

{ (s,lab_(s,1),t) \mid(s,lab_s,t) Î d_A, s Î S}.

The directive ``here'' is simulated by checking whether the symbol at the current position is in S×B₁. Similarly, ``Øhere'' is simulated by checking whether the symbol at the current position is in S. Otherwise, the new symbols from S×B₁ are treated just like the corresponding symbols from S. Now, for all s,t Î Q_A, w Î S^* and x,y,z Î V_w, it is easy to see that

(s,x,z) \twoheadrightarrow_A,w^* (t,y,z) iff (s,x) \twoheadrightarrow_A¢,w¢^* (t,y)

where w¢ = mark(w,z). This means that A¢ on a string marked at position z walking from x to y simulates A on the unmarked string with the pebble at position z. Both automata start in state s and end in state t. Also, from the definition of R_w¢(A¢),

(p,x) \twoheadrightarrow_A¢,w¢^* (q,y) iff (x,y) Î R_w¢(A¢) .

According to Proposition 5 we can construct a formula f¢_pq(x,y) Î MSOL₂(SÈ(S×B₁)) that simulates the walk of A¢ on a marked string, so that, for all w Î S^*, x,y,z Î V_w and w¢ = mark(w,z),

(x,y) Î R_w¢(A¢) iff (w¢,x,y) \models f¢_pq(x,y) .

Using Lemma 2 yields a formula f_pq(x,y,z) Î MSOL₃(S) such that, for all w Î S^* and x,y,z Î V_w,

(mark(w,z),x,y) \models f¢_pq(x,y) iff (w,x,y,z) \models f_pq(x,y,z) .

We now define y_pq(x) = f_pq(x,x,x) Î MSOL₁(S). Then, for all w Î S^* and u Î V_w,

(p,x,x) \twoheadrightarrow_A,w^* (q,x,x) iff (w,x) \models y_pq(x) .

\square

Theorem 11 For every string-walking pebble automaton A there exists a string-walking automaton with MSO tests A¢ such that R(A) = R(A¢).

Proof. When A drops its pebble at some position u of string w, it must return there some later time to pick it up, because of the definition of R_w(A). In the mean time, i.e., until A returns to u, A does not move its pebble. By using Lemma 10, the actions of this automaton until its arrival in u can be simulated by a single MSO test. The details are as follows.

Let A = (Q,D_SWPA(S),d,I,F) be a string-walking pebble automaton over S. We will define a string-walking automaton with MSO tests A¢. The MSO tests y_pq are obtained by applying Lemma 10 to the automaton derived from A by removing all transitions of the form (p,d,q) with d Î {put,lift}. We define A¢ = (Q,D,d¢,I,F) with

{ ¬, ®, true(x), head(x), tail(x), Øhead(x), Øtail(x) } È

{ lab_s(x) \mid s Î S} È{ y_pq(x) \mid p,q Î Q }

d¢

{ (p,d,q) Î d\mid d Î { ¬, ® } } È

{ (p,head(x),q) \mid (p,head,q) Î d} È

{ (p,Øhead(x),q) \mid (p,Øhead,q) Î d} È

{ (p,tail(x),q) \mid (p,tail,q) Î d} È

{ (p,Øtail(x),q) \mid (p,Øtail,q) Î d} È

{ (p,lab_s(x),q) \mid (p,lab_s,q) Î d} È

{ (p,true(x),q) \mid (p,Øhere,q) Î d} È

{ (p,y_qr(x),s) \mid (p,put,q) Î d and (r,lift,s) Î d}

The directives ¬, ®, head, Øhead, tail, Øtail and lab_s (for s Î S) are simulated straightforwardly. The MSO test y_qr(x) simulates the actions of A between the put and the lift. \square

This theorem leads us to the grand finale of this chapter.

Theorem 12 The following two statements hold:

For every MSO formula f(x,y) Î MSOL₂(S) there exists a string-walking pebble automaton A over S such that R(A) = R(f).
For every string-walking pebble automaton A over S there exists an MSO formula f(x,y) Î MSOL₂(S) such that R(f) = R(A).

There is another way to prove that any string-walking pebble automaton can be simulated by an MSO formula with two position variables. Let A be a string-walking pebble automaton over S that recognizes the relation R(A). Then a string-walking pebble automaton A¢ over SÈ(S×B₂) can easily be constructed such that (w,u,v) Î R(A) just if mark(w,u,v) Î L(A¢). A¢ walks right until it encounters a symbol (s,1,x) with x Î {0,1}. Then A¢ starts simulating A directly until it encounters a symbol (s,y,1) with y Î {0,1} in a final state. Because SWPA=REG [BH65] and REG=MSO (Proposition 1), there exists an MSO formula y Î MSOL₀(SÈ(S×B₂)) such that L(A¢) = L(y). Using Corollary 3, we obtain an MSO formula f(x,y) Î MSOL₂(S) such that, for all w Î S^*, mark(w,u,v) Î L(y) just if (w,u,v) \models f(x,y).

Note that we can also use Theorem 12 to derive that SWPAREG, as follows:

Theorem 13 SWPAREG

Proof. First we will show that any language defined by a string-walking pebble automaton is MSO definable and hence regular. Let A be a string-walking pebble automaton over S. By definition, the language recognized by A is

L(A) = { w Î S^* \mid there exists v Î V_w such that (1,v) Î R_w(A) }.

Using the second part of Theorem 12 we obtain a formula f(x,y) Î MSOL₂(S) such that, for all w Î S^* and u,v Î V_w,

(u,v) Î R_w(A) iff (w,u,v) \models f(x,y) .

We now define

y = "x ( $y (head(x) ® f(x,y) ) )

to obtain that, for all w Î S^* and v Î V_w,

$v Î V_w ((1,v) Î R_w(A)) iff $v Î V_w ((w,1,v) \models f(x,y)) iff w \models y

L(A) = L(y)

and, since MSO=REG, L(A) is a regular language.

The other way around, we show that any regular language can be described by a string-walking pebble automaton. Let L be a regular language over an alphabet S. Let M = (Q,S,d,I,F) be a finite automaton such that L = L(M). We define a string-walking pebble automaton A = (Q¢,D_SWPA(S),d¢,I,F¢) with

Q¢

Q È{q_fin}

d¢

{ (p,(lab_s,®),q) \mid (p,s,q) Î d} È

{ (p,(lab_s,tail),q_fin) \mid (p,s,q) Î d for some q Î F }

F¢

{ q_fin}

It is obvious to see that L(A) = L(M) = L. \square

Chapter 2
Extensions to
String-Walking Pebble Automata

In this chapter we will introduce two extensions to the concept of string-walking pebble automata. These extensions will be constructed in such a way that the class of languages recognized by the automata is the same as the class of languages that the (ordinary) string-walking automata recognize, i.e., the class of regular languages.

2.1 String-Walking Multi-Pebble Automata

The first extension to the string-walking automaton is the string-walking multi-pebble automaton. This automaton can use more than one pebble. The number of pebbles available to the automaton is fixed. The pebbles will be taken from a finite set and must be used ``nested''. This means that, at any point in the automaton's walk, only the pebble that was dropped last can be lifted. It can easily be seen that we can, without changing the automaton's power, number the pebbles {1,2,¼,n} and demand that the pebbles must be used in order. In order to see this, suppose the pebbles are taken from a set P with |P| = n. Consider the set M of bijections from P to {1,2,¼,n} (there are n! such mappings). We can now simulate an automaton with pebble set P by an automaton with pebble set [1,n] by coding the pebble mapping into the states: for every state q we use additional states {q_m \mid m Î M}. We also modify the transitions with put and lift so that the information on the mapping is maintained. Note that the constraint of the nested use of pebbles is necessary to make sure that the automaton does not recognize non-regular languages. A string-walking 2-pebble automaton without this limitation could recognize the language L = {aⁿcbⁿ \mid n Î \mathbb N} by putting the pebbles at both ends of the string and moving them towards the middle of the string in turn, checking that both pebbles arrive at the middle c after the same number of moves. This kind of use of the pebbles is prohibited by demanding that the pebbles be used nested. In broader terms, it can be shown that unnested multi-pebble automata are equivalent to two-way multi-head automata, which in turn are equivalent to logarithmic space Turing machines.

Let n ³ 1. We use the following set of directives for a string-walking n-pebble automaton over S:

D_SWnPA(S)

D_SWA(S)È{ put_i, lift_i, here_i, Øhere_i \mid 1 £ i £ n }

For each w Î S^* the relations for the new directives are as follows. The relations are on tuples (u,k,p) Î V_w ×[0,n]×([1,n] ® P_w). Here k represents that pebbles 1,¼,k are currently on the string and p maps the pebbles to string positions. We define p₀ : [1,n] ® P_w with p₀(i) = ^ for all i Î [1,n] as the mapping with no pebbles on the string.

The relations hold for any u Î V_w, k Î [0,n], and pebble placement function p:[1,n] ® P_w.

R_w(put_i)

{ ((u, i-1, p), (u, i, p(i ® u))) \mid 1 £ i £ n }

R_w(lift_i)

{ ((u,i,p), (u,i-1,p(i ® ^))) \mid 1 £ i £ n, p(i) = u }

R_w(here_i)

{ ((u,k,p), (u,k,p)) \mid 1 £ i £ k and p(i) = u }

R_w(Øhere_i)

{ ((u,k,p), (u,k,p)) \mid 1 £ i £ n and (i > k or p(i) ¹ u) }

A string-walking n-pebble automaton over S is a finite automaton A over D_SWnPA(S). For a string-walking n-pebble automaton A = (Q,D_SWnPA(S),d,I,F) and a string w Î S^*, a configuration is a tuple (q,u,k,p) Î Q ×V_w ×[0,n] ×([1,n] ® P_w). The set of all configurations of A and w is names \mathbb C_A,w. The step relation \twoheadrightarrow_A,w is defined in the obvious way. For all configurations (q,u,k,p), (q¢,u¢,k¢,p¢) Î \mathbb C_A,w,

(q,u,k,p) \twoheadrightarrow_A,w (q¢,u¢,k¢,p¢) iff $d Î D_SWnPA(S):(q,d,q¢) Î d, ((u,k,p),(u¢,k¢,p¢)) Î R_w(d).

For each string w Î S^*, A computes the relation

R_w(A) = { (u,v) Î V_w ×V_w \mid (q,u,0,p₀) \twoheadrightarrow_A,w^* (q¢,v,0,p₀) for some q Î I and q¢ Î F } .

The position relation R(A) and the language L(A) that A computes are defined in the usual way, as well as the behaviour of transitions of the form (q,s,q¢) with d Î D_SWnPA(S)^*.

2.2 From String-Walking Multi-Pebble Automata to
MSO Formulas

In this section we show that any position relation that can be computed by a string-walking multi-pebble automaton can also be defined by a binary MSO formula. Note that the other way around is obvious, because string-walking multi-pebble automata are an extension to string-walking pebble automata.

Theorem 14 For every string-walking n-pebble automaton A over an alphabet S there exists a binary MSO formula f(x,y) Î MSOL₂(S) such that R(f) = R(A).

Proof. We will prove this theorem using natural induction on the number of pebbles n. Note that the case n = 1 is equal to the second part of Theorem 12. We present a proof similar to that in Section 1.8. Choose a number of pebbles n > 1. In the induction step, it can be assumed that Theorem 14 is valid for any string-walking (n-1)-pebble automaton (the induction hypothesis).

The following Claim (similar to Lemma 10) states that after a string-walking n-pebble automaton A drops its first pebble, its round-trips from the pebble position back to that same position can be simulated by a unary MSO formula. In the proof of this lemma we use the induction hypothesis.

[: Claim] Let A = (Q,D_SWnPA(S),d,I,F) be a string-walking n-pebble automaton, with no transitions with directives put₁ or lift₁. Then, for every pair of states p,q Î Q, there exists a formula y_pq(x) Î MSOL₁(S) such that, for all w Î S^* and x Î V_w,

(p,x,1,p₀(1® x)) \twoheadrightarrow_A,w^* (q,x,1,p₀(1 ® x)) iff (w,x) \models y_pq(x) .

To prove this Claim, we construct a string-walking (n-1)-pebble automaton A¢ over SÈ(S×B₁) that simulates the walk of A from the position of the first pebble and back (similar to the proof of Lemma 10). We define A¢ = (Q,D_SW(n-1)PA(SÈ(S×B₁)), d¢, {p},{q}) where

d¢

dÇ(Q ×D_SWA(S)×Q) È

{ (s,lab_(s,1), t) \mid (s,here₁,t) Î d, s Î S} È

{ (s,lab_s,t) \mid (s,Øhere₁,t) Î d, s Î S} È

{ (s,lab_(s,1), t) \mid (s,lab_s,t) Î d, s Î S} È

{ (s,put_i-1,t) \mid (s,put_i,t) Î d, 2 £ i £ n } È

{ (s,lift_i-1,t) \mid (s,lift_i,t) Î d, 2 £ i £ n } È

{ (s,here_i-1,t) \mid (s,here_i,t) Î d, 2 £ i £ n } È

{ (s,Øhere_i-1,t) \mid (s,Øhere_i,t) Î d, 2 £ i £ n }

The states of A¢ are the same as those of A, A¢'s only initial state is p and A¢'s only final state is q. All directives that do not use or test the pebble are simulated directly. Pebble 1 is simulated by the marking, while pebbles 2,¼,n are simulated by pebbles 1,¼,n-1. Now it is easy to see that, for all s,t Î Q, w Î S^*, x,y,z Î V_w and w¢ = mark(w,z),

(s,x,1,p₀(1 ® z)) \twoheadrightarrow_A,w^* (t,y,1,p₀(1 ® z)) iff (s,x,0,p₀) \twoheadrightarrow_A¢,w¢^* (t,y,0,p₀).

This means that A¢ on a string marked at position z, starting in a configuration without pebbles and walking from position x to y simulates A on the unmarked string, in a configuration with pebble 1 at position z. Note that, following the definition of R_w¢(A¢) for a string-walking (n-1)-pebble automaton,

(p,x,0,p₀) \twoheadrightarrow_A¢,w¢^* (q,y,0,p₀) iff (x,y) Î R_w¢(A¢).

According to the induction hypothesis, there exists a binary MSO formula

f¢_pq(x,y) Î MSOL₂(SÈ(S×B₁))

that simulates the walk of A¢ on a marked string, such that, for all w Î S^*, x,y,z Î V_w and w¢ = mark(w,z),

(x,y) Î R_w¢(A¢) iff (w¢,x,y) \models f¢_pq(x,y).

The remainder of the proof of the Claim is again similar to the proof of Lemma 10. Again using Lemma 2 to pull the marking into the free position variables of the MSO formula, we obtain a formula f_pq(x,y,z) Î MSOL₃(S) such that, for all w Î S^*, x,y,z Î V_w and w¢ = mark(w,z),

(w¢,x,y) \models f¢_pq(x,y) iff (w,x,y,z) \models f_pq(x,y,z) .

We now define the abbreviation y_pq(x) = f_pq(x,x,x) Î MSOL₁(S). Then, for all w Î S^* and x Î V_w,

(p,x,1,p₀(1 ® x)) \twoheadrightarrow_A,w^* (q,x,1,p₀(1 ® x)) iff (w,x) \models y_pq(x) .

This ends the proof of the Claim. We now continue with the proof of Theorem 14. Let A = (Q,D_SWnPA(S),d,I,F) be a string-walking n-pebble automaton over S. We will, like in the proof of Theorem 11, define a string-walking automaton with MSO tests A¢ over S such that R(A) = R(A¢). The MSO tests y_pq(x) in this automaton are obtained by applying the Claim to the automaton derived from A by removing all transitions of the form (p,d,q) with d Î { put₁, lift₁ }. We define A¢ = (Q,D,d¢,I,F) exactly the same as in the proof of Theorem 11, but with Øhere₁, put₁ and lift₁ instead of Øhere, put and lift:

{ ¬, ®, true(x), head(x), tail(x),Øhead(x), Øtail(x) } È

{ lab_s(x) \mid s Î S} È{ y_pq(x) \mid p,q Î Q }

d¢

{ (p,d,q) Î d\mid d Î { ¬, ® } } È

{ (p,head(x),q) \mid (p,head,q) Î d} È

{ (p,Øhead(x),q) \mid (p,Øhead,q) Î d} È

{ (p,tail(x),q) \mid (p,tail,q) Î d} È

{ (p,Øtail(x),q) \mid (p,Øtail,q) Î d} È

{ (p,lab_s(x),q) \mid (p,lab_s,q) Î d} È

{ (p,true(x),q) \mid (p,Øhere₁,q) Î d} È

{ (p,y_qr(x),s) \mid (p,put₁,q) Î d and (r,lift₁,s) Î d}

Again, the directives {¬,®,head,Øhead,tail,Øtail} and lab_s (for s Î S) are simulated straightforwardly. The directive y_qr(x) simulates the actions of A from put₁ to the following lift₁ (at the same location), including all put_i and lift_i for 2 £ i £ n. It is easy to see that R(A) = R(A¢). Now, using Proposition 5, there exists a binary MSO formula f(x,y) Î MSOL₂(S) such that R(f) = R(A¢) = R(A). \square

2.3 String-Walking Marble Automata

Another way to extend string-walking pebble automata is the introduction of marbles instead of pebbles. They come in several colours , that are labeled with a colour alphabet G. For each colour g Î G, the number of marbles of colour g is not limited. In this paper, we give only the definition of string-walking marble automata and of their behaviour. The idea is that every marble colour can be used for a quantifier of MSO formulas; however, it is not proven here that string-walking marble automata compute exactly the position relations that binary MSO formulas recognize. This may not even be the case.

D_SWMA(S,G) = D_SWA(S)È{ put_g, lift_g, here_g, Øhere_g \mid g Î G}

For all directives d, R_w(d) is a binary relation over V_w ×(G® 2^V_w) ×\mathbb N ×(\mathbb N ® ((V_w ×G) È{^})). An element (u,p,n,s) of this set means that

The current position of the automaton is u.
For every marble colour g Î G, p(g) is the set of positions where a marble of colour g is lying.
n is the total number of marbles of any colour on the string.
For every natural number 1 £ i £ n, if s(i) = (v,g) Î V_w ×G, then the ith marble that was laid down is at position v and of colour g. For i > n, s(i) = ^. Effectively, s is a stack of marble colours and positions. This information is needed to enforce the nested use of the marbles.

Note that p can be determined from n and s: for every g Î G,

p(g) = { u Î V_w \mid $i Î [1,n]: s(i) = (u,g) } .

The relations for the directives from D_SWA(S) are extended straightforwardly. These directives do not alter p, n or s. For the directives put_g, lift_g, here_g and Øhere_g, the relations are as follows. The definition of R_w(put_g) is complicated, because we demand that the marbles are nested as usual and that all marbles of colour g must be put down after each other, without putting down a marble of another colour in between.

R_w(put_g)

{ ( (u,p,n,s), ( u,p(g® p(g) È{u}), n+1, s(n+1 ® (u,g))) ) \mid u \not Î p(g),

"1 £ i < j £ n ( Ø$v,v¢ Î V_w, g¢ Î G: (g¢ ¹ gÙs(i) = (v,g) Ùs(j) = (v¢,g¢) ) ) }

R_w(lift_g)

{ ( (u,p,n,s),(u,p(g® p(g) \{u}), n-1, s(n® ^))) \mid s(n) = (u,g) }

R_w(here_g)

{ ( (u,p,n,s), (u,p,n,s) ) \midu Î p(g) }

R_w(Øhere_g)

{ ( (u,p,n,s), (u,p,n,s) ) \midu \not Î p(g) }

We define p₀:G® 2^V_w with p₀(g) = Æ for all g Î G. We define s₀:\mathbb N ® ((V_w ×G) È{^}) with s(i) = ^ for all i Î \mathbb N.

The relation computed by A on a string w Î S^* is

R_w(A) = { (u,v) Î V_w ×V_w \mid(q,u,p₀,0,s₀) \twoheadrightarrow_A,w^* (q¢,v, p₀,0,s₀) for some q Î I and q¢ Î F }.

Chapter 3
Binary MSO Formulas and Tree-Walking Automata

3.1 Preliminaries

In this section we recall the well-known concepts of trees, finite tree automata, regular tree languages, and monadic second order logic.

Trees

In the usual way, trees are defined as finite, directed graphs with labeled nodes and edges. Let S and G be alphabets of node labels and edge labels, respectively. A graph over (S,G) is a triple (V,E,lab), with V a finite set of nodes, E Í V ×G×V the set of labeled edges, and lab: V ® S the node-labeling function. For a given graph g, its nodes, edges and node-labeling functions are denoted V_g, E_g and lab_g, respectively.

The trees we consider have their nodes labeled by symbols from a ranked alphabet. The edge from a parent to its i-th child is labeled with the number i. A ranked alphabet S is an alphabet S together with a rank function rk:S® \mathbb N. For any symbol s Î S, rk(s) = k denotes that the rank of s is k. In terms of trees this means that a node labeled with s will have k children. For all k Î \mathbb N, S_k = { s Î S\mid rk(s) = k } is the set of symbols of rank k. The rank interval of the ranked alphabet S is [1,m], where m is the maximal rank of the elements of S; it is denoted rki(S).

A tree over S is an acyclic connected graph g over (S, rki(S)) such that

No node of g has more than one incoming edge.
For every node u of g, u has only outgoing edges with labels in [1,rk(lab_g(u))].
For every node u of g and every i Î [1,rk(lab_g(u))], u has exactly one outgoing edge with label i.

The set of all trees over S is denoted T_S. A subset of T_S is also called a tree language.

The root of a tree t (the node with no incoming edges) is denoted root_t. For nodes u,v of t, if (u,i,v) Î E_t, then u is called the parent of v, and v is called the i-th child of u, denoted by u ·i. We also define u·0 = u. We define the set of ancestors anc(u) of a node u as u itself and the ancestors of its parent. The least common ancestor of two nodes u and v (denoted lca(u,v)) is defined as the node w such that

w is an ancestor of both u and v.
If w¢ is an ancestor of both u and v, then w¢ is an ancestor of w.

If u is an ancestor of v, then v is a descendant of u. The set des(u) denotes the set of all descendants of u. For a node u of t, t_u denotes the subtree of t with root u, i.e., the subgraph of t induced by the set of all descendants of u.

Let S be a ranked alphabet and let k Î \mathbb N. We define k-ary node relations in analogy to k-ary position relations for strings. A k-ary node relation over S is a subset of {(t, u₁, ¼, u_k) \mid t Î T_S and u_i Î V_t for all i Î [1,k]}.

We define marked trees as follows, in analogy to marked strings. Let S be a ranked alphabet and let k ³ 1. The ranked alphabet SÈ(S×B_k) is defined as usual, where we assign to the symbol (s, b₁, ¼, b_k) the rank rk(s) (for s Î S and b_i Î [1,k] for i Î [1,k]). Let t Î T_S and let u₁, ¼, u_k Î V_t. We define mark(t,u₁, ¼, u_k) as a tree over SÈ(S×B_k), with node-set V_t, edges E_t and node-labeling function lab¢: V_t ® SÈ(S×B_k) with lab¢(u) = lab_t(u) if u ¹ u_i for all i Î [1,k], and lab¢(u) = (lab_t(u), (u = u₁), ¼, (u = u_k)) otherwise (where (u = u_i) = 1 iff u equals u_i).

Finite Tree Automata

A finite tree automaton [GS84] traverses a tree bottom-up. It starts in all the leaves of the tree, in parallel, and works its way up to the root. A finite tree automaton has a finite set of states Q and a transition function. The state of the automaton in a node of the tree is determined by the states of the children of the node and by the label of the node. For a symbol s with rk(s) = k, the transition function d_s is therefore a function from Q^k to Q. A finite tree automaton accepts a tree if it ends at the root of the tree in one of a set of pre-determined final states.

Let S be a ranked alphabet. A (deterministic) finite tree automaton over S is a 4-tuple M = (Q,S,d,F) where Q is a finite set of states, S is the input alphabet, d = { d_s \mid s Î S} with, for s Î S_k, d_s: Q^k ® Q the transition function for s, and F Í Q the set of final states. For every tree t Î T_S and node u Î V_t, the state in which M reaches u, denoted state_M,t(u), is defined by bottom-up induction as state_M,t(u) = d_s(state_M,t(u ·1), ¼, state_M,t(u ·k)) where lab_t(u) = s with rk(s) = k. The tree language recognized by M is L(M) = { t Î T_S \mid state_M,t(root(t)) Î F }. A tree language L is called a regular tree language if it is recognized by a finite tree automaton. The class of all regular tree languages is named REGT.

For any finite tree automaton M = (Q,S,d,F), tree t Î T_S and node u Î V_t, the set of successful states of M at u, denoted by succ_M,t(u), is defined (by top-down induction) as follows: succ_M,t(root_t) = F and, for a node u Î V_t with lab_t(u) = s, rk(s) = k > 0, succ_M,t(u·i) is the set of all states q Î Q such that d_s(q₁, ¼, q_i-1,q,q_i+1,¼, q_k) Î succ_M,t(u), where q_j = state_M,t(u ·j) for j Î [1,i-1] È[i+1,k]. Intuitively, q Î succ_M,t(u) means that M will reach the root of t in a final state assuming that it reaches u in state q (instead of state_M,t(u)). Thus, t Î L(M) iff state_M,t(u) Î succ_M,t(u). This can easily be proven ([BE97], Lemma 1).

Monadic Second Order Logic (on trees)

The monadic second order logic on strings described in Section 1.1 can be extended to trees (see, e.g., [TW68,Don70,BE97]). We will in short describe the differences. Let S be a ranked alphabet.

The variables x,y,¼ are now node variables; the variables X,Y,¼ are now node-set variables. For a given tree t Î T_S, node variables range over V_t and node-set variables range over subsets of V_t.
The atomic formulas are lab_s(x), for every s Î S, denoting that x has label s; edg_i(x,y), for every i Î rki(S), denoting that y is the ith child of x; and x Î X, denoting that x is an element of X.
The language defined by a formula f Î MSOL₀(S) is now a tree language, L(f) = {t Î T_S \mid t \models f}

The class of all MSO definable tree languages is named MSOT.

The following classical result from [Don70,TW68] states that MSOT=REGT, i.e., that formulas from monadic second order logic with no free variables define exactly the regular tree languages:

Proposition 15 A tree language is MSO definable if and only if it is regular.

As stated in Section 1.1, the results of Lemma 2, Corollary 3, and Lemma 4 are also valid for trees (with S a ranked alphabet and t Î T_S instead of w Î S^*).

3.2 Tree-Walking Automata

A tree-walking automaton (see, e.g., [KS81]) is very similar to a string-walking automaton (Section 1.2). A tree-walking automaton can walk freely on the tree. In this paper we will give a definition of tree-walking automaton that is slightly different from the definition in [KS81]. First of all, we will make use of directives to define the behaviour of the automaton, while [KS81] directly defines a binary step relation on the set of configurations. Secondly, with our definition the automaton is able to check which child of its parent the current node is. In [KS81], the automaton can only move up, without any restriction to child number. This makes our automaton more powerful (i.e., it can recognize more tree languages).

Let S be a ranked alphabet. Like a string-walking automaton, a tree-walking automaton over S is a finite automaton with a special set of directives as input alphabet. For a tree-walking automaton, we define the set of directives as

D_TWA(S) = { _i, ¯_i \mid i Î rki(S) }È{ root, Øroot} È{ lab_s\mid s Î S} .

The meaning of the directives is as follows:

_i means go up via an edge labeled by i, i.e., move to the parent, provided the current node is the ith child of its parent. This directive can be used to check which child of its parent the current node is.
¯_i means go down via an edge labeled by i, i.e., move to the ith child of the current node.
root checks whether the current node is the root of the tree.
Øroot is the negation of root.
lab_s checks whether the current node is labeled by s.

Note that the directive root corresponds to the directive head of the string-walking automaton. A directive ``leaf'' (corresponding to tail) is not necessary since the automaton can test whether the label of the current node has rank 0. Also, Øroot can be simulated by _i ¯_i for all i Î rki(S). To formalize the behaviour of these directives we define for each tree t Î T_S and each directive d Î D_TWA(S) the following binary relation R_t(d) on V_t:

R_t(_i)

{ (u,v) \mid (v,i,u) Î E_t }

R_t(¯_i)

{ (u,v) \mid (u,i,v) Î E_t }

R_t(root)

{ (u,u) \mid u = root_t }

R_t(Øroot)

{ (u,u) \mid u ¹ root_t }

R_t(lab_s)

{ (u,u) \mid lab_t(u) = s}

Like the definition of string-walking automata, a tree-walking automaton over S is a finite automaton A over D_TWA(S). For a tree-walking automaton A = (Q,D_TWA(S),d,I,F) and a tree t Î T_S, an element (q,u) of Q ×V_t is a configuration of the automaton. The set of all configurations of A and t is denoted by \mathbb C_A,t. A configuration (q,u) denotes that the automaton A is in state q at node u.

Let A = (Q,D_TWA(S),d,I,F) be a tree-walking automaton over S and let t Î T_S. Like with string-walking automata, we define a step-relation \twoheadrightarrow_A,t on the set of configurations as follows. For every (q,u), (q¢,u¢) Î \mathbb C_A,t,

(q,u) \twoheadrightarrow_A,t (q¢,u¢) iff $d Î D_TWA(S): (q,d,q¢) Î d and (u,u¢) Î R_t(d) .

The automaton A computes on each tree t Î T_S the binary relation

R_t(A) = { (u,u¢) Î V_t ×V_t \mid(q_in,u) \twoheadrightarrow_A,t^* (q_fin,u¢) for some q_in Î I and q_fin Î F } .

The node relation that A computes is

R(A) = { (t,u,v) \mid t Î T_S and (u,v) Î R_t(A) } .

The definitions of these relations (\twoheadrightarrow_A,t, R_t(A) and R(A)) are mutatis mutandis equal to the definitions for string-walking automaton in Section 1.2.

The tree language that A recognizes is defined as

L(A) = { t Î T_S \mid(root_t, v) Î R_t(A) for some v Î V_t } .

Note that we can assume that the walk of the automaton ends as well as begins in root_t, since we can easily add extra transitions to make the automaton walk to the root at the end of its walk. The class of all tree languages recognized by any tree-walking automaton is named TWA. It is an open problem if TWA=REGT.

Like in the case of string-walking automata, we will also allow transitions of the form (p,s,q) with s Î D_TWA(S)^*. If s = d₁ ¼d_n, we define, for all t Î T_S,

R_t(s) = R_t(d₁) °¼°R_t(d_n).

As with string-walking automata, these extended directives can easily be implemented by adding extra transitions and states.

Determinism of tree-walking automata in defined in analogy with determinism of string-walking automata. A tree-walking automaton A = (Q,D_TWA(S),d,I,F) is deterministic if the following conditions hold:

#I = 1.
If (p,d,q) Î d, then p \not Î F.
For all distinct transitions (p,d₁,q₁), (p,d₂,q₂) Î d, d₁ and d₂ are two mutually exclusive directives in D_TWA(S).

Two directives d₁,d₂ Î D_TWA(S) are again mutually exclusive if, for all t Î T_S, dom(R_t(d₁)) Çdom(R_t(d₂)) = Æ. The following pairs of directives in D_TWA(S) are mutually exclusive:

{_i, _j} with i ¹ j.
{_i,root} for i Î rki(S)
{root,Øroot}
{ lab_s₁, lab_s₂ } with s₁ ¹ s₂
{ ¯_i, lab_s} if i > rk(s).

3.3 Push-Down Tree-Walking Automata

A push-down tree-walking automaton walks on a tree, starting in the root of the tree. It has a push-down store. At each step of the automaton's walk, its possible moves are determined by the automaton's state, by the symbol at the current position of the automaton and by the push-down symbol at the top of the store. If the automaton moves up in the tree, the top symbol is removed from the push-down store; if the automaton moves down, a new symbol is added to the push-down store. There are two main differences between the push-down tree-walking automata as described in this section and those in literature (e.g., [KS81]). Here, directives are used to describe the different possible actions of the automaton. Furthermore, [KS81] lets the automaton accept a tree if it walks up from the root, leaving the tree in a final state; in the description here, the walk can end in any node of the tree in a final state. These differences have no consequences for the power of the automaton. Note that the tree-walking automaton in [KS81] is the push-down tree-walking automaton with only one push-down symbol.

We will present the push-down tree-walking automaton as an extension of the tree-walking automaton. Let S be a ranked alphabet and let G be an alphabet. The directives of a push-down tree-walking automaton are

D_PDTWA(S,G)

{ } È{ ¯_i,g \mid i Î rki(S), g Î G} È

{ lab_s\mid s Î S} È{ stay_{g₁, g₂} \mid g₁, g₂ Î G} .

The meaning of these directives is as follows:

means go up to the parent of the current node and remove the topmost symbol from the push-down store. Note that the subscript i is not necessary for this automaton. As the automaton traverses the tree, routing information can be placed in the push-down store.
¯_i,g means go down via an edge labeled by i and add g to the push-down store.
lab_s checks whether the current node is labeled by s.
stay_{g, g¢} checks whether the topmost symbol from the push-down store is g and, if so, replaces it by g¢.

The push-down tree-walking automaton does not need the directives root and Øroot. By using marked push-down symbols on the bottom of the push-down store, the automaton can check whether the current node is the root of the tree by inspecting the top symbol of the push-down store. If convenient, however, we will use the directives root and Øroot since a push-down tree-walking automaton with these directives can be simulated by a push-down tree-walking automaton without root and Øroot.

To formalize the directives, we define for each tree t Î T_S and each directive d Î D_PDTWA(S,G) the following binary relation R_t(d) on pairs (u,b) where u Î V_t is the current node, and b = g₁ ¼g_n Î G^* is the current contents of the push-down store with n equal to the number of ancestors of u.

R_t()

{ ( (u, g₁ ¼g_n),(v, g₁ ¼g_n-1) ) \mid v is the parent of u }

R_t(¯_i,g)

{ ( (u, g₁ ¼g_n),(v, g₁ ¼g_n g)) \midv is the ith child of u }

R_t(lab_s)

{ ((u, g₁ ¼g_n),(u, g₁ ¼g_n)) \midlab_t(u) = s}

R_t(stay_g,g¢)

{ ((u, g₁ ¼g_n-1 g_n),(u, g₁ ¼g_n-1 g¢)) \mid g_n = g}

Let S be a ranked alphabet (of node labels) and let G be an alphabet (of push-down symbols) with a designated element g_in (the initial push-down symbol). A push-down tree-walking automaton over (S,G) is a finite automaton A over D_PDTWA(S,G). For a push-down tree-walking automaton A = (Q,D_PDTWA(S,G),d,I,F) over (S,G) and a tree t Î T_S, the configurations of A are triples (q,u,b), with q Î Q the state of the automaton, u Î V_t the current node, and b = g₁ ¼g_n Î G^* the contents of the push-down store, where n is equal to the number of ancestors of u. The set of all configurations on A and t is denoted by \mathbb C_A,t.

Let A = (Q,D_PDTWA(S,G),d,I,F) be a push-down tree-walking automaton over (S,G) and let t Î T_S. The definition of the step-relation \twoheadrightarrow_A,t is almost identical to the definition of the step-relation for tree-walking automata. For every (q,u,b), (q¢,u¢,b¢) Î \mathbb C_A,t,

(q,u,b) \twoheadrightarrow_A,t (q¢,u¢,b¢) iff $d Î D_PDTWA(S,G): (q,d,q¢) Î d and ((u,b),(u¢,b¢)) Î R_t(d) .

The tree language that A recognizes is defined as

L(A) = { t Î T_S \mid(q_in,root_t,g_in) \twoheadrightarrow_A,t^* (q_fin,u,b) for some q_in Î I, q_fin Î F, u Î V_t and b Î G^* } .

The automaton starts in the root of a tree with only the initial push-down symbol g_in on the push-down store. It recognizes the tree if it reaches a final state in any node and with any contents of the push-down store. The class of all tree languages recognized by any push-down tree-walking automaton is named PDTWA. Note that because of the push-down store there is no reasonable definition for a binary node relation for push-down tree-walking automata.

Transitions of the form (p,s,q) with s Î D_TWA(S)^* are treated in the same way as with tree-walking automata. Determinism of push-down tree-walking automata is defined in the same way as for tree-walking automata. The following pairs of directives in D_PDTWA(S,G) are mutually exclusive:

{lab_s₁, lab_s₂ } with s₁ ¹ s₂
{stay_g₁,g₁¢, stay_g₂,g₂¢} with g₁ ¹ g₂.

3.4 Equivalence of Finite Tree Automata and
Push-Down Tree-Walking Automata

In this section we show the known result [KS81] that push-down tree-walking automata recognize exactly the regular tree languages. First we show that for every finite tree automaton there exists a push-down tree-walking automaton that recognizes the same language; then we show that, the other way around, for every push-down tree-walking automaton there exists a finite tree automaton that recognizes the same language.

Lemma 16 For every finite tree automaton M there exists a push-down tree-walking automaton A such that L(A) = L(M).

Proof. Let M = (Q,S,d,F) be a finite tree automaton. We construct a push-down tree-walking automaton A = (Q¢,D_PDTWA(S,G),d¢,{q_in},{q_fin}) over (S,G) such that L(A) = L(M). The automaton A walks on a tree t in a single pass from left to right. It enters each subtree of t with root u in state q_in and leaves it in state state_M,t(u). It uses the pushdown store to remember, (a) how many children of u have already been processed and (b) what the resulting states were. Therefore we define Q¢ = Q È{q_in,q_fin} (with q_in,q_fin\not Î Q), G = È{ Qⁱ \mid i Î rki(S) } È{l} and g_in = l. The pushdown symbol l (the empty string) indicates that no subtrees have been traversed yet. If the automaton is at node u and the pushdown symbol at the top of the pushdown store is q₁ ¼q_k, this means that the first k subtrees of u have been traversed and that, for every i with 1 £ i £ k, state_M,t(u ·i) = q_i.

The transition relation d¢ is as follows:

d¢

{ (q_in, (Øroot, lab_s, ), d_s) \mids Î S, rk(s) = 0 } È

{ (q_in, (root, lab_s), q_fin) \mids Î S, rk(s) = 0, d_s Î F} È

{ (q_in, (lab_s, ¯_1,l), q_in) \mids Î S, rk(s) > 0 } È

{ (q_k, (lab_s, stay_{q₁ ¼q_k-1, q₁ ¼q_k}, ¯_k+1,l), q_in) \mid

s Î S, "j Î [1,k] (q_j Î Q), k < rk(s) } È

{ (q_k, (Øroot, lab_s, stay_{q₁ ¼q_k-1, q₁ ¼q_k-1}, ), d_s(q₁, ¼, q_k)) \mid

s Î S, "j Î [1,k] (q_j Î Q), k = rk(s) > 0 } È

{ (q_k, (root, lab_s, stay_{q₁ ¼q_k-1, q₁ ¼q_k-1}), q_fin) \mid

s Î S, "j Î [1,k] (q_j Î Q), k = rk(s) > 0, d_s(q₁, ¼, q_k) Î F }

\square

Lemma 17 For every push-down tree-walking automaton A there exists a finite tree automaton M such that L(M) = L(A).

Proof. Let A = (Q,D_PDTWA(S,G),d,I,F) be a push-down tree-walking automaton over (S,G). From this automaton, we first construct another push-down tree walking automaton A¢. A¢ is equal in behaviour to A, except that when A is in a final state, A¢ has extra transitions to move up to the root of the tree and then further up ``out'' of the tree in an extra state q_fin. Note that A¢ can never execute the latter instruction; it is added only to make the construction of M easier. We define A¢ = (Q¢,D_PDTWA(S,G),d¢,I,{q_fin}) with Q¢ = Q È{q_fin}, q_fin\not Î Q and d¢ = dÈ{(q,,q_fin) \mid q Î F} È{ (q_fin, , q_fin) }. We now construct a finite tree automaton M = (Q_M,S,d_M,F_M) such that L(M) = L(A). We use the method of transition tables (see, e.g., [KS81]). In this method, the states of the finite tree automaton are relations R Í (G×Q¢) ×Q¢. When the finite tree automaton computes a relation R as state for a node u with ((g,q),q¢) Î R , this means that the push-down tree-walking automaton A¢ can traverse the subtree with u as root, starting at node u in state q with symbol g on the top of the push-down store, and emerging from the subtree (by an edge from u to its parent) in state q¢, with g removed from the push-down store. At the root, A¢ ``emerging'' from the tree in state q_fin means that A arrived in a final state.

Formally, we define

Q_M

2^{(G×Q¢) ×Q¢}

d_M

{ d_s \mid s Î S}, with d_s (s Î S) defined below

F_M

{ R Í (G×Q¢) ×Q¢\mid ((g_in,q_in),q_fin) Î R for some q_in Î I }

Here, for every s Î S, the transition function for symbol s is d_s: Q_M^rk(s) ® Q_M with

d_s(R₁, ¼, R_n)

{ ((g, q), q¢¢) \mid g Î G, q,q¢¢ Î Q¢ and (g,q) R(s, R₁, ¼, R_n)^* (g¢, q¢)

for some (g¢, q¢) Î G×Q¢ with (q¢, , q¢¢) Î d¢} .

In this definition, R(s, R₁, ¼, R_n) is the binary relation on G×Q¢ that contains the results of the possible actions of the push-down tree-walking automaton A¢ while staying on the same node or entering a subtree and emerging again from it. The symbol s is the symbol at the current position and the relations R_i Í (G×Q¢) ×Q¢ are the transition tables of the children of the current node. The relation R is formally defined as

R(s, R₁, ¼, R_n)

{ ((g,q),(g,q¢¢)) \mid(q, ¯_i,g¢, q¢) Î d¢ and ((g¢,q¢),q¢¢) Î R_i

for some i Î [1,rk(s)] } È

{ ((g,q),(g¢,q¢)) \mid(q, stay_g,g¢, q¢) Î d¢} È

{ ((g,q), (g,q¢)) \mid(q, lab_s, q¢) Î d¢} .

Note that these relations can effectively be computed. The computation of R(s, R₁, ¼, R_n) is straightforward. Since it is a binary relation on a finite set, its reflexive transition closure can be computed by regarding the relation as a graph and computing all paths in the graph (the nodes of the graph are the elements of G×Q¢ and there is an edge from (g₁,q₁) to (g₂,q₂) iff ((g₁,q₁),(g₂,q₂)) is in the relation). \square

Combining these two lemmas yields the following theorem.

Theorem 18 PDTWA = REGT

3.5 Tree-Walking Marble Automata

Another way to extend tree-walking automata is with marbles . The concept of marbles is the same as that of Section 2.3 for strings, but they are used here in a different way. Marbles are coloured (with symbols from an alphabet) and there are infinitely many marbles of each colour available. Like pebbles, the automaton can drop a marble at a certain node, check for its presence any later time and lift it again. It is not allowed to put more than one marble of the same colour on the same node. Note that by taking the powerset of the marble alphabet we can even pose that not more than one marble (of any colour) is on any node. In the rest of this paper we will implicitly assume that this is the case. Unlike pebbles, putting down a marble on a certain node restrains the automaton to the subtree of which that node is the root, until the marble is lifted again. This restriction makes sure that the marbles are ``almost'' nested: unnested use of marbles is only possible if the marbles are at the same node. We will show that this restriction makes the tree-walking marble automaton very similar to the push-down tree-walking automaton, while it allows for the definition of the binary relation computed by a tree-walking marble automaton on a tree, in a natural way.

Let S be a ranked alphabet and let G be an alphabet of marble labels. Note that there is no g_in necessary now. For a tree-walking marble automaton over (S,G), we define the set of directives as

D_TWMA(S,G) = D_TWA(S)È{ put_g, lift_g, here_g, Øhere_g \midg Î G} .

The meaning of the extra directives is as follows:

put_g means drop a marble with label (or ``colour'') g on the current node, if one is not yet there.
lift_g means pick up a marble with label g, if one is here.
here_g means check whether there is a marble with label g at the current node.
Øhere_g is the negation of here_g.

For t Î T_S, the position of the marbles is formalized by a function m:G® 2^V_t. For this function, m(g) = S Í V_t means that exactly the nodes in S have a marble with label g. The set of all marble position functions is denoted by M_t,G = G® 2^V_t. The behaviour of the directives, including the restrictions mentioned above, is formalized by defining for each tree t Î T_S and each directive d Î D_TWMA(S,G) the following binary relation R_t(d) on V_t ×M_t,G.

R_t(_i)

{ ((u,m), (u¢,m)) \midu is the ith child of u¢ andfor all g Î G, u \not Î m(g) }

R_t(¯_i)

{ ((u,m),(u¢,m)) \midu¢ is the ith child of u }

R_t(root)

{ ((root_t,m),(root_t,m)) }

R_t(Øroot)

{ ((u,m),(u,m)) \mid u ¹ root_t }

R_t(lab_s)

{ ((u,m),(u,m)) \midlab_t(u) = s}

R_t(put_g)

{ ((u,m),(u,m¢)) \midu \not Î m(g), m¢ = m(g® m(g) È{u}) }

R_t(lift_g)

{ ((u,m),(u,m¢)) \midu Î m(g), m¢ = m(g® m(g) \{u}) }

R_t(here_g)

{ ((u,m),(u,m)) \mid u Î m(g) }

R_t(Øhere_g)

{ ((u,m),(u,m)) \mid u \not Î m(g) }

The relation for d = _i makes sure that the marbles are used nested, since it only allows moving up if there are no marbles on the current node.

Let S be a ranked alphabet and let G be an alphabet. A tree-walking marble automaton over (S,G) is a finite automaton A over D_TWMA(S,G). For a tree-walking marble automaton A = (Q,D_TWMA(S,G),d,I,F) over (S,G) and a tree t Î T_S, the configurations of A are triples (q,u,m) Î Q ×V_t ×M_t,G, with q the state of the automaton, u the current node and m the marble position function. The set of all configurations of A and t is denoted by \mathbb C_A,t.

Let A = (Q,D_TWMA(S,G),d,I,F) be a tree-walking marble automaton over (S,G) and let t Î T_S. As usual, we define the binary step-relation \twoheadrightarrow_A,t on \mathbb C_A,t. For every (q,u,m), (q¢,u¢,m¢) Î \mathbb C_A,t,

(q,u,m) \twoheadrightarrow_A,t (q¢,u¢,m¢) iff $d Î D_TWMA(S,G):(q,d,q¢) Î d and (u,m) R_t(d) (u¢,m¢) .

The automaton A computes on each tree t Î T_S the binary relation

R_t(A) = { (u,u¢) Î V_t ×V_t \mid(q_in,u,m₀) \twoheadrightarrow_A,t^* (q_fin,u¢,m₀) for some q_in Î Iand q_fin Î F } ,

where m₀:G® 2^V_t with m₀(g) = Æ for all g Î G. Note that this definition demands that A removes all marbles from the tree. The node relation that A computes is, as usual,

R(A) = { (t,u,v) \mid t Î T_S and (u,v) Î R_t(A) } .

The tree language that A recognizes is also defined in the familiar way:

L(A) = { t Î T_S \mid (t,root_t,v) Î R(A) for some v Î V_t} .

The class of all tree languages recognized by a tree-walking marble automaton is named TWMA. Transitions of the form (q,s,q¢) with s Î D_TWMA(S,G)^* are treated in the usual way. Determinism of tree-walking marble automata is defined in the same way as of (simple) tree-walking automata. The following pairs of directives in D_TWMA(S,G) are mutually exclusive:

{_i, _j} with i ¹ j.
{_i, root}.
{root, Øroot}.
{lab_s₁, lab_s₂} with s₁ ¹ s₂.
{¯_i, lab_s} with i > rk(s).
{_i, here_g} and {_i, lift_g} with i Î rki(S) and g Î G (if there is any marble, the automaton is not allowed to move up).
{put_g, lift_g}, {put_g,here_g} and {lift_g,Øhere_g} with g Î G.

We define the abbreviation Øhere_G for Øhere_g₁, ¼, Øhere_{g_k} with G = {g₁, ¼, g_k}. These directives together test that there is no marble on the current node.

Note that root and Øroot are needed for the tree-walking marble automaton because, unlike the push-down tree-walking automaton, a tree-walking marble automaton does not have to start its walk at the root of the tree, so it is not always possible to place a special marble at the root. Likewise, directives _i for i Î rki(S) are needed because, when the automaton starts in a certain node not equal to the root, the path from the root to that node cannot be coded into marbles.

Theorem 19 TWMA = REGT

Proof. Since PDTWA=REGT (Theorem 18), it is sufficient to show that TWMA=PDTWA. First we prove that any push-down tree walking automaton can be simulated by a tree-walking marble automaton. Let A = (Q,D_PDTWA(S,G),d,I,F) be a push-down tree-walking automaton over (S,G). We define a tree-walking marble automaton

A¢ = (QÈ{q_in,q_fin},D_TWMA(S,G),d¢,{q_in},{q_fin})

with a new initial state q_in and a new final state q_fin. This automaton simulates A in the following way. First, a marble with colour g_in, representing the initial push-down symbol, is put on the current node. Then the directive ¯_i,g is simulated by moving down to the ith child of the current node and then putting down a marble of colour g. On the way up, the directive is simulated by picking up any marble (there is always exactly one on the current node) and moving up to the parent of the current node. The directive lab_s is also available in D_TWMA(S,G), so it can be simulated directly. When A reaches a final state, A¢ moves up to the root, picking up all the marbles on the way up, so that it accepts the tree when it picks up the last marble at the root. A configuration of the push-down tree-walking automaton (u,g₁ ¼g_n) is simulated by a configuration (u,m) of the tree-walking marble automaton, where, for all i Î [1,n], m has a marble g_i on the ith node of the path from the root to u, and no other marbles. A¢ has the following transitions:

d¢

{ (q_in, put_{g_in}, q) \mid q Î I } È

{ (q,(lift_g, _i),q¢) \mid (q,,q¢) Î d, g Î G, i Î rki(S) } È

{ (q,(¯_i,put_g),q¢) \mid (q,¯_i,g,q¢) Î d} È

{ (q,lab_s,q¢) \mid(q,lab_s,q¢) Î d} È

{ (q, (lift_g₁,put_g₂),q¢) \mid(q,stay_{g₁, g₂},q¢) Î d} È

{ (q, lift_g, q_fin) \mid q Î F, g Î G} È

{ (q_fin, (_i, lift_g), q_fin) \mid i Î rki(S), g Î G}.

Obviously, L(A¢) = L(A).

The other way around, let A = (Q,D_TWMA(S,G),d,I,F) be a tree-walking marble automaton over (S,G). As observed before, we may assume that the automaton is constructed in such a way that, at any time, there is at most one marble on any node. Note that, due to the restriction of the automaton to the subtree of the current node whenever a marble is put down, at any moment all marbles lie on the path from the root to the current node. This path can be viewed as a push-down store, except that here ``empty'' places in the store are allowed. To account for these empty places, we add an extra marble colour g₀. In order to simulate root, Øroot and the directives _i (for i Î rki(S)), extra information is put on the push-down store. The new push-down symbols are of the form (g, i, r), where g Î G is the marble colour (or g = g₀ if there is no marble on the current node), the current node is the ith child of its father, and r = 1 if the current node is the root and r = 0 if the current node is not the root. The initial pushdown symbol is g_in = (g₀,1,1). Now the push-down store can be used to simulate the marbles. We define a push-down tree-walking automaton A¢ = (Q¢,D_PDTWA(S,G¢),d¢,I,F¢).

When A moves down (¯_i), A¢ also moves down and puts the symbol (g₀,i,0) on the push-down store, which means that there is no marble on the (new) current node, the current node is the ith child of its father and the current node is not the root.
When A moves up (_i), A¢ checks the symbol on top of the push-down store to make sure that the child number is correct and that there is no marble on the current node.
A¢ can simulate root and Øroot by checking the symbol on top of the push-down store, (g,i,r) with r = 1 for root and r = 0 for Øroot.
The directive lab_s can be simulated directly.
The directive put_g can be simulated by first checking that there is no marble on the current node, so the symbol on top of the push-down store is (g₀,i,r) and then replacing g₀ by g.
To simulate lift_g, A¢ checks that there is a marble on the current node with colour g and then replaces it with ``colour'' g₀.
The directive here_g is simulated by inspecting the top symbol of the push-down store and making sure that it is of the form (g,i,r). Similarly for Øhere_g and a top symbol of any of the forms (g¢,i,r) with g¢ ¹ g.
When A reaches a final state, A¢ must make sure that there are no marbles left on the tree before it can accept the tree. Because all marbles are on the path from the current node to the root, A¢ walks up to the root, checking the absence of marbles on the way, and enter the final state q_fin when it arrives at the root and there is no marble there either.

Formally, the push-down alphabet and the transitions of A¢ are defined as follows.

G¢

(GÈ{g₀}) ×rki(S) ×{0,1} with g_in = (g₀,1,1)

Q¢

Q È{up, q_fin}

F¢

{ q_fin}

d¢

{(q,(stay_{(g₀,i,r),(g₀,i,r)}, ),q¢) \mid r Î {0,1}, (q, _i, q¢) Î d} È

{ (q, ¯_{i, (g₀,i,0)},q¢) \mid(q,¯_i,q¢) Î d} È

{ (q, stay_{(g,i,1),(g,i,1)}, q¢) \midg Î GÈ{g₀}, i Î rki(S),(q,root,q¢) Î d} È

{ (q, stay_{(g,i,0),(g,i,0)}, q¢) \midg Î GÈ{g₀}, i Î rki(S),(q,Øroot,q¢) Î d} È

{ (q, lab_s, q¢) \mid(q,lab_s,q¢) Î d} È

{ (q, stay_{(g₀,i,r),(g,i,r)}, q¢) \midi Î rki(S), r Î {0,1}, (q,put_g,q¢) Î d} È

{ (q, stay_{(g,i,r),(g₀,i,r)}, q¢) \midi Î rki(S), r Î {0,1}, (q,lift_g,q¢) Î d} È

{ (q, stay_{(g,i,r),(g,i,r)}, q¢) \midi Î rki(S), r Î {0,1}, (q,here_g,q¢) Î d} È

{ (q, stay_{(g,i,r),(g,i,r)}, q¢) \midi Î rki(S), r Î {0,1}, g Î (GÈ{g₀}),

(q,Øhere_g¢,q¢) Î d for some g¢ ¹ g} È

{ (q, stay_{(g₀,i,1), (g₀,i,1)}, q_fin) \midq Î F, i Î rki(S) } È

{ (q, (stay_{(g₀,i,0), (g₀,i,0)}, ), up) \midq Î F, i Î rki(S) } È

{ (up, (stay_{(g₀,i,0), (g₀,i,0)}, ), up) \midi Î rki(S) } È

{ (up, stay_{(g₀,i,1), (g₀,i,1)}, q_fin) \midi Î rki(S) }

Since this automaton simulates A, we have L(A¢) = L(A). \square

One might wonder why it is necessary to restrict a tree-walking automaton, when it puts down a marble on a node, to the subtree of which that node is the root. We will show that just demanding that the marbles be used nested is not enough; this would allow the automaton to recognize non-regular tree languages. As an example, consider the non-regular tree language L = { t_n \mid n Î \mathbb N} over the ranked alphabet S = {a,b,c} with rk(a) = rk(b) = 1 and rk(c) = 0, with, for all n Î \mathbb N,

V_{t_n}

{x_i,y_i \mid i Î [1,n]} È{z}

E_{t_n}

{ (x_i,1,x_i+1), (y_i,1,y_i+1) \mid i Î [1,n-1] } È{(x_n,1,y₁), (y_n,1,z) }

lab_{t_n}

{(x_i,a), (y_i,b) \mid i Î [1,n] } È{(z,c)}

This tree language L is similar to the (non-regular) language { aⁿ bⁿ c \mid n Î \mathbb N}. If we only demand the nested use of marbles, L is recognized by the tree-walking marble automaton A over (S,{g}), that puts the first marble on the first ``a'', walks to the bottom of the tree, checks whether there is a ``c'' there and a ``b'' directly above, puts a marble on that ``b'', walks up the tree to the first ``a'' again, puts a marble on the second ``a'' and continues walking up and down until it reaches the middle, where it can check if there are as much ``a'''s as ``b'''s. It then removes the marbles in the reverse order, walking up and down the tree in a similar way.

Note that tree-walking multi-pebble automata, defined in analogy to the string-walking multi-pebble automata of Section 2.1, recognize only regular tree languages (but it is open whether or not they recognize them all). Multi-pebble automata have only a finite number of pebbles available, contrary to the infinite number of marbles of marble automata. This means the above described automaton will not work with a multi-pebble automaton.

3.6 Tree-Walking Marble/Pebble Automata

There is one feature to be added to tree-walking marble automata to make sure that they have the same power as binary MSO formulas with respect to describing binary node relations. A tree-walking marble/pebble automaton has one pebble (next to marbles) which it can use in the usual way. The only restriction is that the use of marbles and pebble together must be nested, e.g., it is not allowed to put down a marble, then the pebble and then pick up the marble before the pebble. Unlike the marbles, the pebble does not restrict the automaton to the subtree induced by all descendants of the node where the pebble is. The pebble will be used to mark one special node, e.g., the starting node of a walk on the tree. Like with tree-walking marble automata, we will always assume that the automaton is constructed in such a way that there is not more than one marble on any node, at any time.

Let S be a ranked alphabet and let G be an alphabet of marble colours. We define the set of directives for a tree-walking marble/pebble automaton as

D_TWMPA(S,G)

D_TWMA(S,G)È{ put,lift,here,Øhere}

{ _i, ¯_i \mid i Î rki(S) } È{ root, Øroot} È{ lab_s\mid s Î S} È

{ put_g, lift_g, here_g, Øhere_g \midg Î G} È{ put,lift,here,Øhere}

Here put, lift, here and Øhere (without subscripts) are the directives for the pebble, while put_g, lift_g, here_g and Øhere_g are the directives for a marble of colour g. To enforce nesting, we now define the binary relations for the directives on triples (u,k,p) Î V_t ×\mathbb N ×(\mathbb N ® MP_t) with MP_t = (V_t ×G) ÈV_t È{^}. Here an element m of MP_t represents the action of putting down a marble of colour g on node u (m = (u,g)) or of putting down the pebble on node u (m = u), or no action (m = ^). Of a triple (u,k,p), u is the current node of the automaton, k is the number of marbles and pebbles currently on the tree and p is the marble/pebble placement function. Intuitively, p(1), ¼, p(k) are the actions executed in the past (in that order) of putting down marbles or the pebble. The function p is used as a kind of push-down stack, putting another item on the stack if a marble or pebble is put down and removing the topmost item from the stack if a marble or pebble is lifted. At all times, p(n) = ^ if n > k.

For each tree t Î T_S and each directive d Î D_TWMPA(S,G) we define the following binary relations:

R_t(_i)

{ ((u,k,p), (u¢,k,p)) \midu is the ith child of u¢ and

p(j) ¹ (u,g) for all j Î [1,k], g Î G}

R_t(¯_i)

{ ((u,k,p),(u¢,k,p)) \midu¢ is the ith child of u }

R_t(root)

{ ((root_t,k,p),(root_t,k,p)) }

R_t(Øroot)

{ ((u,k,p),(u,k,p)) \mid u ¹ root_t }

R_t(lab_s)

{ ((u,k,p),(u,k,p)) \midlab_t(u) = s}

R_t(put_g)

{ ((u,k,p), (u,k+1,p¢)) \midp¢ = p(k+1 ® (u,g)),

p(i) ¹ (u,g) for all i Î [1,k] }

R_t(lift_g)

{ ((u,k,p), (u,k-1,p¢)) \midk ³ 1, p(k) = (u,g), p¢ = p(k® ^) }

R_t(here_g)

{ ((u,k,p), (u,k,p)) \midp(i) = (u,g) for some i Î [1,k] }

R_t(Øhere_g)

{ ((u,k,p), (u,k,p)) \midp(i) ¹ (u,g) for all i Î [1,k] }

R_t(put)

{ ((u,k,p), (u,k+1,p¢)) \midk ³ 0, p¢ = p(k+1 ® u),

p(i) Î MP_t \V_t for all i Î [1,k] }

R_t(lift)

{ ((u,k,p), (u,k-1,p¢)) \midk ³ 1, p(k) = u, p¢ = p(k ® ^) }

R_t(here)

{ ((u,k,p), (u,k,p)) \midp(i) = u for some i Î [1,k] }

R_t(Øhere)

{ ((u,k,p), (u,k,p)) \midp(i) ¹ u for all i Î [1,k] }

Let S be a ranked alphabet and let G be an alphabet. A tree-walking marble/pebble automaton over (S,G) is a finite automaton over D_TWMPA(S,G). We continue with the usual definitions. For a tree-walking marble/pebble automaton A = (Q,D_TWMPA(S,G),d,I,F) over (S,G) and a tree t Î T_S, the configurations of A are 4-tuples (q,u,k,p) Î Q ×V_t ×\mathbb N ×(\mathbb N ® MP_t), with q the state, u the current node, k the number of marbles and pebbles currently on the tree and p the marble/pebble placement function. The set of all configurations of A and t is denoted by \mathbb C_A,t. We define the binary step-relation \twoheadrightarrow_A,t on \mathbb C_A,t in the usual way. For every (q,u,k,p),(q¢,u¢,k¢,p¢) Î \mathbb C_A,t,

(q,u,k,p) \twoheadrightarrow_A,t (q¢,u¢,k¢,p¢) iff $d Î D_TWMPA(S,G): (q,d,q¢) Î d and (u,k,p) R_t(d) (u¢,k¢,p¢) .

The automaton A computes on each tree t Î T_S the binary relation

R_t(A) = { (u,u¢) Î V_t ×V_t \mid (q_in,u,0,p₀) \twoheadrightarrow_A,t^* (q_fin,u¢,0,p₀) for some q_in Î I, q_fin Î F } ,

where p₀ is the empty marble/pebble placement function, with p₀(i) = ^ for all i Î \mathbb N. The node relation that A computes is, as usual,

R(A) = { (t,u,v) \mid t Î T_S and (u,v) Î R_t(A) } .

The tree language that A recognizes is, also as usual,

L(A) = { t Î T_S \mid (t,root_t,v) Î R(A) for some v Î V_t } .

The class of all languages that are recognized by a tree-walking marble/pebble automaton is named TWMPA. Transitions of the form (q,s,q¢) with s Î D_TWMPA(S,G)^* are treated in the usual way. The definition of deterministic tree-walking marble/pebble automata is equal to the definition of deterministic string-walking automata. The following pairs of directives in D_TWMPA(S,G) are mutually exclusive, next to those pairs already mutually exclusive for tree-walking marble automata:

{put, lift}, {put,here} and {lift,Øhere}
{lift, lift_g} for any g Î G (since either the pebble or a marble was put down last and hence must be lifted first).

Theorem 20 TWMPA = REGT

Proof. Because TWMA=REGT (Theorem 19) and because each tree-walking marble automaton is also a tree-walking marble/pebble automaton, it is sufficient to show that, for each tree-walking marble/pebble automaton A, there exists a tree-walking marble automaton A¢ such that L(A¢) = L(A).

Let A = (Q,D_TWMPA(S,G),d,I,F) be a tree-walking marble/pebble automaton over (S,G). We may again assume that A is constructed in such a way that, at any time, there is at most one marble on any node. We also assume that A nevers puts the pebble on an empty tree, i.e., there is always at least one marble on the tree when A puts down the pebble. This can easily be accomplished by adding an extra marble colour and having A put one on the root before starting. This also allows us to assume that A does not use the directives root and Øroot.

We first formulate and prove the following Claim. This Claim states that for a situation where A just put down its pebble on the current node u and the previous marble (of colour g) was put on an ancestor v of u, there exists a finite tree automaton M such that A can make a round-trip, starting and finishing at u, starting in state q and finishing in state q¢, with the same marbles on the tree at the beginning and at the end of the round-trip (and without lifting the pebble at u), if and only if M accepts t_v (the subtree of t with root v), marked at node u. Due to the nesting constraints, the round-trip of A takes place only on t_v (A cannot move higher up the tree than v before lifting the marble at u).

[: Claim] Let A = (Q,D_TWMPA(S,G),d,I,F) be a tree-walking marble/pebble automaton over (S,G) that does not make use of the directives put, lift, root, and Øroot. Then there exists for every g Î G and q,q¢ Î Q a finite tree automaton M over SÈ(S×B₁) such that, for all t Î T_S, u,v Î V_t, k ³ 2, p Î \mathbb N ® MP_t with p(k) = u and p(k-1) = (v,g), with u Î des(v) and the only marble on t_v is at v (i.e., "i < k: if p(i) = (v¢,g¢) then v¢\not Î des(v)),

(q,u,k,p) \twoheadrightarrow_A,t^* (q¢,u,k,p) iff mark(t_v,u) Î L(M).

To prove this Claim, let A = (Q,D_TWMPA(S,G),d,I,F) be a tree-walking marble/pebble automaton over (S,G) that does not make use of the directives put, lift, root, and Øroot, and let g_r Î G (the g in the Claim) and q₁,q₂ Î Q. We first define a tree-walking marble automaton A¢ = (Q¢,D_TWMA(SÈ(S×B₁),G),d¢,I¢,F¢) over (SÈ(S×B₁),G) such that the Claim holds for A¢ rather than M. The automaton A¢ uses the mark to know the position of the pebble. It walks on t_v; there is no marble of colour g_r on v. A¢ starts in the root of t_v, i.e., v. It first searches for the pebble, then simulates A, starting in state q₁. When it gets back to the pebble in state q₂, A¢'s mission is accomplished.

Q¢

Q È{q_in, q_fin}

I¢

{ q_in}

F¢

{ q_fin}

d¢

{ (q_in, ¯_i, q_in) \mid i Î rki(S) } È{ (q_in, lab_(s,1), q₁) \mid s Î S} È

{ (p,d,q) \mid (p,d,q) Î d, d \not Î {here, Øhere, Øhere_{g_r}, put_{g_r} }} È

{ (p,lab_(s,1),q) \mid (p,lab_s,q) Î d} È

{ (p,lab_(s,1),q) \mid (p,here,q) Î d,s Î S} } È

{ (p,lab_s,q) \mid (p,Øhere,q) Î d,s Î S} È

{ (p,root,q) \mid (p,here_{g_r},q) Î d,s Î S} È

{ (p,(Øroot, Øhere_{g_r}),q) \mid (p,Øhere_{g_r},q) Î d} È

{ (p,(Øroot, put_{g_r}),q) \mid (p,put_{g_r},q) Î d} È

{ (q₂, lab_(s,1), q_fin) \mids Î S}

With this definition of A¢, we have, for all t Î T_S, u,v Î V_t, k ³ 2, p Î \mathbb N ® MP_t with p(k) = u and p(k-1) = (v,g), with u Î des(v) and the only marble on t_v at v,

(q₁,u,k,p) \twoheadrightarrow_A,t^* (q₂,u,k,p) iff mark(t_v,u) Î L(A¢).

Because TWMA=REGT (Theorem 19), there exists a finite tree automaton M over SÈ(S×B₁) with L(M) = L(A¢). The automaton M complies with the demands of the Claim. This ends the proof of the Claim.

We continue with the proof of Theorem 20. We construct a tree-walking marble automaton A¢ that recognizes the same tree language as the tree-walking marble/pebble automaton A. The automaton A¢ = (Q¢,D_TWMA(S,G¢),d¢,I¢,F¢) simulates A, but at each step of A, A¢ is prepared to simulate the use of the pebble of A, by knowing, if it were to put the pebble at the current node u, the states in which A could return at the same node, ready to pick up the pebble again (cf. the proof of Theorem 11). To accomplish this, A¢ uses a special set of marbles in addition to the marbles of A. For each q,q¢ Î Q and g Î G, M_{q q¢g} is the automaton that can be constructed according to the Claim, after removing all transitions with directives put and lift from A. The meaning of the additional marble colours is as follows.

The marbles [^(g)] (for g Î G) denote that the last marble that A has put down is of colour g. The special marble e is used to indicate that there are no marbles of A on the tree.
The marbles in the set ``states'' are strings of functions from Q ×Q to Q. If the string is f₁ ¼f_k (with k the rank of the symbol s of the node), then f_i(q, q¢) = p means that M_{q q¢g} reaches state p in the ith child of the node. Here g is the colour of the latest marble put down by A on the tree.
The marbles q Î Q denote that A will continue its walk on the tree from there in state q. They are used to store the state of A during the computation of the states-marble. They also mark the node where this computation started.
The marbles in the set ``succ'' are functions from Q ×Q to subsets of Q. A marble coloured g:Q ×Q ® 2^Q on a node u indicates that, for all q,q¢ Î Q, the finite automaton M_{q q¢g} has successful states g(q,q¢) at u. Here M_{q q¢g} works on the subtree t_v, where v is the node where the latest marble, of colour g, was put by A.

A¢ uses a method similar to the one used in Lemma 16 to compute the states of the M_{q q¢g} and thus the marble colours for the marbles from the set ``states''. The successful states of the M_{q q¢g} for the children of a node u are computed by A¢ from the successful states of u itself and from the states that M_{q q¢g} reaches in the children of u. When A puts down its pebble (at node u), A¢ checks whether M_{q q¢g} accepts mark(t_v,u), where q is the state of A after it put down its pebble, q¢ is a state from which A can pick it up again, g is the colour of the last marble that A put down before the pebble and v is the position of that marble. The colour g is stored in the marble [^(g)] at the current node u. During the direct simulation of the steps of A, there is always exactly one marble from the set {[^(g)] \mid g Î G} È{e} at the current node. When there is at least one marble of A on the simulated tree, there is also exactly one marble from each of the sets states and succ at the current node. The colours of these three marbles determine whether or not the M_{q q¢g} accept mark(t_v,u). We define the abbreviation Øhere_Q to be Øhere_q₁, ¼, Øhere_{q_n} with {q₁, ¼, q_n} = Q.

Formally, the states, marble alphabet, initial and final states, and transitions of A¢ are defined as follows.

Q¢

Q È{ q_in, q_fin} È{ findstates_g \mid g Î G} È{ (f,g) \mid f:(Q ×Q) ® Q, g Î G}

G¢

GÈQ È{

^
g

\mid g Î G} È{e}ÈstatesÈsucc

states

{ l} È

È
i Î rki(S)

((Q ×Q) ® Q)ⁱ

succ

(Q ×Q) ® 2^Q

I¢

{ q_in}

F¢

{ q_fin}

d¢

d_in Èd_sim Èd_states Èd_fin

The set of transitions d¢ of A¢ is divided into four parts. The initialization of A¢ is done in the transitions in d_in. The actual simulation of the steps of A is done by the transitions in d_sim. The set d_states contains a ``subroutine'' for calculating the states of M_{q₁ q₂ g} for the children of the current node in a marble from the set states. Finally, the cleaning up is done by the transitions in d_fin.

We start with describing d_in. From the initial state q_in of A¢, a marble e is laid on the root. This marble indicates that there is no marble on the (simulated) tree yet. This transition prepares the automaton A¢ for the simulation of A.

d_in = { (q_in, put_e, q) \mid q Î I }

In d_sim we place the transitions for the actual simulation of the steps of A. This is quite technical. After simulating any of the directives {¯_i, put_g, lift_g} the marble from states must be re-computed. The marbles from succ are computed with the use of a function ``su'' that is defined after d_sim. In the last set of transitions, the putting down and lifting of the pebble is simulated, by making use of the information in the marbles from states and succ.

d_sim

{ (p, (here_e,¯_i,put_e, q) \mid(p, ¯_i, q) Î d} È

{ (p, (lab_s, here_[^(g)], here_{f₁¼f_k}, here_g, ¯_i, put_[^(g)], put_g¢, put_q, put_l), findstates_g) \mid (p, ¯_i, q) Î d,

s Î S, k = rk(s), g Î G, i Î [1,k], f_j:Q×Q ® Q (j Î [1,k]),

g:Q×Q ® 2^Q,g¢ = su(g, s, g, i, f₁, ¼, f_k) } È

{ (p, (lift_e, _i), q) \mid(p, _i, q) Î d} È

{ (p, (Øhere_G, lift_[^(g)], lift_g, lift_{f₁¼f_k}, _i), q) \mid (p, _i, q) Î d,

g Î G, g:Q×Q ® 2^Q, k ³ 0,f_j:Q×Q ® Q (j Î [1,k]) } È

{ (p, lab_s, q) \mid (p, lab_s, q) Î d} È

{ (p, (put_g, lift_e, put_[^(g)],put_g, put_q, put_l), findstates_g) \mid (p, put_g, q) Î d,

g:Q×Q ® 2^Q with g(q₁,q₂) = F_{M_{q₁ q₂ g}} } È

{ (p, (put_g, lift_[^(g)]¢, put_[^(g)],lift_{f₁ ¼f_k}, lift_g,put_g¢, put_q, put_l), findstates_g) \mid (p, put_g, q) Î d,

g¢ Î G, k ³ 0, f_j:Q×Q ® Q (j Î [1,k]),g:Q×Q ® 2^Q,

g¢:Q×Q ® 2^Q with g¢(q₁,q₂) = F_{M_{q₁ q₂ g}} } È

{ (p, (lift_g, lift_[^(g)], lift_{f₁ ¼f_k},lift_g, _i, here_e, ¯_i,put_e), q)\mid (p, lift_g, q) Î d,

k ³ 0, f_j:Q×Q ® Q (j Î [1,k]),g:Q×Q ® 2^Q, i Î rki(S) } È

{ (p, (lift_g, lift_[^(g)], lift_{f₁ ¼f_k},lift_g, _i, here_[^(g)]¢, lab_s, here_{f¢¢₁ ¼f¢¢_k¢¢}, here_g¢¢,¯_i,

put_[^(g)]¢, put_g¢, put_q, put_l), findstates_g¢) \mid (p, lift_g, q) Î d, k ³ 0,

f_j:Q×Q ® Q (j Î [1,k]),g:Q×Q ® 2^Q, i Î rki(S), g¢ Î G,s Î S,

k¢¢ = rk(s),f¢¢_j:Q×Q ® Q (j Î [1,k¢¢]), g¢¢:Q×Q ® 2^Q,

g¢:Q×Q ® 2^Q, g¢ = su(g¢¢,s,g¢,i,f¢¢₁, ¼, f¢¢_k¢¢) } È

{ (p, (root, lift_g, lift_[^(g)], lift_{f₁ ¼f_k},lift_g, put_e), q) \mid (p, lift_g, q) Î d,

k ³ 0, f_j:Q×Q ® Q (j Î [1,k]),g:Q ×Q ® 2^Q } È

{ (p, here_g, q) \mid (p, here_g, q) Î d} È

{ (p, Øhere_g, q) \mid (p, Øhere_g, q) Î d} È

{ (p, (lab_s, here_{f₁ ¼f_k}, here_g, here_[^(g)]), q) \mid (p, put, p¢) Î d, (q¢, lift, q) Î d,

s Î S, k = rk(s), g Î G,

g:Q×Q ® 2^Q,f_j:Q×Q ® Q (j Î [1,k]),

(d_(s,1))_{M_p¢q¢g}(f₁(p¢, q¢), ¼, f_k(p¢, q¢)) Î g(p¢, q¢) }

Here su(g, s, g, i, f₁, ¼, f_k) computes the next set of successful states. For all states q₁, q₂ Î Q, su(g, s, g, i, f₁, ¼, f_k)(q₁,q₂) is the set of all states q such that M_{q₁ q₂ g} is successful, assuming it reaches the ith child of the current node in state q, and the jth (j ¹ i) child of the current node in state f_j(q₁, q₂), or

su(g, s, g, i, f₁, ¼, f_k)(q₁,q₂)

{q Î Q_{M_{q₁ q₂ g}} \mid (d_s)_{M_{q₁ q₂ g}}(f₁(q₁, q₂), ¼, f_i-1(q₁, q₂), q,

f_i+1(q₁, q₂), ¼, f_k(q₁, q₂)) Î g(q₁, q₂) } .

The ``subroutine'' to construct the marble from states is as follows: d_states = È_{g Î G} d_g with

d_g

{ (findstates_g, (Øhere_Q, lab_s, lift_l, _i), (f,g)) \midi Î rki(S), rk(s) = 0,

f:(Q ×Q) ® Q with f(q₁,q₂) = (d_s)_{M_{q₁ q₂ g}} } È

{ (findstates_g, (lift_q, lab_s), q) \midrk(s) = 0, q Î Q }

{ (findstates_g, (lab_s, ¯₁, put_l), findstates_g) \midrk(s) > 0 } È

{ ((f_k,g), (lab_s, lift_{f₁¼f_k-1}, put_{f₁¼f_k}, ¯_k+1, put_l), findstates_g) \mid1 £ k < rk(s) } È

{ ((f_k,g), (Øhere_Q, lab_s, lift_{f₁¼f_k-1}, _i), (f,g)) \mid k = rk(s), i Î rki(S),

f(q₁, q₂) = (d_s)_{M_{q₁ q₂ g}}(f₁(q₁, q₂), ¼,f_k(q₁, q₂)) } È

{ ((f_k,g), (lift_q, lab_s, lift_{f₁¼f_k-1}, put_{f₁¼f_k}), q) \midq Î Q, k = rk(s) ³ 1}

Finally, to allow A¢ to accept the tree, A¢ has to remove its auxiliary pebbles from the tree. Since at a moment that A would accept the tree there are no marbles on it, A¢ only has to remove the marbles e.

d_fin = { (q, lift_e, q_fin) \mid q Î F } È{ (q_fin, (_i, lift_e), q_fin) \mid i Î rki(S) }

\square

3.7 Tree-Walking Automata with MSO Tests

A completely different extension (i.e., not involving marbles or pebbles) of tree-walking automata is to add the capability to perform MSO tests (see [BE97], Section 1.4). Let S be a ranked alphabet. The set of directives for a tree-walking automaton with MSO tests is

D_TWA+M(S) = { _i, ¯_i \mid i Î rki(S) } ÈMSOL₁(S) .

As this is an infinite set, we have to choose a finite subset of D_TWA+M(S) for each tree-walking automaton with MSO tests. For each tree t Î T_S and each directive d Î D_TWA+M(S) we define the following binary relation on V_t:

R_t(_i)

{ (u,v) \mid (v,i,u) Î E_t }

R_t(¯_i)

{ (u,v) \mid (u,i,v) Î E_t }

R_t(y(x))

{ (u,u) \mid (t,u) \models y(x) }

A tree-walking automaton with MSO tests over S is a finite automaton A over a finite subset of D_TWA+M(S). The definitions of configurations, \twoheadrightarrow_A,t, R_t(A), R(A) and L(A) are unchanged. The definition of deterministic tree-walking automata with MSO tests is also unchanged with respect to the definition of deterministic tree-walking automata, although it is no longer possible to list all pairs of mutually exclusive directives. However, it is decidable whether a given pair of directives is mutually exclusive. Directives of the form (q,s,q¢) with s Î D_TWA+M(S)^* are treated in the usual way. The class of all tree languages L such that L = L(A) for some tree-walking automaton with MSO tests A is named TWA+M. As stated in Section 1.4, in [BE97] the following is proven.

Proposition 21 The following three statements hold:

TWA+M=REGT
For each ranked alphabet S and each binary MSO formula f(x,y) Î MSOL₂(S), there exists a tree-walking automaton with MSO tests A such that R(A) = R(f).
For each ranked alphabet S and each tree-walking automaton with MSO tests A over S, there exists a binary MSO formula f(x,y) Î MSOL₂(S) such that R(f) = R(A).

3.8 Equivalence of Binary MSO Formulas and
Tree-Walking Marble/Pebble Automata

In this section we will show that binary MSO formulas, i.e., formulas f(x,y) Î MSOL₂(S) for some ranked alphabet S, define the same node relations as those computed by tree-walking marble/pebble automata. First we will show that for each binary formula f(x,y) Î MSOL₂(S) there exists a tree-walking marble/pebble automaton A over (S, G) (for some alphabet G) such that they compute the same relation, that is, R(f) = R(A). We will make use of the second part of Proposition 21 to construct our tree-walking marble/pebble automaton, similar to the alternate proof of Theorem 7 (Section 1.6). Note that the original proof of Theorem 7 is not valid for trees, because, if this proof were translated to trees, the pebble and marbles would not be used nested. We need the following lemma. This lemma states that it is possible to simulate a unary MSO formula with a tree-walking marble/pebble automaton that makes a round-trip, checking the absence of all marbles and pebbles before entering a final state.

Lemma 22 Let S be a ranked alphabet. For each unary MSO formula y(x) Î MSOL₁(S) there exists a tree-walking marble/pebble automaton A = (Q,D_TWMPA(S,G),d,I,F) over (S, G), for some alphabet G, such that, for all t Î T_S, R_t(A) = { (u,u) \mid u Î V_t, (t,u) \models y(x) } and, if (q_in,u,0,p₀) \twoheadrightarrow_A,t^* (q_fin,u¢,k,p) for some u,u¢ Î V_t, q_in Î I and q_fin Î F, then k = 0 and p = p₀.

Proof. Let S be a ranked alphabet and let y(x) Î MSOL₁(S) be a unary MSO formula. According to Corollary 3, there exists a closed MSO formula y¢ Î MSOL₀(SÈ(S×B₁)) such that, for all t Î T_S and u Î V_t, (t,u) \models y(x) iff mark(t,u) \models y¢. Since MSOT=REGT (Proposition 15) and TWMA=REGT (Theorem 19), there exists a tree-walking marble automaton

A¢ = (Q¢,D_TWMA(SÈ(S×B₁),G),d¢,I¢,F¢)

over (SÈ(S×B₁), G) (for some alphabet G) such that L(A¢) = L(y¢). This implies that for all t Î T_S, u Î V_t and t¢ = mark(t,u),

t¢\models y¢ iff (q_in,root_t¢,m₀) \twoheadrightarrow_A¢,t¢^* (q_fin,v,m₀) for some q_in Î I¢, q_fin Î F¢ and v Î V_t .

We now define the tree-walking marble/pebble automaton A = (Q,D_TWMPA(S,G),d,I,F). This automaton first drops its pebble in order to remember the start of the walk. Then it walks to the root of the tree and starts simulating A¢. It uses the pebble to determine when to interpret a node label s Î S as s (when there is no pebble at the current node) and when to interpret s as (s,1) (when there is a pebble at the current node). When it reaches a final state, it walks to the root of the tree, checking on the way that there are no marbles (this can be done since the marbles always lie on the path from the root to the current node), so that A¢ really accepts the marked tree. Then A searches for the pebble, picks it up and finishes.

Q¢È{ q_in, toroot, ready, findpebble, q_fin}

{ q_in}

{ q_fin}

{ (q_in,put,toroot) } È{ (toroot, _i, toroot) \mid i Î rki(S) } È

{ (toroot, root, q) \mid q Î I¢} È

{ (q,d,q¢) \mid (q,d,q¢) Î d¢, d ¹ lab_s for all s Î S } È

{ (q,(Øhere,lab_s),q¢) \mid (q,lab_s,q¢) Î d¢} È

{ (q,(here,lab_s),q¢) \mid(q,lab_(s,1),q¢) Î d¢} È

{ (q, Øhere_G, ready) \mid q Î F¢} È

{ (ready, (_i, Øhere_G), ready) \mid i Î rki(S) } È

{ (ready, (root, Øhere_G), findpebble) } È

{ (findpebble, (Øhere, ¯_i), findpebble) \mid i Î rki(S) } È

{ (findpebble, (here, lift), q_fin) }

Note that the construction of A ensures that the marbles and pebble are used nested, because first the pebble is put down, then the marbles are used in the same way as A¢ uses them (i.e., nested), then A makes sure there are no marbles left on the tree and finally A picks up the pebble.

From the construction of A it follows that, for all t Î T_S, u Î V_t and t¢ = mark(t,u),

(q,root_t¢,m₀) \twoheadrightarrow_A¢,t¢^* (q¢,v,m₀) for some q Î I¢, q¢ Î F¢ and v Î V_t iff

(q_in,u,0,p₀) \twoheadrightarrow_A,t^* (q_fin,u,0,p₀).

This, and the details of the construction of A, implies R_t(A) = { (u,u) \mid u Î V_t, (t,u) \models y(x) } and the other details of the lemma. \square

Lemma 23 Let S be a ranked alphabet and let f(x,y) Î MSOL₂(S) be a binary MSO formula. Then there exists a tree-walking marble/pebble automaton A such that R(A) = R(f).

Proof. According to Proposition 21, there exists a tree-walking automaton with MSO tests A¢ = (Q¢,D, d¢,I¢,F¢) such that R(A¢) = R(f). We define a tree-walking marble/pebble automaton A over (S,G) that simulates A¢. The automaton A can directly simulate the directives {_i, ¯_i \mid i Î rki(S)}. The directives y(x) Î MSOL₁(S) are handled by constructing a tree-walking marble/pebble automaton according to Lemma 22 and using this automaton as a ``subroutine'' in A. We define T = {(q,d,q¢) Î d¢\mid d Î MSOL₁(S)}, the set of all transitions with MSO tests in A¢. For each transition t = (q,y(x),q¢) Î T, we use Lemma 22 to construct the tree-walking marble/pebble automaton A_t over (S, G_t) that simulates t. We will assume that the states of any automaton A_t do not overlap with the states of any other A_t or with the states of A¢. We also assume that the marble alphabets G_t are disjoint, or G_t ÇG_t¢ = Æ for all t,t¢ Î T, t ¹ t¢. The marble alphabet of A is G = È_{t Î T} G_t.

We define A = (Q,D_TWMPA(S,G),d,I,F) as follows:

Q¢È

È
t Î T

Q_{A_t}

I¢

F¢

{ t Î d¢\mid t\not Î T } È

{ (q, (put,lift), q¢), (q¢¢, (put,lift), q¢¢¢) \midq¢ Î I_{A_t}, q¢¢ Î F_{A_t},t = (q,y(x),q¢¢¢) Î T } È

È
t Î T

d_{A_t}

Here the (double) directive (put,lift) acts as a ``nop'' directive. It can easily be seen that A complies with the demands of nesting and that R(A) = R(A¢). \square

We now show the other way around.

Lemma 24 Let S be a ranked alphabet and let A be a tree-walking marble/pebble automaton over (S,G). Then there exists a binary MSO formula f(x,y) Î MSOL₂(S) such that R(f) = R(A).

Proof. From the automaton A we first construct another tree-walking marble/pebble automaton A¢ over (SÈ(S×B₂), G) such that, for all t Î T_S and u,v Î V_t,

mark(t,u,v) Î L(A¢) iff (t,u,v) Î R(A) .

A¢ first finds the node with label (s,1,b), simulates A and accepts the tree if A has a final state in a node with label (s,b,1). Because TWMPA=REGT (Theorem 20) and REGT=MSOT (Proposition 15), there exists a MSO formula y Î MSOL₀(SÈ(S×B₂)) such that, for all t¢ Î T_SÈ(S×B₂),

t¢ Î L(y) iff t¢ Î L(A¢) .

Now, using Corollary 3, we obtain a binary MSO formula f(x,y) Î MSOL₂(S) such that, for all t Î T_S and u,v Î V_t,

(t,u,v) Î R(f) iff mark(t,u,v) Î L(y)

and we have, for all t Î T_S and u,v Î V_t,

(t,u,v) Î R(f) iff (t,u,v) Î R(A) .

\square

These two lemmas together prove the grand finale of this chapter.

Theorem 25 The following two statements hold:

For every MSO formula f(x,y) Î MSOL₂(S) there exists a tree-walking marble/pebble automaton A over (S, G), for some alphabet G, such that R(A) = R(f).
For every tree-walking marble/pebble automaton A over (S, G) there exists an MSO formula f(x,y) Î MSOL₂(S) such that R(f) = R(A).

This theorem states that binary MSO formulas define the same node relations that tree-walking marble/pebble automata compute. Since the proof of this theorem, and all underlying theorems and lemmas, is constructive, we can actually construct an operational automaton with only local operations to compute the node relation that is defined by a given, descriptive, binary MSO formula, and vice versa.

Conclusion and Recommendations

Conclusion

In this paper we have described a number of string-walking and tree-walking automata and we have proved some of their properties, especially with regard to the binary node relations these automata compute. This study has been done in order to find an answer to the following question:

Is it possible to define a type of tree-walking automaton, with only local operations, that computes the same binary node relations that binary MSO formulas recognize?

In Chapter 1 we have investigated the possibilities to define a string-walking automaton that computes the same binary relations on the positions of strings as binary MSO formulas. It was found that string-walking pebble automata have this property. It is even possible, for every binary MSO formula, to construct a string-walking pebble automaton that computes the same binary relation. The reverse is also true: for each string-walking pebble automaton, it is possible to construct a binary MSO formula that defines the same binary relation. It was also shown that the pebble is necessary. There is a binary MSO formula for which there is no string-walking automaton (without pebble) that computes the same binary relation. Since the pebble operations are local (the pebble can only be handled on the current position of the automaton), this answers our central question for strings.

To answer our central question for trees, we have defined in Chapter 3, among others, tree-walking pebble/marble automata. It has been shown that these automata compute exactly the binary node relations that can be defined by binary MSO formulas. Like with string-walking pebble automata, the appropriate automaton for each binary MSO formula can be constructed, and vice versa. Tree-walking marble/pebble automata have also only local operations. All pebble-handling and marble-handling occurs only on the current node. A note must be made that it may seem a bit complicated to have both marbles and a pebble available to the automaton. We have made no study to check whether it is possible to leave out, e.g., the pebble and still compute the same binary node relations. Also one might argue that the tree-walking marble/pebble automaton is not strictly local, since the automaton ``remembers'' in some way that it has dropped the pebble, further down the tree, when it is about to pick up the marble that was put down last before the pebble.

Recommendations

There are a number of questions that have not been answered in this paper, but may be interesting for future study.

Only binary node relations have been considered. Maybe there is also an operational way to compute the k-ary node relations that are defined by k-ary MSO formulas.
The concept of string-walking marble automata (Section 1.4) may have to be adapted to make them recognize exactly the regular languages.
Do tree-walking automata recognize exactly the regular tree languages?
It has not been checked that tree-walking marble/pebble automata need the pebble to compute the relations defined by binary MSO formulas.
Tree-walking multi-pebble automata (in analogy with string-walking multi-pebble automata) have not been studied. It is not known which tree languages they describe, and whether the number of pebbles induces a hierarchy on the recognized languages. It is known that tree-walking multi-pebble automata recognize at most the regular tree languages, or TWnPA Í REGT.
It should be checked whether deterministic string-walking pebble automata recognize exactly the functional relations that are defined by binary MSO formulas. Also for deterministic tree-walking marble/pebble automata and (functional) binary MSO formulas.

Bibliography

[BE97]: R. Bloem and J. Engelfriet. Characterization of properties and relations defined in monadic second order logic on the nodes of trees. Technical Report 97-04, Leiden University, 1997.
[BH65]: M. Blum and C. Hewitt. Automata on a 2-dimensional tape. In Proc. 8th IEEE Symp. on Switching and Automata Theory, pages 155-160, 1965.
[Büc60]: J. Büchi. Weak second-order arithmetic and finite automata. Z. Math. Logik Grundl. Math., 6:66-92, 1960.
[Don70]: J. Doner. Tree acceptors and some of their applications. J. of Comp. Syst. Sci., 4:406-451, 1970.
[GS84]: F. Gécseg and M. Steinby. Tree Automata. Akadémiai Kiadó, Budapest, 1984.
[HU79]: J.E. Hopcroft and J.D. Ullman. Introduction to Automata Theory, Lamnguages, and Computation. Addison-Wesley, Reading, Mass., 1979.
[KS81]: T. Kamimura and G. Slutzki. Parallel and two-way automata on directed ordered acyclic graphs. Inform. Contr., 49:10-51, 1981.
[RHE91]: G. Rozenberg, H.J. Hoogeboom, and J. Engelfriet. Formele talen en automaten 1. Afdeling Wiskunde en Informatica, Rijksuniversiteit Leiden, 1991. In Dutch.
[TW68]: J.W. Thatcher and J.B. Wright. Generalized finite automata theory with an application to a decision problem of second-order logic. Math. Systems Theory, 2:57-81, 1968.

File translated from T_EX by T_TH, version 2.32.
On 9 Jul 1999, 16:00.

Back to publications index

Tree-Walking Automata and Monadic Second Order Logic

Jan-Pascal van Best

Preface

Contents

Introduction

Chapter 1 Binary MSO Formulas and String-Walking Pebble Automata

1.1 Preliminaries

Strings

Finite Automata

Monadic Second Order Logic (on strings)

1.2 String-Walking Automata

1.3 String-Walking Pebble Automata

1.4 String-Walking Automata with MSO Tests

1.5 From MSO Formulas to String-Walking Pebble Automata

1.6 Another proof

1.7 Pebble Necessity

1.8 From String-Walking Pebble Automata to MSO Formulas

Chapter 2 Extensions to String-Walking Pebble Automata

2.1 String-Walking Multi-Pebble Automata

2.2 From String-Walking Multi-Pebble Automata to MSO Formulas

2.3 String-Walking Marble Automata

Chapter 3 Binary MSO Formulas and Tree-Walking Automata

3.1 Preliminaries

Trees

Finite Tree Automata

Monadic Second Order Logic (on trees)

3.2 Tree-Walking Automata

3.3 Push-Down Tree-Walking Automata

3.4 Equivalence of Finite Tree Automata and Push-Down Tree-Walking Automata

3.5 Tree-Walking Marble Automata

3.6 Tree-Walking Marble/Pebble Automata

3.7 Tree-Walking Automata with MSO Tests

3.8 Equivalence of Binary MSO Formulas and Tree-Walking Marble/Pebble Automata

Conclusion and Recommendations

Conclusion

Recommendations

Bibliography

Chapter 1
Binary MSO Formulas and String-Walking Pebble Automata

1.5 From MSO Formulas to
String-Walking Pebble Automata

1.8 From String-Walking Pebble Automata to
MSO Formulas

Chapter 2
Extensions to
String-Walking Pebble Automata

2.2 From String-Walking Multi-Pebble Automata to
MSO Formulas

Chapter 3
Binary MSO Formulas and Tree-Walking Automata

3.4 Equivalence of Finite Tree Automata and
Push-Down Tree-Walking Automata

3.8 Equivalence of Binary MSO Formulas and
Tree-Walking Marble/Pebble Automata