Manual

  1. Introduction
  2. Defining mtts
  3. Defining TL programs
  4. Defining the output type
  5. Type checking a program

Introduction

The OCaml program mttchk type checks macro tree transducers or TL programs with respect to a predefined output type. TL is a rule-based transformation language that was invented in [Maneth et al., 2005]. It subsumes the key features of the usual XML transformation languages and still allows exact type checking. It has

  • match patterns for selecting rules,
  • select patterns determining where the transfoprmation continues,
  • named functions, which allow to devide the whole transformation into components, and
  • accumulating parameters to refer to context information.
Here, we present a simplified version of TL that can be type checked in a very elegant and efficient way.

The illustrating example can be found in [Maneth et al., 2005]. The input document lists all incoming e-mails in the Inbox element. Element Trash is meant to collect all deleted mails. Besides the normal Mail elements, Inbox also contains e-mails inside a Spam element. This indicates that these mails have been identified as spam, e.g., by some automated mail filter. The transformation cleans-up the Inbox by moving all spam mails into the Trash element.

Defining mtts

An mtt is defined in a plain text file looking almost as described in the definition. The following example illustrates how to define the rules:
q_init(Doc(r,l))      -> Doc(q_inbx(r),e())
q_inbx(Inbox(r,l))    -> Inbox(q_mail(r),q_trsh1(l,q_spam(r)))
q_trsh1(Trash(r,l),y) -> Trash(q_trsh2(r,y),e())
q_mail(Mail(r,l))     -> Mail(e(),q_mail(l))
q_mail(Spam(r,l))     -> q_mail(l)
q_mail(e())           -> e()
q_spam(Mail(r,l))     -> q_spam(l)
q_spam(Spam(r,l))     -> Spam(e(),q_spam(l))
q_spam(e())           -> e()
q_trsh2(Mail(r,l),y)  -> Mail(e(),q_trsh2(l,y))
q_trsh2(Spam(r,l),y)  -> Spam(e(),q_trsh2(l,y))
q_trsh2(e(),y)        -> y

Each line defines the transformation rule for exactly one function. The first line, for example, defines a rule for function q_init. Since it is the first rule, q_init is the initial function of the defined mtt, meaning that transforming the input starts at its root node with this function.

The left-hand side of a rule defines at which labels the rule can be applied and the number and names of the additional parameters. Function q_trsh1 in the third rule has one parameter named y. The matching label can be seen in the first argument of each function, i.e., the initial rule transformes the root node of the input, if it is labled with Doc. The variables r and l denote the sons of the current node.

A right-hand side describes how to transform the current node and how to continue the transformation. It is assembled of constant parts like Doc or e in the first rule and recursive function calls like q_inbx(r). A function is always called for one of the input variables of the left-hand side, here r. If the function has additonal parameters, their positions can also be filled with constant output part or recursive calls to other functions. This can be seen in the second rule, where q_trsh1 is called for the left child of the current node and the parameter contains the call to function q_spam.

Defining TL programs

A TL program is defined in a plain text file like the following one:
q("Doc") -> Doc(q_inbx("Inbox"),q_trsh("Trash",p_inbx("Inbox")));
q_inbx("Inbox") -> Inbox(q1("Mail"));
p_inbx("Inbox") -> p1("Spam/Mail");
q1("Mail") -> Mail();
p1("Mail") -> Mail();

This example defines the same transformation as the mtt above and illustrates in its shortness, that TL is a very compact language to define transformations. TL has a different view on its input, because it can handle an arbitrary number of sons of a node -- whereas mtts work on ranked trees.

The first line again defines the initial rule, here for function q. The left-hand side specifies in the first argument the match pattern, defining at which nodes the rule can be applied. In our example, q transformes nodes labeled with Doc.

The right-hand sides specify the action part of each rule. The first rule generates a Doc node in the output, whose content is generated by the functions q_inbx and q_trsh. The latter function has one parameter containing the result of a recursive call to another function. Each recursive call has a select pattern determining which nodes have to be transformed by this function. A select pattern is a path from the current node to the nodes that should be transformed. In the third rule, for example, the select pattern Spam/Mail specifies, that each Mail node under a Spam node is transformed by function p1. This means, the rule selects all mails marked as spam in the inbox.

Defining the output type

The socalled output type defines all unwanted outputs, because the type checking routine tests whether the intersection of the computed output type of the transformation with the predefined type is empty. If the intersection is empty, the transformation generates only those documents intended by the programmer. For our mtt example, this type looks as follows:
pM,Mail,pL,pL;
pM,Mail,pL,pM;
Error,Mail,pL,pS;
Error,Mail,pL,Error;
pS,Spam,pL,pL;
pS,Spam,pL,pM;
pS,Spam,pL,pS;
pT,Trash,pL,pL;
pT,Trash,pM,pL;
pT,Trash,pS,pL;
Error,Inbox,pS,pT;
Error,Inbox,Error,pT;
pD,Doc,pI,pL;
Error,Doc,Error,pL;
pL,e;.
Error
The definition is partitioned into two parts: a rule set and after the . the acepting states. Each line of the rule set defines a transition of the automaton accepting the set of documents. The first line, for example, has to be read in the following way: The automaton changes into state pM if the automaton is in state pL for the left son and in state pM for the right son, and the current node is labeled with Mail. This is a straight forward notation for the transitions of a tree automaton.

Type checking a program or mtt

If one wants to check the above mtt against the defined type, one has to type

mttchk -top -mtt remove_spam.mtt -out mailbox.dfa

where the -mtt option defines the mtt and -out the defined type. Analogously, the TL program is checked in the following way:

mttchk -top -tl mail.tl -out test.dfa

The suffixes .tl, .mtt, and .dfa help only to distinguish the different files, but are completely useless for the program.
back to project page