# Learning CUP by example

## The inevitable calculator

This example should give You a basic idea of how to start working with CUP. A minimal CUP specification looks like this:

```/* Simple +/-/* expression language; parser evaluates constant expressions on the fly*/
import java_cup.runtime.*;

parser code {:
// Connect this parser to a scanner!
scanner s;
Parser(scanner s){ this.s=s; }
:}

/* define how to connect to the scanner! */
init with {: s.init(); :};
scan with {: return s.next_token(); :};

/* Terminals (tokens returned by the scanner). */
terminal            SEMI, PLUS, MINUS, TIMES, UMINUS, LPAREN, RPAREN;
terminal Integer    NUMBER;        // our scanner provides numbers as integers

/* Non terminals */
non terminal            expr_list;
non terminal Integer    expr;      // used to store evaluated subexpressions

/* Precedences */
precedence left PLUS, MINUS;
precedence left TIMES;
precedence left UMINUS;

/* The grammar rules */
expr_list ::= expr_list expr:e SEMI         {: System.out.println(e);:}
| expr:e SEMI                   {: System.out.println(e);:}
;
expr      ::= expr:e1 PLUS  expr:e2         {: RESULT = e1+e2;       :}
| expr:e1 MINUS expr:e2        {: RESULT = e1-e2;       :}
| expr:e1 TIMES expr:e2        {: RESULT = e1*e2;       :}
| MINUS expr:e                 {: RESULT = -e;          :}
%prec UMINUS
| LPAREN expr:e RPAREN	         {: RESULT = e;           :}
| NUMBER:n	                     {: RESULT = n;           :}
;
```

Based on this file, You just need to create a Java Specification from this with `java -jar java-cup-11b.jar -interface -parser Parser calc.cup` . Additionally, You need a basic scanner to produce tokens, which You then can process with your parser. For once, we show, how to handcode such a scanner, reading symbols from the imput:

```import java_cup.runtime.*;

class Main {
public static void main(String[] argv) throws Exception{
Parser p = new Parser(new scanner());
p.parse();
}
}

public class scanner {
protected static int next_char;

/* we use a SymbolFactory to generate Symbols */
private SymbolFactory sf = new DefaultSymbolFactory();

/* advance input by one character */

/* initialize the scanner */
public static void init() throws java.io.IOException        { advance(); }

/* recognize and return the next complete token */
public Symbol next_token() throws java.io.IOException
{
for (;;)
switch (next_char)
{
case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9':
/* parse a decimal integer */
int i_val = 0;
do {
i_val = i_val * 10 + (next_char - '0');
} while (next_char >= '0' && next_char <= '9');
return sf.newSymbol("NUMBER",sym.NUMBER, new Integer(i_val));

case -1: return sf.newSymbol("EOF",sym.EOF);

default:
/* in this simple scanner we just ignore everything else */
break;
}
}
};
```

Compiling all classes with `javac -cp java-cup-11b-runtime.jar:. *.java` creates your calculator, that you can call via `java -cp ../../dist/java-cup-11b-runtime.jar:. Main`

## MiniJava - a toy language

MiniJava is a small artificial toy language, only here to illustrate the basic steps in creating a compiler with CUP. This language has a specification via a grammar ("MiniJava"). It essentially knows branches, loops, declarations and basic i/o. The target is a toy assembler ("MiniJVM"), for which we have a simulator. Typical input in this language looks as follows:

```    int x, y;
y = 10;
write(x);
if (x>=0)
write (y);
else
write (x);
while (1-y <= -x) {
y = x * (-y + 5);
}
```

A typical usage for syntax trees is generation of target code. This can be either done manually with creating syntax tree classes in action codes for each production alternative and devising a mechansim for traversing this tree, e.g. with recursive function calls or visitor patterns.

CUP however offers the possibility to create an XML document for your syntaxtree automatically, which can then be processed with an XSLT/XQuery processor like baseX or saxon, even for generating target code. You can find an example for such an approach in this section.

### Parsing automatically straight to XML

We start with a direct conversion of the grammar into a CUP-parser specification. We generate the Java based parser via `java -jar java-cup-11b.jar -locations -interface -parser Parser -xmlactions minijava.cup`

```/* Minijava Grammar */
import java_cup.runtime.ComplexSymbolFactory;
import java_cup.runtime.ScannerBuffer;
import java_cup.runtime.XMLElement;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import java.io.*;

import javax.xml.transform.*;
import javax.xml.transform.stream.*;
parser code {:
public Parser(Lexer lex, ComplexSymbolFactory sf) {
super(lex,sf);
}
public static void main(String[] args) throws Exception {
// initialize the symbol factory
ComplexSymbolFactory csf = new ComplexSymbolFactory();
// create a buffering scanner wrapper
// start parsing
Parser p = new Parser(lexer,csf);
XMLElement e = (XMLElement)p.parse().value;
// create XML output file
XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
XMLStreamWriter sw = outFactory.createXMLStreamWriter(new FileOutputStream(args[1]));
// dump XML output to the file
XMLElement.dump(lexer,sw,e,"expr","stmt");

// transform the parse tree into an AST and a rendered HTML version
Transformer transformer = TransformerFactory.newInstance()
.newTransformer(new StreamSource(new File("tree.xsl")));
Source text = new StreamSource(new File(args[1]));
transformer.transform(text, new StreamResult(new File("output.xml")));
transformer = TransformerFactory.newInstance()
.newTransformer(new StreamSource(new File("tree-view.xsl")));
text = new StreamSource(new File("output.xml"));
transformer.transform(text, new StreamResult(new File("ast.html")));
}
:};

terminal SEMICOLON, COMMA, LPAR, RPAR, BEGIN, END, IF, ELSE, WHILE, READ, WRITE, BUNOP, ASSIGN;
terminal TYPE, BINOP, UNOP, COMP, BBINOP, INTCONST;
terminal IDENT,STRINGCONST;
terminal BOOLCONST;

non terminal program, decllist,decl,stmtlist,identlist,stmt,expr,cond;

precedence left ELSE, UNOP, BINOP, BUNOP, BBINOP;

program   ::=  decllist:d stmtlist:s
;
decllist  ::=  decl:d decllist:dl
| /* empty */
;
stmtlist  ::= stmtlist:sl stmt:s
| /* empty */
;
decl ::= TYPE IDENT:identifier identlist:il  SEMICOLON
;
identlist  ::= identlist:il COMMA IDENT:identifier
| /* empty */
;
stmt ::= SEMICOLON
| BEGIN stmtlist:sl END
| IDENT:lhs ASSIGN expr:rhs SEMICOLON
| IDENT:lhs ASSIGN READ LPAR RPAR SEMICOLON
| IDENT:lhs ASSIGN READ LPAR STRINGCONST:s RPAR SEMICOLON
| WRITE LPAR expr:e RPAR SEMICOLON
| WRITE LPAR STRINGCONST:s RPAR SEMICOLON
| IF LPAR cond:c RPAR stmt:s
| IF LPAR cond:c RPAR stmt:t ELSE stmt:e
| WHILE LPAR cond:c RPAR stmt:s
;
cond ::= BOOLCONST:c
| LPAR cond:c RPAR
| expr:e1 COMP:op expr:e2
| BUNOP cond:c
| cond:c1 BBINOP:op cond:c2
;
expr ::= IDENT:identifier
| INTCONST:constant
| LPAR expr:e RPAR
| BINOP expr:e
| expr:e1 BINOP:op expr:e2
;
```

### Generated scanner

Instead of teadiously creating a scanner manually by hand as we did in the last example, it is advisable to rely to an automatically generated one, to benefit from technical achievements in the sector of finite automata. For CUP parsers, JFlex is a reasonable scanner generator. The scanner specification for MiniJava looks like this:

```// Technische Universitaet Muenchen
// Fakultaet fuer Informatik
// Michael Petter

import java_cup.runtime.Symbol;
import java_cup.runtime.ComplexSymbolFactory;
import java_cup.runtime.ComplexSymbolFactory.Location;

%%
%public
%class Lexer
%cup
%implements sym, minijava.Constants
%char
%line
%column

%{
StringBuffer string = new StringBuffer();
this(in);
symbolFactory = sf;
}
ComplexSymbolFactory symbolFactory;

private Symbol symbol(String name, int sym) {
return symbolFactory.newSymbol(name, sym, new Location(yyline+1,yycolumn+1,yychar), new Location(yyline+1,yycolumn+yylength(),yychar+yylength()));
}

private Symbol symbol(String name, int sym, Object val) {
Location left = new Location(yyline+1,yycolumn+1,yychar);
Location right= new Location(yyline+1,yycolumn+yylength(), yychar+yylength());
return symbolFactory.newSymbol(name, sym, left, right,val);
}
private Symbol symbol(String name, int sym, Object val,int buflength) {
Location left = new Location(yyline+1,yycolumn+yylength()-buflength,yychar+yylength()-buflength);
Location right= new Location(yyline+1,yycolumn+yylength(), yychar+yylength());
return symbolFactory.newSymbol(name, sym, left, right,val);
}
private void error(String message) {
System.out.println("Error at line "+(yyline+1)+", column "+(yycolumn+1)+" : "+message);
}
%}

%eofval{
return symbolFactory.newSymbol("EOF", EOF, new Location(yyline+1,yycolumn+1,yychar), new Location(yyline+1,yycolumn+1,yychar+1));
%eofval}

Ident = [a-zA-Z\$_] [a-zA-Z0-9\$_]*
IntLiteral = 0 | [1-9][0-9]*
BoolLiteral = true | false
new_line = \r|\n|\r\n;
white_space = {new_line} | [ \t\f]

%state STRING

%%

<YYINITIAL>{
/* keywords */
"int"             { return symbol("int",TYPE, new Integer( INTTYPE ) ); }
"if"              { return symbol("if",IF); }
"else"            { return symbol("else",ELSE); }
"while"           { return symbol("while",WHILE); }
"write"           { return symbol("write",WRITE); }

/* names */
{Ident}           { return symbol("Identifier",IDENT, yytext()); }

/* bool literal */
{BoolLiteral} { return symbol("Boolconst",BOOLCONST, new Boolean(Boolean.parseBool(yytext()))); }

/* literals */
{IntLiteral} { return symbol("Intconst",INTCONST, new Integer(Integer.parseInt(yytext()))); }

/* separators */
\"              { string.setLength(0); yybegin(STRING); }
";"               { return symbol("semicolon",SEMICOLON); }
","               { return symbol("comma",COMMA); }
"("               { return symbol("(",LPAR); }
")"               { return symbol(")",RPAR); }
"{"               { return symbol("{",BEGIN); }
"}"               { return symbol("}",END); }
"="               { return symbol("=",ASSIGN); }
"+"               { return symbol("plus",BINOP, new Integer( PLUS ) ); }
"-"               { return symbol("minus",BINOP, new Integer( MINUS ) ); }
"*"               { return symbol("mult",BINOP, new Integer( MULT ) ); }
"/"               { return symbol("div",BINOP, new Integer( DIV ) ); }
"%"               { return symbol("mod",BINOP, new Integer( MOD ) ); }
"<="              { return symbol("leq",COMP,  new Integer( LEQ ) ); }
">="              { return symbol("gtq",COMP,  new Integer( GTQ ) ); }
"=="              { return symbol("eq",COMP,  new Integer( EQ  ) ); }
"!="              { return symbol("neq",COMP,  new Integer( NEQ ) ); }
"<"               { return symbol("le",COMP,  new Integer( LE  ) ); }
">"               { return symbol("gt",COMP,  new Integer( GT  ) ); }
"&&"              { return symbol("and",BBINOP,new Integer( AND ) ); }
"||"              { return symbol("or",BBINOP,new Integer( OR  ) ); }
"!"               { return symbol("not",BUNOP); }

{white_space}     { /* ignore */ }

}

<STRING> {
\"                             { yybegin(YYINITIAL);
return symbol("StringConst",STRINGCONST,string.toString(),string.length()); }
[^\n\r\"\\]+                   { string.append( yytext() ); }
\\t                            { string.append('\t'); }
\\n                            { string.append('\n'); }

\\r                            { string.append('\r'); }
\\\"                           { string.append('\"'); }
\\                             { string.append('\\'); }
}

/* error fallback */
.|\n              {  /* throw new Error("Illegal character <"+ yytext()+">");*/
error("Illegal character <"+ yytext()+">");
}
```

### From parse tree to abstract syntax tree

As the direct output from the parser generator is made up only from `nonterminal` and `terminal` tags, we transform it immediately into a more convenient form with nodes being called as their syntactic counterparts, as well as optimizing expresion nesting depth and flattening of one-branched tree branches into sequences of XML nodes. This is achieved by the XSL-Transformation, called in the last lines of the parser's main method. This applies a stylesheet like the following:

```<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
>
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:include href="tree-view.xsl"/>

<xsl:template match="blacklist"  mode="flatten" />

<xsl:template match="document"  mode="flatten">
<xsl:apply-templates  mode="flatten" select="parsetree"/>
</xsl:template>

<xsl:template match="nonterminal" mode="flatten">
<xsl:variable name="temp" select="@id" />
<xsl:choose>

<xsl:when test="count(*)=3 and contains(\$temp,'expr') and *[contains(@id,'expr')]">
<xsl:apply-templates  mode="flatten"/>
</xsl:when>

<!-- collapses degenerated trees like lists, conserving the blacklist subtrees-->
<xsl:when test="../@id = @id and count(/document/blacklist[1]/symbol[text() = \$temp])=0">
<xsl:apply-templates  mode="flatten"/>
</xsl:when>

<!-- collapses unary productions -->
<xsl:when test="count(*)=3 and count(/document/blacklist[1]/symbol[text() = \$temp])=0">
<xsl:apply-templates  mode="flatten"/>
</xsl:when>

<xsl:otherwise>
<xsl:element name="{@id}" >
<xsl:attribute name="variant"><xsl:value-of select="@variant" /></xsl:attribute>
<xsl:apply-templates mode="flatten"/>
</xsl:element>
</xsl:otherwise>

</xsl:choose>
</xsl:template>

<xsl:template match="terminal" mode="flatten">
<xsl:element name="{@id}">
<xsl:apply-templates  mode="flatten"/>
</xsl:element>
</xsl:template>

<xsl:template match="/">
<xsl:variable name="flatten">
<xsl:apply-templates  mode="flatten"/>
</xsl:variable>
<xsl:variable name="rendered">
<xsl:apply-templates mode="rendered" select="exsl:node-set(\$flatten)"/>
</xsl:variable>

<!--xsl:copy-of select="\$rendered" /-->
<xsl:copy-of select="\$flatten" />
</xsl:template>

</xsl:stylesheet>
```

### A graphical AST

from this AST in XML it is straightforward to generate a nice graphical AST with an XSLT stylesheet and the according CSS:

### Code generator

Next in a compiler chain is after a few optimizations the code generation phase. In our example, the code generation is then defined via XQuery, and applied to the compiler output with `baseX codegen.sq output.xml > ismple.minijvm`

```declare namespace codegen = "http://www2.in.tum.de";

declare function codegen:codeL(\$expr as node(),\$env as node()*)
as xs:string {
for \$var in \$env
where name(\$var)=\$expr
return \$var
};

declare function codegen:codeC(\$expr as node(),\$env as node()*)
as xs:string {
let \$variant := \$expr/@variant
return switch(\$variant)
case "0" return "CONST " || \$expr/c || "&#10;"
case "1" return codegen:codeC(\$expr/c,\$env)
case "2" return codegen:codeR(\$expr/expr[1],\$env) || codegen:codeR(\$expr/expr[2],\$env) || \$expr/op ||"&#10;"
case "3" return codegen:codeR(\$expr/e,\$env) || \$expr/op ||"&#10;"
case "4" return codegen:codeC(\$expr/expr[1],\$env) || codegen:codeC(\$expr/expr[2],\$env) || \$expr/op ||"&#10;"
default return "error"
};

declare function codegen:codeR(\$expr as node(),\$env as node()*)
as xs:string {
let \$variant := \$expr/@variant
return switch(\$variant)
case "0" return "LOAD " || codegen:codeL(\$expr/identifier,\$env) || "&#10;"
case "1" return "CONST " || \$expr/constant || "&#10;"
case "2" return codegen:codeR(\$expr/e,\$env)
case "3" return codegen:codeR(\$expr/e,\$env) || \$expr/op ||"&#10;"
case "4" return codegen:codeR(\$expr/expr[1],\$env) || codegen:codeR(\$expr/expr[2],\$env) || \$expr/op ||"&#10;"
default return "error"
};

declare function codegen:assignment(\$stmt as node(),\$env as node()*)
as xs:string {
let \$rhs := \$stmt/expr
let \$lhs := \$stmt/lhs
return codegen:codeR(\$rhs,\$env) || "STORE " || codegen:codeL(\$lhs,\$env) || "&#10;"
};

declare function codegen:read(\$stmt as node(),\$env as node()*)
as xs:string {
let \$rhs := \$stmt/expr
let \$lhs := \$stmt/lhs
return "READ &#10;STORE " || codegen:codeL(\$lhs,\$env) || "&#10;"
};

declare function codegen:write(\$stmt as node(),\$env as node()*)
as xs:string {
let \$expr := \$stmt/expr
return codegen:codeR(\$expr,\$env) || "WRITE &#10;"
};

declare function codegen:while(\$stmt as node(),\$env as node()*)
as xs:string {
let \$cond := \$stmt/cond
let \$stmt := \$stmt/stmt
let \$ctx := count(\$stmt/preceding::*)
return "A"|| \$ctx || ": "|| codegen:codeC(\$cond,\$env) || "FJUMP B"|| \$ctx || "&#10;" || codegen:code(\$stmt,\$env) || "JUMP A"|| \$ctx || "&#10;B"|| \$ctx || ": "
};

declare function codegen:if(\$stmt as node(),\$env as node()*)
as xs:string {
let \$cond := \$stmt/cond
let \$stmt := \$stmt/stmt
let \$ctx := count(\$stmt/preceding::*)
return  codegen:codeC(\$cond,\$env) || "FJUMP A"|| \$ctx || "&#10;" || codegen:code(\$stmt,\$env) || "A"|| \$ctx || ": "
};

declare function codegen:ifelse(\$stmt as node(),\$env as node()*)
as xs:string {
let \$cond := \$stmt/cond
let \$stm := \$stmt/stmt
let \$ctx := count(\$stmt/preceding::*)
return codegen:codeC(\$cond,\$env) || "FJUMP A"|| \$ctx || "&#10;" ||codegen:code(\$stm[1],\$env) || "JUMP B"|| \$ctx || "&#10;A"|| \$ctx || ": " ||codegen:code(\$stm[2],\$env)|| "B"|| \$ctx || ": "
};

declare function codegen:code(\$stmt as node(),\$env)
as xs:string
{
let \$variant := \$stmt/@variant
return switch(\$variant)
case "0" return "n/a empty&#10;"
case "1" return fn:fold-right(\$stmt/stmtlist/stmt,"",function (\$stm,\$acc) { if (fn:empty(\$stm)) then "" else \$acc || codegen:code(\$stm,\$env) })
case "2" return codegen:assignment(\$stmt,\$env)
case "5" return codegen:write(\$stmt,\$env)
case "6" return "n/a: write(string)&#10;"
case "7" return codegen:if(\$stmt,\$env)
case "8" return codegen:ifelse(\$stmt,\$env)
case "9" return codegen:while(\$stmt,\$env)
default return "error" || \$stmt
};

declare function codegen:env(\$decllist as node())
as node()*
{
for \$id at \$ident in \$decllist//identifier
return element { xs:QName(\$id) } {
xs:string(\$ident)
}
};

let \$program := doc('input.xml')//program
let \$stmts := \$program/stmtlist
let \$env := codegen:env(\$program/decllist)
let \$alloc := count(\$env)
return "ALLOC "|| \$alloc || "&#10;" || fold-right(\$stmts/stmt,"",function(\$stmt,\$acc) { if (empty(\$stmt)) then "" else codegen:code(\$stmt,\$env)|| \$acc  }) || "HALT&#10;"
```

## MiniJava with manual AST and Codegeneration and error recovery

In order to grasp the modification, that is needed to manually generate a syntax tree from CUP specification, we have another implementation of the MiniJava compiler from above. This time, the specification is enriched with lots of calls to factory methods, that generate the objects for the syntax tree. Note, that in this case, we also implemented an error recovering parser, with an error alternative at the statement level. This enables the parser to continue after syntax errors, while offering alternative continuations for a valid parsing run!

```import java.util.*;
import java.io.*;
import java_cup.runtime.*;
import java_cup.runtime.ComplexSymbolFactory.ComplexSymbol;
import minijava.*;

parser code {:
public boolean syntaxErrors;

Lexer lexer;

public Parser(Lexer lex, ComplexSymbolFactory sf) {
super(lex,sf);
lexer = lex;
}
:};

terminal         SEMICOLON, COMMA, LPAR, RPAR, BEGIN, END,IF, ELSE, WHILE, READ, WRITE, BUNOP, ASSIGN;
terminal Integer TYPE, BINOP, UNOP, COMP, BBINOP, INTCONST;
terminal String  IDENT,STRINGCONST;
terminal Boolean BOOLCONST;

non terminal Program        program;
non terminal List<Decl>     decllist;
non terminal Decl           decl;
non terminal List<Stmt>     stmtlist;
non terminal List<Expr.Identifier>   identlist;
non terminal Stmt           stmt;
non terminal Expr           expr;
non terminal Cond           cond;

precedence left ELSE, UNOP, BINOP, BUNOP, BBINOP;

program   ::=  decllist:d stmtlist:s                        {: RESULT = new Program(d,s); :}
;
decllist  ::=  decl:d decllist:dl                           {: dl.add(d); RESULT = dl; :}
| /* empty decllist */                                  {: RESULT = new LinkedList<Decl>(); :}
;
stmtlist  ::= stmtlist:sl stmt:s                            {: sl.add(s); RESULT = sl; :}
| /* empty stmtlist */                                  {: RESULT = new LinkedList<Stmt>(); :}
;
decl ::= TYPE IDENT:i identlist:il  SEMICOLON               {: il.add(0,Expr.ident(ixleft,i,ixright));
RESULT = new Decl(il); :}
;
identlist  ::= identlist:il COMMA IDENT:i                   {: il.add(Expr.ident(ixleft,i,ixright)); RESULT = il; :}
| /* empty identlist*/                                  {: RESULT = new LinkedList<Expr.Identifier>(); :}
;
stmt ::= SEMICOLON                                          {: RESULT = Stmt.empty(); :}
| BEGIN stmtlist:sl END                                 {: RESULT = Stmt.compound(sl); :}
| IDENT:i ASSIGN expr:e SEMICOLON                       {: RESULT = Stmt.assign(ixleft,i,e,ixright); :}
| IDENT:i ASSIGN READ LPAR RPAR SEMICOLON               {: RESULT = Stmt.read(ixleft,i,ixright); :}
| IDENT:i ASSIGN READ LPAR STRINGCONST:s RPAR SEMICOLON {: RESULT = Stmt.read(ixleft,i,s,ixright); :}
| WRITE LPAR expr:e RPAR SEMICOLON                      {: RESULT = Stmt.write(e); :}
| WRITE LPAR STRINGCONST:s RPAR SEMICOLON               {: RESULT = Stmt.write(s); :}
| IF LPAR cond:c RPAR stmt:s                            {: RESULT = Stmt.ifthen(c,s); :}
| IF LPAR cond:c RPAR stmt:t ELSE stmt:e                {: RESULT = Stmt.ifthenelse(c,t,e); :}
| WHILE LPAR cond:c RPAR stmt:s                         {: RESULT = Stmt.whileloop(c,s); :}
| error:e                                               {: parser.report_error("Syntax error, skip rest",e); :}
;
cond ::= BOOLCONST:c                                        {: RESULT = Cond.boolconst(c); :}
| LPAR cond:c RPAR                                      {: RESULT = Cond.priority(c); :}
| expr:e1 COMP:op expr:e2                               {: RESULT = Cond.binop(e1,op,e2); :}
| BUNOP cond:c                                          {: RESULT = Cond.unop(c); :}
| cond:c1 BBINOP:op cond:c2                             {: RESULT = Cond.binop(c1,op,c2); :}
;
/* Yes, this grammar does not adhere to precedence of multiplication over addition                             */
/* This is due to the fact that it is introduced with an ambiguous grammar in the lecture, and was never fixed */
expr ::= IDENT:i                                            {: RESULT = Expr.ident(ixleft,i,ixright); :}
| INTCONST:c                                            {: RESULT = Expr.intconst(c); :}
| LPAR expr:e RPAR                                      {: RESULT = Expr.priority(e); :}
| BINOP expr:e                                          {: RESULT = Expr.unop(e); :}
| expr:e1 BINOP:op expr:e2                              {: RESULT = Expr.binop(e1,op,e2); :}
;
```

In principle, you are free to create syntax tree classes to suit your needs. However, we have found certain techniques useful, so we show you the following syntax tree class for expressions as an example; we equip them with an `accept` method, that implements a `visitor`-interface for this expression. Note, that the accept implementation pays respect to the `preVisit()`'s return value, and only descends into this subtree, when the return value was `true`. On large syntax trees, this saves a lot of time compared to standard full tree traversal. By clever usage of this pattern, one can simulate attribute computation quite easily:

```package minijava;
import java_cup.runtime.ComplexSymbolFactory.Location;

public abstract class Expr implements Constants {
public Expr()                             {    }
public abstract void accept(MinijavaVisitor v);
public static Priority priority(Expr e)   { return new Priority(e);  }
public static class Priority extends Expr {
public Expr e;
public Priority(Expr e)               {  this.e=e; }
public String toString()              { return "("+e+")"; }
public void accept(MinijavaVisitor v) {
if (!v.preVisit(this)) return;
e.accept(v);
v.postVisit(this);
}
}
public static Binex binop(Expr e1, int op, Expr e2){ return new Binex(e1,op,e2);  }
public static class Binex extends Expr    {
public Expr e1, e2;
public int op;
public Binex(Expr e1, int op, Expr e2){
this.e1=e1;
this.e2=e2;
this.op=op;
}
public String toString(){
String operator=null;
if (op==Constants.PLUS)  operator = "+";
if (op==Constants.MINUS) operator = "-";
if (op==Constants.MULT)  operator = "*";
if (op==Constants.DIV)   operator = "/";
if (op==Constants.MOD)   operator = "\\%";
return e1 + ""+operator + e2;
}
public void accept(MinijavaVisitor v){
if (!v.preVisit(this)) return;
e1.accept(v);
e2.accept(v);
v.postVisit(this);
}
}
public static Unex unop(Expr e)             { return new Unex(e);   }
public static class Unex extends Expr {
public Expr e1;
public Unex(Expr e1)                    { this.e1=e1; 	 }
public String toString()                { return "-"+e1; }
public void accept(MinijavaVisitor v)   {
if (!v.preVisit(this)) return;
e1.accept(v);
v.postVisit(this);
}
}
public static IntConst intconst(int i)      { return new IntConst(i);    }
public static class IntConst extends Expr   {
public int i;
public IntConst(int i)                  { this.i=i;	}
public String toString()                { return i+"";	}
public void accept(MinijavaVisitor v)   { v.visit(this);	}
}
public static Identifier ident(Location l, String s, Location r){	return new Identifier(l,s,r);    }
public static class Identifier extends Expr {
public Location left, right;
public String i;
public Identifier(Location l, String i,Location r){
this.i=i;
this.left=l;
this.right=r;
}
public String toString()                { return i;	}
public void accept(MinijavaVisitor v)   { v.visit(this);	}
}
}
```

The visitors for this structure then are able to walk the syntax tree and compute attributes; the following visitor implements the code generation for the MiniJVM:

```import minijava.*;
import java.util.*;
import java.io.*;

public class MiniJVMGenerator extends MinijavaVisitor implements Constants{
PrintWriter writer;
public MiniJVMGenerator(String filename){
try{
writer = new PrintWriter(filename+".jvm");
}catch(IOException ioe){
ioe.printStackTrace();
}
}
public void flush(){
writer.flush();
writer.close();
}
int declcounter = 0;
int loopcounter = 0;
int ifcounter = 0;
Map<String,Integer> symtab = new HashMap<String,Integer>();
public void postVisit(Program p){
writer.println("HALT");
}
public boolean preVisit(Stmt.Assign i){
writer.print("// "+i);
return true;
}
public void postVisit(Stmt.Assign i){
writer.println("STORE "+symtab.get(i.lhs));
}
public boolean preVisit(Stmt.Write i){
writer.print("// "+i);
return true;
}
public void postVisit(Stmt.Write i){
writer.println("WRITE");
}
writer.print("// "+i);
writer.println("STORE "+symtab.get(i.lhs));
}
public boolean preVisit(Stmt.IfThenElse i){
int e = ifcounter++;
int end = ifcounter++;
writer.println("// "+i.cond);
i.cond.accept(this);
writer.println("FJUMP IF"+e);
i.then.accept(this);
if (i.els!=null)
writer.println("JUMP IF"+end);
writer.print("IF"+e+":");
if(i.els!=null) {
i.els.accept(this);
writer.println("IF"+end+":");
}
return false;
}
public boolean preVisit(Stmt.Loop i){
int loop = loopcounter++;
int loopexit = loopcounter++;
writer.println("//"+ i.cond+" ??" );
writer.print("L"+ loop +":" );
i.cond.accept(this);
writer.println("FJUMP L"+loopexit);
i.body.accept(this);
writer.println("JUMP L"+loop);
writer.print("L"+loopexit+":");
return false;
}
public boolean preVisit(Decl d){
writer.print("//"+d);
return true;
}
public void postVisit(Decl d){
for (Expr.Identifier s :d.varlist) symtab.put(s.i,declcounter++);
writer.println("ALLOC "+d.varlist.size());
}
public void postVisit(Cond.BUnOp i){
writer.println("NOT");
}
public void postVisit(Cond.BinCond i){
if (i.op==LEQ) writer.println("LEQ");
if (i.op==LE) writer.println("LESS");
if (i.op==GTQ) writer.println("GTQ");
if (i.op==GT) writer.println("GT");
if (i.op==EQ) writer.println("EQ");
if (i.op==NEQ) writer.println("NEQ");
}
public void postVisit(Cond.BBinCond i){
if (i.op==AND) writer.println("AND");
if (i.op==OR) writer.println("OR");
}
public void visit(Cond.BoolConst d){
writer.println("???");
}
public void visit(Expr.Identifier d){
}
public void visit(Expr.IntConst d){
writer.println("CONST "+d.i);
}
public void postVisit(Expr.Binex i){
if (i.op==MINUS) writer.println("SUB");
if (i.op==MULT) writer.println("MUL");
if (i.op==DIV) writer.println("DIV");
if (i.op==MOD) writer.println("MOD");
}
public void postVisit(Expr.Unex i){
writer.println("NEG");
}
}
```

## Parsing C

Parsing C is somewhat challenging, as its grammar is not context free. This is due to C's mechanism of defining type keywords during a program via typedefs. However, with a few lines of user defined action code, we can communicate the introduction of type keywords during the parsing process back to the scanner, while keeping track of scopes from inside the parser.

```import java.io.*;
import java.util.*;
import java_cup.runtime.*;
import java_cup.runtime.XMLElement.*;
import javax.xml.stream.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

parser code {:
public void syntax_error(Symbol cur_token){
System.err.println("Syntax error at "+cur_token);
}
public static void newScope(){
typenames.push(new HashSet<String>());
}
public static void deleteScope(){
typenames.pop();
}
public static boolean lookupType(String name){
for (HashSet<String> scope: typenames)
if (scope.contains(name)) return true;
return false;
}
}
public Parser(Lexer lex, ComplexSymbolFactory sf) {
super(lex,sf);
}

public static void main(String args[]) {
try {
ComplexSymbolFactory csf = new ComplexSymbolFactory();
// create a buffering scanner wrapper
// start parsing
Parser p = new Parser(lexer,csf);
System.out.println("Parser runs: ");
newScope();
XMLElement e = (XMLElement)p.parse().value;
// create XML output file
XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
XMLStreamWriter sw = outFactory.createXMLStreamWriter(new FileOutputStream(args[1]));
// dump XML output to the file
XMLElement.dump(lexer,sw,e); //,"expr","stmt");
// transform the parse tree into an AST and a rendered HTML version
Transformer transformer = TransformerFactory.newInstance()
.newTransformer(new StreamSource(new File("tree.xsl")));
Source text = new StreamSource(new File(args[1]));
transformer.transform(text, new StreamResult(new File("output.html")));

System.out.println("Parsing finished!");
} catch (Exception e) {
e.printStackTrace();
}
}
:};

terminal IDENTIFIER, CONSTANT, STRING_LITERAL, SIZEOF, PTR_OP, INC_OP, DEC_OP, LEFT_OP, RIGHT_OP, LE_OP, GE_OP,
AND_OP, OR_OP, MUL_ASSIGN, DIV_ASSIGN, MOD_ASSIGN, ADD_ASSIGN, SUB_ASSIGN, LEFT_ASSIGN, RIGHT_ASSIGN,
XOR_ASSIGN, OR_ASSIGN, TYPE_NAME, TYPEDEF, EXTERN, STATIC, AUTO, REGISTER, CHAR, SHORT, INT, LONG, SIGNED,
UNSIGNED, FLOAT, DOUBLE, CONST, VOLATILE, VOID, STRUCT, UNION, ENUM, ELLIPSIS, CASE, DEFAULT, IF, ELSE,
SWITCH, WHILE, DO, FOR, GOTO, CONTINUE, BREAK, RETURN, SEMI, CURLYL, CURLYR, COMMA, COLON, ASSIGN, PARAL,
PARAR, SQUAREDL, SQUAREDR, POINT, ADRESS, NOT, TILDE, AND_ASSIGN, EQ_OP, NE_OP, MINUS, PLUS, MUL, DIVIDE,
MODULUS, LESS, GREATER, XOR, OR, COND;

non terminal translation_unit, primary_expression, postfix_expression, expression, assignment_expression;
non terminal unary_operator, type_name, cast_expression, multiplicative_expression, additive_expression;
non terminal shift_expression, equality_expression,  and_expression,  exclusive_or_expression;
non terminal logical_and_expression,  logical_or_expression,  conditional_expression,  constant_expression;
non terminal declaration_specifiers, init_declarator_list, storage_class_specifier, type_specifier, type_qualifier;
non terminal init_declarator, declarator, struct_or_union_specifier, struct_declaration_list, struct_declaration;
non terminal initializer,  specifier_qualifier_list,  struct_declarator_list,  struct_declarator,  enum_specifier;
non terminal enumerator_list,  enumerator,  pointer,  direct_declarator,  parameter_type_list,  identifier_list;
non terminal type_qualifier_list,  parameter_declaration,  abstract_declarator,  direct_abstract_declarator;
non terminal initializer_list,  statement,  labeled_statement,  compound_statement,  selection_statement;
non terminal jump_statement, statement_list, expression_statement, external_declaration, function_definition;
non terminal assignment_operator, parameter_list, unary_expression, iteration_statement, declaration_list;
non terminal relational_expression,  inclusive_or_expression, declaration;

precedence nonassoc ELSE;

primary_expression ::= IDENTIFIER:ident
| CONSTANT:constant
| STRING_LITERAL:stringliteral
| PARAL expression:e PARAR
;

postfix_expression ::= primary_expression:pe
| postfix_expression:pe SQUAREDL expression:index SQUAREDR
| postfix_expression:pe PARAL PARAR
| postfix_expression:pe PARAL expression:e PARAR
| postfix_expression:pe POINT IDENTIFIER:id
| postfix_expression:pe PTR_OP IDENTIFIER:id
| postfix_expression:pe INC_OP:op
| postfix_expression:pe DEC_OP:op
;

unary_expression ::= postfix_expression:pe
| INC_OP:op unary_expression:ue
| DEC_OP:op unary_expression:ue
| unary_operator:uo cast_expression:ce
| SIZEOF unary_expression:ue
| SIZEOF PARAL type_name:tn PARAR
;

| MUL:op
| PLUS:op
| MINUS:op
| TILDE
| NOT:op
;

cast_expression
::=unary_expression:ue
| PARAL type_name:tn PARAR cast_expression:ce
;

multiplicative_expression ::= cast_expression:ce
| multiplicative_expression:me MUL:op cast_expression:ce
| multiplicative_expression:me DIVIDE:op cast_expression:ce
| multiplicative_expression:me MODULUS:op cast_expression:ce
;

;

;

relational_expression ::= shift_expression:se
| relational_expression:re LESS:op shift_expression:se
| relational_expression:re GREATER:op shift_expression:se
| relational_expression:re LE_OP:op shift_expression:se
| relational_expression:re GE_OP:op shift_expression:se
;

equality_expression ::= relational_expression:re
| equality_expression:ee EQ_OP:op relational_expression:re
| equality_expression:ee NE_OP:op relational_expression:re
;

and_expression ::= equality_expression:ee
;

exclusive_or_expression	::= and_expression:ae
| exclusive_or_expression:eoe XOR and_expression:ae
;

inclusive_or_expression	::= exclusive_or_expression:eoe
| inclusive_or_expression:ioe OR exclusive_or_expression:eoe
;

logical_and_expression ::= inclusive_or_expression:ioe
| logical_and_expression:lae AND_OP:op inclusive_or_expression:ioe
;

logical_or_expression ::= logical_and_expression:lae
| logical_or_expression:loe OR_OP:op logical_and_expression:lae
;

conditional_expression ::= logical_or_expression:loe
| logical_or_expression:loe COND expression:e COLON conditional_expression:ce
;

assignment_expression ::= conditional_expression:ce
| unary_expression:ue assignment_operator:aop assignment_expression:ae
;

assignment_operator ::=ASSIGN
| MUL_ASSIGN
| DIV_ASSIGN
| MOD_ASSIGN
| SUB_ASSIGN
| LEFT_ASSIGN
| RIGHT_ASSIGN
| AND_ASSIGN
| XOR_ASSIGN
| OR_ASSIGN
;

expression ::= assignment_expression:ae
| expression:e COMMA assignment_expression:ae
;

constant_expression ::=conditional_expression:ce
;

declaration ::=declaration_specifiers:ds SEMI
| declaration_specifiers:ds init_declarator_list:idl {:
if (ds.toString().indexOf(">typedef<")>0) {
for (XMLElement e: ((XMLElement)idl).selectById("identifier"))
}
:} SEMI
;

declaration_specifiers ::=storage_class_specifier:scc
| storage_class_specifier:scc declaration_specifiers:ds
| type_specifier:ts
| type_specifier:ts declaration_specifiers:ds
| type_qualifier:tq
| type_qualifier:tq declaration_specifiers:ds
;

init_declarator_list ::=init_declarator:id
| init_declarator_list:idl COMMA init_declarator:id
;

init_declarator ::=declarator:d
;

storage_class_specifier ::=  TYPEDEF:id
| EXTERN:id
| STATIC:id
| AUTO:id
| REGISTER:id
;

type_specifier ::=VOID:type
| CHAR:type
| SHORT:type
| INT:type
| LONG:type
| FLOAT:type
| DOUBLE:type
| SIGNED:type
| UNSIGNED:type
| struct_or_union_specifier:su
| enum_specifier:es
| TYPE_NAME:type
;

struct_or_union_specifier ::= STRUCT:s IDENTIFIER:id CURLYL struct_declaration_list:sdl CURLYR
|  STRUCT:s CURLYL struct_declaration_list:sdl CURLYR
|  STRUCT:s IDENTIFIER:id
|  UNION:u IDENTIFIER:id CURLYL struct_declaration_list:sdl CURLYR
|  UNION:u CURLYL struct_declaration_list:sdl CURLYR
|  UNION:u IDENTIFIER:id
;

struct_declaration_list	::=struct_declaration:s
| struct_declaration_list:sl struct_declaration:s
;

struct_declaration ::=specifier_qualifier_list:sq struct_declarator_list:sd SEMI
;

specifier_qualifier_list ::=type_specifier:ts specifier_qualifier_list:sq
| type_specifier:ts
| type_qualifier:tq specifier_qualifier_list:sq
| type_qualifier:tq
;

struct_declarator_list ::=struct_declarator:s
| struct_declarator_list:sl COMMA struct_declarator:s
;

struct_declarator ::=declarator:d
| COLON constant_expression:ce
| declarator:d COLON constant_expression:ce
;

enum_specifier ::=ENUM CURLYL enumerator_list:el CURLYR
| ENUM IDENTIFIER:id CURLYL enumerator_list:el CURLYR
| ENUM IDENTIFIER:id
;

enumerator_list	::=enumerator:e
| enumerator_list:el COMMA enumerator:e
;

enumerator ::=IDENTIFIER:id
| IDENTIFIER:id ASSIGN constant_expression:ce
;

type_qualifier ::=CONST:id
| VOLATILE:id
;

declarator ::=pointer:p direct_declarator:direct
| direct_declarator:direct
;

direct_declarator ::=IDENTIFIER:identifier
| PARAL declarator:d PARAR
| direct_declarator:dd SQUAREDL constant_expression:ce SQUAREDR
| direct_declarator:dd SQUAREDL SQUAREDR
| direct_declarator:dd PARAL parameter_type_list:ptl PARAR
| direct_declarator:dd PARAL identifier_list:il PARAR
| direct_declarator:dd PARAL PARAR
;

pointer	::=MUL:id
| MUL:id type_qualifier_list:tql
| MUL:id pointer:p
| MUL:id type_qualifier_list:tql pointer:p
;

type_qualifier_list ::=type_qualifier:tq
| type_qualifier_list:tql type_qualifier:tq
;

parameter_type_list ::=parameter_list:pl
| parameter_list:pl COMMA ELLIPSIS:id
;

parameter_list ::=parameter_declaration:pd
| parameter_list:pl COMMA parameter_declaration:pd
;

parameter_declaration ::=declaration_specifiers:ds declarator:d
| declaration_specifiers:ds
;

identifier_list ::=IDENTIFIER:id
| identifier_list:idl COMMA IDENTIFIER:id
;

type_name ::=specifier_qualifier_list:sl
;

abstract_declarator ::=pointer:p
| pointer:p direct_abstract_declarator:d
;

| SQUAREDL:id SQUAREDR
| SQUAREDL:id constant_expression:ce SQUAREDR
| PARAL:id PARAR
| PARAL:id parameter_type_list:ptl PARAR
;

initializer ::=assignment_expression:ae
| CURLYL initializer_list:il CURLYR
| CURLYL initializer_list:il COMMA CURLYR
;

initializer_list ::=initializer:i
| initializer_list:il COMMA initializer:i
;

statement ::=labeled_statement:ls
| {: Parser.newScope(); :}  compound_statement:cs {: Parser.deleteScope(); :}
| expression_statement:es
| selection_statement:ss
| iteration_statement:is
| jump_statement:js
;

labeled_statement ::=IDENTIFIER:id COLON statement:s
| CASE constant_expression:ce COLON statement:s
| DEFAULT COLON statement:s
;

compound_statement ::=CURLYL CURLYR
| CURLYL statement_list:sl CURLYR
| CURLYL declaration_list:dl CURLYR
| CURLYL declaration_list:dl statement_list:sl CURLYR
;

declaration_list ::=declaration:d
| declaration_list:dl declaration:d
;

statement_list ::=statement:s
| statement_list:sl statement:s
;

expression_statement ::=SEMI
| expression:e SEMI
;

selection_statement ::=IF PARAL expression:e PARAR statement:s
| IF PARAL expression:e PARAR statement:s1 ELSE statement:s2
| SWITCH PARAL expression:e PARAR statement:s
;

iteration_statement ::=WHILE PARAL expression:e PARAR statement:s
| DO statement:s WHILE PARAL expression:e PARAR SEMI
| FOR PARAL expression_statement:es1 expression_statement:es2 PARAR statement:s
| FOR PARAL expression_statement:es1 expression_statement:es2 expression:e PARAR statement:s
;

jump_statement ::=GOTO IDENTIFIER:id SEMI
| CONTINUE SEMI
| BREAK SEMI
| RETURN SEMI
| RETURN expression:e SEMI
;

translation_unit ::=external_declaration:ed
| translation_unit:tu external_declaration:ed
;

external_declaration ::=function_definition:fd
| declaration:d
;

function_definition ::=declaration_specifiers:ds declarator:d declaration_list:dl
{: Parser.newScope(); :}
compound_statement:cs 		                   {: Parser.deleteScope(); :}
| declaration_specifiers:ds declarator:d 	  {: Parser.newScope(); :}
compound_statement:cs 	                   {: Parser.deleteScope(); :}
| declarator:d declaration_list:dl 	        {: Parser.newScope(); :}
compound_statement:cs 	                   {: Parser.deleteScope(); :}
| declarator:d 	                            {: Parser.newScope(); :}
compound_statement:cs 	                   {: Parser.deleteScope(); :}
;
```