MiniJava - A compiler with CUP via XML-techniques
MiniJava - this time with manual user actions
This example is also found in the repository.
This example should give You a basic idea of how to start working with CUP. A minimal CUP specification looks like this:
/* Simple +/-/* expression language; parser evaluates constant expressions on the fly*/ import java_cup.runtime.*; parser code {: // Connect this parser to a scanner! scanner s; Parser(scanner s){ this.s=s; } :} /* define how to connect to the scanner! */ init with {: s.init(); :}; scan with {: return s.next_token(); :}; /* Terminals (tokens returned by the scanner). */ terminal SEMI, PLUS, MINUS, TIMES, UMINUS, LPAREN, RPAREN; terminal Integer NUMBER; // our scanner provides numbers as integers /* Non terminals */ non terminal expr_list; non terminal Integer expr; // used to store evaluated subexpressions /* Precedences */ precedence left PLUS, MINUS; precedence left TIMES; precedence left UMINUS; /* The grammar rules */ expr_list ::= expr_list expr:e SEMI {: System.out.println(e);:} | expr:e SEMI {: System.out.println(e);:} ; expr ::= expr:e1 PLUS expr:e2 {: RESULT = e1+e2; :} | expr:e1 MINUS expr:e2 {: RESULT = e1-e2; :} | expr:e1 TIMES expr:e2 {: RESULT = e1*e2; :} | MINUS expr:e {: RESULT = -e; :} %prec UMINUS | LPAREN expr:e RPAREN {: RESULT = e; :} | NUMBER:n {: RESULT = n; :} ;
Based on this file, You just need to create a Java Specification from this with java -jar java-cup-11b.jar -interface -parser Parser calc.cup
. Additionally, You need a basic scanner to produce tokens, which You then can process with your parser. For once, we show, how to handcode such a scanner, reading symbols from the imput:
import java_cup.runtime.*; class Main { public static void main(String[] argv) throws Exception{ System.out.println("Please type your arithmethic expression:"); Parser p = new Parser(new scanner()); p.parse(); } } public class scanner { /* single lookahead character */ protected static int next_char; /* we use a SymbolFactory to generate Symbols */ private SymbolFactory sf = new DefaultSymbolFactory(); /* advance input by one character */ protected static void advance() throws java.io.IOException { next_char = System.in.read(); } /* initialize the scanner */ public static void init() throws java.io.IOException { advance(); } /* recognize and return the next complete token */ public Symbol next_token() throws java.io.IOException { for (;;) switch (next_char) { case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': /* parse a decimal integer */ int i_val = 0; do { i_val = i_val * 10 + (next_char - '0'); advance(); } while (next_char >= '0' && next_char <= '9'); return sf.newSymbol("NUMBER",sym.NUMBER, new Integer(i_val)); case ';': advance(); return sf.newSymbol("SEMI",sym.SEMI); case '+': advance(); return sf.newSymbol("PLUS",sym.PLUS); case '-': advance(); return sf.newSymbol("MINUS",sym.MINUS); case '*': advance(); return sf.newSymbol("TIMES",sym.TIMES); case '(': advance(); return sf.newSymbol("LPAREN",sym.LPAREN); case ')': advance(); return sf.newSymbol("RPAREN",sym.RPAREN); case -1: return sf.newSymbol("EOF",sym.EOF); default: /* in this simple scanner we just ignore everything else */ advance(); break; } } };
Compiling all classes with javac -cp java-cup-11b-runtime.jar:. *.java
creates your calculator, that you can
call via java -cp ../../dist/java-cup-11b-runtime.jar:. Main
MiniJava is a small artificial toy language, only here to illustrate the basic steps in creating a compiler with CUP. This language has a specification via a grammar ("MiniJava"). It essentially knows branches, loops, declarations and basic i/o. The target is a toy assembler ("MiniJVM"), for which we have a simulator. Typical input in this language looks as follows:
int x, y; y = 10; x = read(); write(x); if (x>=0) write (y); else write (x); while (1-y <= -x) { y = x * (-y + 5); }
A typical usage for syntax trees is generation of target code. This can be either done manually with creating syntax tree classes in action codes for each production alternative and devising a mechansim for traversing this tree, e.g. with recursive function calls or visitor patterns.
CUP however offers the possibility to create an XML document for your syntaxtree automatically, which can then be processed with an XSLT/XQuery processor like baseX or saxon, even for generating target code. You can find an example for such an approach in this section.
We start with a direct conversion of the grammar into a CUP-parser specification. We generate the Java
based parser via java -jar java-cup-11b.jar -locations -interface -parser Parser -xmlactions minijava.cup
/* Minijava Grammar */ import java_cup.runtime.ComplexSymbolFactory; import java_cup.runtime.ScannerBuffer; import java_cup.runtime.XMLElement; import javax.xml.stream.XMLOutputFactory; import javax.xml.stream.XMLStreamWriter; import java.io.*; import javax.xml.transform.*; import javax.xml.transform.stream.*; parser code {: public Parser(Lexer lex, ComplexSymbolFactory sf) { super(lex,sf); } public static void main(String[] args) throws Exception { // initialize the symbol factory ComplexSymbolFactory csf = new ComplexSymbolFactory(); // create a buffering scanner wrapper ScannerBuffer lexer = new ScannerBuffer(new Lexer(new BufferedReader(new FileReader(args[0])),csf)); // start parsing Parser p = new Parser(lexer,csf); XMLElement e = (XMLElement)p.parse().value; // create XML output file XMLOutputFactory outFactory = XMLOutputFactory.newInstance(); XMLStreamWriter sw = outFactory.createXMLStreamWriter(new FileOutputStream(args[1])); // dump XML output to the file XMLElement.dump(lexer,sw,e,"expr","stmt"); // transform the parse tree into an AST and a rendered HTML version Transformer transformer = TransformerFactory.newInstance() .newTransformer(new StreamSource(new File("tree.xsl"))); Source text = new StreamSource(new File(args[1])); transformer.transform(text, new StreamResult(new File("output.xml"))); transformer = TransformerFactory.newInstance() .newTransformer(new StreamSource(new File("tree-view.xsl"))); text = new StreamSource(new File("output.xml")); transformer.transform(text, new StreamResult(new File("ast.html"))); } :}; terminal SEMICOLON, COMMA, LPAR, RPAR, BEGIN, END, IF, ELSE, WHILE, READ, WRITE, BUNOP, ASSIGN; terminal TYPE, BINOP, UNOP, COMP, BBINOP, INTCONST; terminal IDENT,STRINGCONST; terminal BOOLCONST; non terminal program, decllist,decl,stmtlist,identlist,stmt,expr,cond; precedence left ELSE, UNOP, BINOP, BUNOP, BBINOP; program ::= decllist:d stmtlist:s ; decllist ::= decl:d decllist:dl | /* empty */ ; stmtlist ::= stmtlist:sl stmt:s | /* empty */ ; decl ::= TYPE IDENT:identifier identlist:il SEMICOLON ; identlist ::= identlist:il COMMA IDENT:identifier | /* empty */ ; stmt ::= SEMICOLON | BEGIN stmtlist:sl END | IDENT:lhs ASSIGN expr:rhs SEMICOLON | IDENT:lhs ASSIGN READ LPAR RPAR SEMICOLON | IDENT:lhs ASSIGN READ LPAR STRINGCONST:s RPAR SEMICOLON | WRITE LPAR expr:e RPAR SEMICOLON | WRITE LPAR STRINGCONST:s RPAR SEMICOLON | IF LPAR cond:c RPAR stmt:s | IF LPAR cond:c RPAR stmt:t ELSE stmt:e | WHILE LPAR cond:c RPAR stmt:s ; cond ::= BOOLCONST:c | LPAR cond:c RPAR | expr:e1 COMP:op expr:e2 | BUNOP cond:c | cond:c1 BBINOP:op cond:c2 ; expr ::= IDENT:identifier | INTCONST:constant | LPAR expr:e RPAR | BINOP expr:e | expr:e1 BINOP:op expr:e2 ;
Instead of teadiously creating a scanner manually by hand as we did in the last example, it is advisable to rely to an automatically generated one, to benefit from technical achievements in the sector of finite automata. For CUP parsers, JFlex is a reasonable scanner generator. The scanner specification for MiniJava looks like this:
// Technische Universitaet Muenchen // Fakultaet fuer Informatik // Michael Petter import java_cup.runtime.Symbol; import java_cup.runtime.ComplexSymbolFactory; import java_cup.runtime.ComplexSymbolFactory.Location; %% %public %class Lexer %cup %implements sym, minijava.Constants %char %line %column %{ StringBuffer string = new StringBuffer(); public Lexer(java.io.Reader in, ComplexSymbolFactory sf){ this(in); symbolFactory = sf; } ComplexSymbolFactory symbolFactory; private Symbol symbol(String name, int sym) { return symbolFactory.newSymbol(name, sym, new Location(yyline+1,yycolumn+1,yychar), new Location(yyline+1,yycolumn+yylength(),yychar+yylength())); } private Symbol symbol(String name, int sym, Object val) { Location left = new Location(yyline+1,yycolumn+1,yychar); Location right= new Location(yyline+1,yycolumn+yylength(), yychar+yylength()); return symbolFactory.newSymbol(name, sym, left, right,val); } private Symbol symbol(String name, int sym, Object val,int buflength) { Location left = new Location(yyline+1,yycolumn+yylength()-buflength,yychar+yylength()-buflength); Location right= new Location(yyline+1,yycolumn+yylength(), yychar+yylength()); return symbolFactory.newSymbol(name, sym, left, right,val); } private void error(String message) { System.out.println("Error at line "+(yyline+1)+", column "+(yycolumn+1)+" : "+message); } %} %eofval{ return symbolFactory.newSymbol("EOF", EOF, new Location(yyline+1,yycolumn+1,yychar), new Location(yyline+1,yycolumn+1,yychar+1)); %eofval} Ident = [a-zA-Z$_] [a-zA-Z0-9$_]* IntLiteral = 0 | [1-9][0-9]* BoolLiteral = true | false new_line = \r|\n|\r\n; white_space = {new_line} | [ \t\f] %state STRING %% <YYINITIAL>{ /* keywords */ "int" { return symbol("int",TYPE, new Integer( INTTYPE ) ); } "if" { return symbol("if",IF); } "else" { return symbol("else",ELSE); } "while" { return symbol("while",WHILE); } "read" { return symbol("read",READ); } "write" { return symbol("write",WRITE); } /* names */ {Ident} { return symbol("Identifier",IDENT, yytext()); } /* bool literal */ {BoolLiteral} { return symbol("Boolconst",BOOLCONST, new Boolean(Boolean.parseBool(yytext()))); } /* literals */ {IntLiteral} { return symbol("Intconst",INTCONST, new Integer(Integer.parseInt(yytext()))); } /* separators */ \" { string.setLength(0); yybegin(STRING); } ";" { return symbol("semicolon",SEMICOLON); } "," { return symbol("comma",COMMA); } "(" { return symbol("(",LPAR); } ")" { return symbol(")",RPAR); } "{" { return symbol("{",BEGIN); } "}" { return symbol("}",END); } "=" { return symbol("=",ASSIGN); } "+" { return symbol("plus",BINOP, new Integer( PLUS ) ); } "-" { return symbol("minus",BINOP, new Integer( MINUS ) ); } "*" { return symbol("mult",BINOP, new Integer( MULT ) ); } "/" { return symbol("div",BINOP, new Integer( DIV ) ); } "%" { return symbol("mod",BINOP, new Integer( MOD ) ); } "<=" { return symbol("leq",COMP, new Integer( LEQ ) ); } ">=" { return symbol("gtq",COMP, new Integer( GTQ ) ); } "==" { return symbol("eq",COMP, new Integer( EQ ) ); } "!=" { return symbol("neq",COMP, new Integer( NEQ ) ); } "<" { return symbol("le",COMP, new Integer( LE ) ); } ">" { return symbol("gt",COMP, new Integer( GT ) ); } "&&" { return symbol("and",BBINOP,new Integer( AND ) ); } "||" { return symbol("or",BBINOP,new Integer( OR ) ); } "!" { return symbol("not",BUNOP); } {white_space} { /* ignore */ } } <STRING> { \" { yybegin(YYINITIAL); return symbol("StringConst",STRINGCONST,string.toString(),string.length()); } [^\n\r\"\\]+ { string.append( yytext() ); } \\t { string.append('\t'); } \\n { string.append('\n'); } \\r { string.append('\r'); } \\\" { string.append('\"'); } \\ { string.append('\\'); } } /* error fallback */ .|\n { /* throw new Error("Illegal character <"+ yytext()+">");*/ error("Illegal character <"+ yytext()+">"); }
As the direct output from the parser generator is made up only from nonterminal
and terminal
tags,
we transform it immediately into a more convenient form with nodes being called as their syntactic counterparts, as well as
optimizing expresion nesting depth and flattening of one-branched tree branches into sequences of XML nodes. This is achieved
by the XSL-Transformation, called in the last lines of the parser's main method. This applies a stylesheet like the following:
<?xml version="1.0" encoding="UTF-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exsl="http://exslt.org/common" > <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/> <xsl:include href="tree-view.xsl"/> <xsl:template match="blacklist" mode="flatten" /> <xsl:template match="document" mode="flatten"> <xsl:apply-templates mode="flatten" select="parsetree"/> </xsl:template> <xsl:template match="nonterminal" mode="flatten"> <xsl:variable name="temp" select="@id" /> <xsl:choose> <xsl:when test="count(*)=3 and contains($temp,'expr') and *[contains(@id,'expr')]"> <xsl:apply-templates mode="flatten"/> </xsl:when> <!-- collapses degenerated trees like lists, conserving the blacklist subtrees--> <xsl:when test="../@id = @id and count(/document/blacklist[1]/symbol[text() = $temp])=0"> <xsl:apply-templates mode="flatten"/> </xsl:when> <!-- collapses unary productions --> <xsl:when test="count(*)=3 and count(/document/blacklist[1]/symbol[text() = $temp])=0"> <xsl:apply-templates mode="flatten"/> </xsl:when> <xsl:otherwise> <xsl:element name="{@id}" > <xsl:attribute name="variant"><xsl:value-of select="@variant" /></xsl:attribute> <xsl:apply-templates mode="flatten"/> </xsl:element> </xsl:otherwise> </xsl:choose> </xsl:template> <xsl:template match="terminal" mode="flatten"> <xsl:element name="{@id}"> <xsl:apply-templates mode="flatten"/> </xsl:element> </xsl:template> <xsl:template match="/"> <xsl:variable name="flatten"> <xsl:apply-templates mode="flatten"/> </xsl:variable> <xsl:variable name="rendered"> <xsl:apply-templates mode="rendered" select="exsl:node-set($flatten)"/> </xsl:variable> <!--xsl:copy-of select="$rendered" /--> <xsl:copy-of select="$flatten" /> </xsl:template> </xsl:stylesheet>
from this AST in XML it is straightforward to generate a nice graphical AST with an XSLT stylesheet and the according CSS:
Next in a compiler chain is after a few optimizations the code generation phase. In our example, the
code generation is then defined via XQuery, and applied to the compiler output with
baseX codegen.sq output.xml > ismple.minijvm
declare namespace codegen = "http://www2.in.tum.de"; declare function codegen:codeL($expr as node(),$env as node()*) as xs:string { for $var in $env where name($var)=$expr return $var }; declare function codegen:codeC($expr as node(),$env as node()*) as xs:string { let $variant := $expr/@variant return switch($variant) case "0" return "CONST " || $expr/c || " " case "1" return codegen:codeC($expr/c,$env) case "2" return codegen:codeR($expr/expr[1],$env) || codegen:codeR($expr/expr[2],$env) || $expr/op ||" " case "3" return codegen:codeR($expr/e,$env) || $expr/op ||" " case "4" return codegen:codeC($expr/expr[1],$env) || codegen:codeC($expr/expr[2],$env) || $expr/op ||" " default return "error" }; declare function codegen:codeR($expr as node(),$env as node()*) as xs:string { let $variant := $expr/@variant return switch($variant) case "0" return "LOAD " || codegen:codeL($expr/identifier,$env) || " " case "1" return "CONST " || $expr/constant || " " case "2" return codegen:codeR($expr/e,$env) case "3" return codegen:codeR($expr/e,$env) || $expr/op ||" " case "4" return codegen:codeR($expr/expr[1],$env) || codegen:codeR($expr/expr[2],$env) || $expr/op ||" " default return "error" }; declare function codegen:assignment($stmt as node(),$env as node()*) as xs:string { let $rhs := $stmt/expr let $lhs := $stmt/lhs return codegen:codeR($rhs,$env) || "STORE " || codegen:codeL($lhs,$env) || " " }; declare function codegen:read($stmt as node(),$env as node()*) as xs:string { let $rhs := $stmt/expr let $lhs := $stmt/lhs return "READ STORE " || codegen:codeL($lhs,$env) || " " }; declare function codegen:write($stmt as node(),$env as node()*) as xs:string { let $expr := $stmt/expr return codegen:codeR($expr,$env) || "WRITE " }; declare function codegen:while($stmt as node(),$env as node()*) as xs:string { let $cond := $stmt/cond let $stmt := $stmt/stmt let $ctx := count($stmt/preceding::*) return "A"|| $ctx || ": "|| codegen:codeC($cond,$env) || "FJUMP B"|| $ctx || " " || codegen:code($stmt,$env) || "JUMP A"|| $ctx || " B"|| $ctx || ": " }; declare function codegen:if($stmt as node(),$env as node()*) as xs:string { let $cond := $stmt/cond let $stmt := $stmt/stmt let $ctx := count($stmt/preceding::*) return codegen:codeC($cond,$env) || "FJUMP A"|| $ctx || " " || codegen:code($stmt,$env) || "A"|| $ctx || ": " }; declare function codegen:ifelse($stmt as node(),$env as node()*) as xs:string { let $cond := $stmt/cond let $stm := $stmt/stmt let $ctx := count($stmt/preceding::*) return codegen:codeC($cond,$env) || "FJUMP A"|| $ctx || " " ||codegen:code($stm[1],$env) || "JUMP B"|| $ctx || " A"|| $ctx || ": " ||codegen:code($stm[2],$env)|| "B"|| $ctx || ": " }; declare function codegen:code($stmt as node(),$env) as xs:string { let $variant := $stmt/@variant return switch($variant) case "0" return "n/a empty " case "1" return fn:fold-right($stmt/stmtlist/stmt,"",function ($stm,$acc) { if (fn:empty($stm)) then "" else $acc || codegen:code($stm,$env) }) case "2" return codegen:assignment($stmt,$env) case "3" return codegen:read($stmt,$env) case "4" return codegen:read($stmt,$env) case "5" return codegen:write($stmt,$env) case "6" return "n/a: write(string) " case "7" return codegen:if($stmt,$env) case "8" return codegen:ifelse($stmt,$env) case "9" return codegen:while($stmt,$env) default return "error" || $stmt }; declare function codegen:env($decllist as node()) as node()* { for $id at $ident in $decllist//identifier return element { xs:QName($id) } { xs:string($ident) } }; let $program := doc('input.xml')//program let $stmts := $program/stmtlist let $env := codegen:env($program/decllist) let $alloc := count($env) return "ALLOC "|| $alloc || " " || fold-right($stmts/stmt,"",function($stmt,$acc) { if (empty($stmt)) then "" else codegen:code($stmt,$env)|| $acc }) || "HALT "
In order to grasp the modification, that is needed to manually generate a syntax tree from CUP specification, we have another implementation of the MiniJava compiler from above. This time, the specification is enriched with lots of calls to factory methods, that generate the objects for the syntax tree. Note, that in this case, we also implemented an error recovering parser, with an error alternative at the statement level. This enables the parser to continue after syntax errors, while offering alternative continuations for a valid parsing run!
import java.util.*; import java.io.*; import java_cup.runtime.*; import java_cup.runtime.ComplexSymbolFactory.ComplexSymbol; import minijava.*; parser code {: public boolean syntaxErrors; Lexer lexer; public Parser(Lexer lex, ComplexSymbolFactory sf) { super(lex,sf); lexer = lex; } :}; terminal SEMICOLON, COMMA, LPAR, RPAR, BEGIN, END,IF, ELSE, WHILE, READ, WRITE, BUNOP, ASSIGN; terminal Integer TYPE, BINOP, UNOP, COMP, BBINOP, INTCONST; terminal String IDENT,STRINGCONST; terminal Boolean BOOLCONST; non terminal Program program; non terminal List<Decl> decllist; non terminal Decl decl; non terminal List<Stmt> stmtlist; non terminal List<Expr.Identifier> identlist; non terminal Stmt stmt; non terminal Expr expr; non terminal Cond cond; precedence left ELSE, UNOP, BINOP, BUNOP, BBINOP; program ::= decllist:d stmtlist:s {: RESULT = new Program(d,s); :} ; decllist ::= decl:d decllist:dl {: dl.add(d); RESULT = dl; :} | /* empty decllist */ {: RESULT = new LinkedList<Decl>(); :} ; stmtlist ::= stmtlist:sl stmt:s {: sl.add(s); RESULT = sl; :} | /* empty stmtlist */ {: RESULT = new LinkedList<Stmt>(); :} ; decl ::= TYPE IDENT:i identlist:il SEMICOLON {: il.add(0,Expr.ident(ixleft,i,ixright)); RESULT = new Decl(il); :} ; identlist ::= identlist:il COMMA IDENT:i {: il.add(Expr.ident(ixleft,i,ixright)); RESULT = il; :} | /* empty identlist*/ {: RESULT = new LinkedList<Expr.Identifier>(); :} ; stmt ::= SEMICOLON {: RESULT = Stmt.empty(); :} | BEGIN stmtlist:sl END {: RESULT = Stmt.compound(sl); :} | IDENT:i ASSIGN expr:e SEMICOLON {: RESULT = Stmt.assign(ixleft,i,e,ixright); :} | IDENT:i ASSIGN READ LPAR RPAR SEMICOLON {: RESULT = Stmt.read(ixleft,i,ixright); :} | IDENT:i ASSIGN READ LPAR STRINGCONST:s RPAR SEMICOLON {: RESULT = Stmt.read(ixleft,i,s,ixright); :} | WRITE LPAR expr:e RPAR SEMICOLON {: RESULT = Stmt.write(e); :} | WRITE LPAR STRINGCONST:s RPAR SEMICOLON {: RESULT = Stmt.write(s); :} | IF LPAR cond:c RPAR stmt:s {: RESULT = Stmt.ifthen(c,s); :} | IF LPAR cond:c RPAR stmt:t ELSE stmt:e {: RESULT = Stmt.ifthenelse(c,t,e); :} | WHILE LPAR cond:c RPAR stmt:s {: RESULT = Stmt.whileloop(c,s); :} | error:e {: parser.report_error("Syntax error, skip rest",e); :} ; cond ::= BOOLCONST:c {: RESULT = Cond.boolconst(c); :} | LPAR cond:c RPAR {: RESULT = Cond.priority(c); :} | expr:e1 COMP:op expr:e2 {: RESULT = Cond.binop(e1,op,e2); :} | BUNOP cond:c {: RESULT = Cond.unop(c); :} | cond:c1 BBINOP:op cond:c2 {: RESULT = Cond.binop(c1,op,c2); :} ; /* Yes, this grammar does not adhere to precedence of multiplication over addition */ /* This is due to the fact that it is introduced with an ambiguous grammar in the lecture, and was never fixed */ expr ::= IDENT:i {: RESULT = Expr.ident(ixleft,i,ixright); :} | INTCONST:c {: RESULT = Expr.intconst(c); :} | LPAR expr:e RPAR {: RESULT = Expr.priority(e); :} | BINOP expr:e {: RESULT = Expr.unop(e); :} | expr:e1 BINOP:op expr:e2 {: RESULT = Expr.binop(e1,op,e2); :} ;
In principle, you are free to create syntax tree classes to suit your needs. However, we have found certain techniques
useful, so we show you the following syntax tree class for expressions as an example; we equip them with an accept
method, that implements a visitor
-interface for this expression. Note, that the accept implementation pays
respect to the preVisit()
's return value, and only descends into this subtree, when the return value was
true
. On large syntax trees, this saves a lot of time compared to standard full tree traversal. By clever
usage of this pattern, one can simulate attribute computation quite easily:
package minijava; import java_cup.runtime.ComplexSymbolFactory.Location; public abstract class Expr implements Constants { public Expr() { } public abstract void accept(MinijavaVisitor v); public static Priority priority(Expr e) { return new Priority(e); } public static class Priority extends Expr { public Expr e; public Priority(Expr e) { this.e=e; } public String toString() { return "("+e+")"; } public void accept(MinijavaVisitor v) { if (!v.preVisit(this)) return; e.accept(v); v.postVisit(this); } } public static Binex binop(Expr e1, int op, Expr e2){ return new Binex(e1,op,e2); } public static class Binex extends Expr { public Expr e1, e2; public int op; public Binex(Expr e1, int op, Expr e2){ this.e1=e1; this.e2=e2; this.op=op; } public String toString(){ String operator=null; if (op==Constants.PLUS) operator = "+"; if (op==Constants.MINUS) operator = "-"; if (op==Constants.MULT) operator = "*"; if (op==Constants.DIV) operator = "/"; if (op==Constants.MOD) operator = "\\%"; return e1 + ""+operator + e2; } public void accept(MinijavaVisitor v){ if (!v.preVisit(this)) return; e1.accept(v); e2.accept(v); v.postVisit(this); } } public static Unex unop(Expr e) { return new Unex(e); } public static class Unex extends Expr { public Expr e1; public Unex(Expr e1) { this.e1=e1; } public String toString() { return "-"+e1; } public void accept(MinijavaVisitor v) { if (!v.preVisit(this)) return; e1.accept(v); v.postVisit(this); } } public static IntConst intconst(int i) { return new IntConst(i); } public static class IntConst extends Expr { public int i; public IntConst(int i) { this.i=i; } public String toString() { return i+""; } public void accept(MinijavaVisitor v) { v.visit(this); } } public static Identifier ident(Location l, String s, Location r){ return new Identifier(l,s,r); } public static class Identifier extends Expr { public Location left, right; public String i; public Identifier(Location l, String i,Location r){ this.i=i; this.left=l; this.right=r; } public String toString() { return i; } public void accept(MinijavaVisitor v) { v.visit(this); } } }
The visitors for this structure then are able to walk the syntax tree and compute attributes; the following visitor implements the code generation for the MiniJVM:
import minijava.*; import java.util.*; import java.io.*; public class MiniJVMGenerator extends MinijavaVisitor implements Constants{ PrintWriter writer; public MiniJVMGenerator(String filename){ try{ writer = new PrintWriter(filename+".jvm"); }catch(IOException ioe){ ioe.printStackTrace(); } } public void flush(){ writer.flush(); writer.close(); } int declcounter = 0; int loopcounter = 0; int ifcounter = 0; Map<String,Integer> symtab = new HashMap<String,Integer>(); public void postVisit(Program p){ writer.println("HALT"); } public boolean preVisit(Stmt.Assign i){ writer.print("// "+i); return true; } public void postVisit(Stmt.Assign i){ writer.println("STORE "+symtab.get(i.lhs)); } public boolean preVisit(Stmt.Write i){ writer.print("// "+i); return true; } public void postVisit(Stmt.Write i){ writer.println("WRITE"); } public void visit(Stmt.Read i){ writer.print("// "+i); writer.println("READ"); writer.println("STORE "+symtab.get(i.lhs)); } public boolean preVisit(Stmt.IfThenElse i){ int e = ifcounter++; int end = ifcounter++; writer.println("// "+i.cond); i.cond.accept(this); writer.println("FJUMP IF"+e); i.then.accept(this); if (i.els!=null) writer.println("JUMP IF"+end); writer.print("IF"+e+":"); if(i.els!=null) { i.els.accept(this); writer.println("IF"+end+":"); } return false; } public boolean preVisit(Stmt.Loop i){ int loop = loopcounter++; int loopexit = loopcounter++; writer.println("//"+ i.cond+" ??" ); writer.print("L"+ loop +":" ); i.cond.accept(this); writer.println("FJUMP L"+loopexit); i.body.accept(this); writer.println("JUMP L"+loop); writer.print("L"+loopexit+":"); return false; } public boolean preVisit(Decl d){ writer.print("//"+d); return true; } public void postVisit(Decl d){ for (Expr.Identifier s :d.varlist) symtab.put(s.i,declcounter++); writer.println("ALLOC "+d.varlist.size()); } public void postVisit(Cond.BUnOp i){ writer.println("NOT"); } public void postVisit(Cond.BinCond i){ if (i.op==LEQ) writer.println("LEQ"); if (i.op==LE) writer.println("LESS"); if (i.op==GTQ) writer.println("GTQ"); if (i.op==GT) writer.println("GT"); if (i.op==EQ) writer.println("EQ"); if (i.op==NEQ) writer.println("NEQ"); } public void postVisit(Cond.BBinCond i){ if (i.op==AND) writer.println("AND"); if (i.op==OR) writer.println("OR"); } public void visit(Cond.BoolConst d){ writer.println("???"); } public void visit(Expr.Identifier d){ writer.println("LOAD "+symtab.get(d.i)); } public void visit(Expr.IntConst d){ writer.println("CONST "+d.i); } public void postVisit(Expr.Binex i){ if (i.op==PLUS) writer.println("ADD"); if (i.op==MINUS) writer.println("SUB"); if (i.op==MULT) writer.println("MUL"); if (i.op==DIV) writer.println("DIV"); if (i.op==MOD) writer.println("MOD"); } public void postVisit(Expr.Unex i){ writer.println("NEG"); } }
Parsing C is somewhat challenging, as its grammar is not context free. This is due to C's mechanism of defining type keywords during a program via typedefs. However, with a few lines of user defined action code, we can communicate the introduction of type keywords during the parsing process back to the scanner, while keeping track of scopes from inside the parser.
import java.io.*; import java.util.*; import java_cup.runtime.*; import java_cup.runtime.XMLElement.*; import javax.xml.stream.*; import javax.xml.transform.*; import javax.xml.transform.stream.*; parser code {: public void syntax_error(Symbol cur_token){ System.err.println("Syntax error at "+cur_token); } public static void newScope(){ typenames.push(new HashSet<String>()); } public static void deleteScope(){ typenames.pop(); } public static boolean lookupType(String name){ for (HashSet<String> scope: typenames) if (scope.contains(name)) return true; return false; } public static void addType(String name){ typenames.peek().add(name); } public static LinkedList<HashSet<String>> typenames = new LinkedList<HashSet<String>>(); public Parser(Lexer lex, ComplexSymbolFactory sf) { super(lex,sf); } public static void main(String args[]) { try { ComplexSymbolFactory csf = new ComplexSymbolFactory(); // create a buffering scanner wrapper ScannerBuffer lexer = new ScannerBuffer(new Lexer(new BufferedReader(new FileReader(args[0])),csf)); // start parsing Parser p = new Parser(lexer,csf); System.out.println("Parser runs: "); newScope(); XMLElement e = (XMLElement)p.parse().value; // create XML output file XMLOutputFactory outFactory = XMLOutputFactory.newInstance(); XMLStreamWriter sw = outFactory.createXMLStreamWriter(new FileOutputStream(args[1])); // dump XML output to the file XMLElement.dump(lexer,sw,e); //,"expr","stmt"); // transform the parse tree into an AST and a rendered HTML version Transformer transformer = TransformerFactory.newInstance() .newTransformer(new StreamSource(new File("tree.xsl"))); Source text = new StreamSource(new File(args[1])); transformer.transform(text, new StreamResult(new File("output.html"))); System.out.println("Parsing finished!"); } catch (Exception e) { e.printStackTrace(); } } :}; terminal IDENTIFIER, CONSTANT, STRING_LITERAL, SIZEOF, PTR_OP, INC_OP, DEC_OP, LEFT_OP, RIGHT_OP, LE_OP, GE_OP, AND_OP, OR_OP, MUL_ASSIGN, DIV_ASSIGN, MOD_ASSIGN, ADD_ASSIGN, SUB_ASSIGN, LEFT_ASSIGN, RIGHT_ASSIGN, XOR_ASSIGN, OR_ASSIGN, TYPE_NAME, TYPEDEF, EXTERN, STATIC, AUTO, REGISTER, CHAR, SHORT, INT, LONG, SIGNED, UNSIGNED, FLOAT, DOUBLE, CONST, VOLATILE, VOID, STRUCT, UNION, ENUM, ELLIPSIS, CASE, DEFAULT, IF, ELSE, SWITCH, WHILE, DO, FOR, GOTO, CONTINUE, BREAK, RETURN, SEMI, CURLYL, CURLYR, COMMA, COLON, ASSIGN, PARAL, PARAR, SQUAREDL, SQUAREDR, POINT, ADRESS, NOT, TILDE, AND_ASSIGN, EQ_OP, NE_OP, MINUS, PLUS, MUL, DIVIDE, MODULUS, LESS, GREATER, XOR, OR, COND; non terminal translation_unit, primary_expression, postfix_expression, expression, assignment_expression; non terminal unary_operator, type_name, cast_expression, multiplicative_expression, additive_expression; non terminal shift_expression, equality_expression, and_expression, exclusive_or_expression; non terminal logical_and_expression, logical_or_expression, conditional_expression, constant_expression; non terminal declaration_specifiers, init_declarator_list, storage_class_specifier, type_specifier, type_qualifier; non terminal init_declarator, declarator, struct_or_union_specifier, struct_declaration_list, struct_declaration; non terminal initializer, specifier_qualifier_list, struct_declarator_list, struct_declarator, enum_specifier; non terminal enumerator_list, enumerator, pointer, direct_declarator, parameter_type_list, identifier_list; non terminal type_qualifier_list, parameter_declaration, abstract_declarator, direct_abstract_declarator; non terminal initializer_list, statement, labeled_statement, compound_statement, selection_statement; non terminal jump_statement, statement_list, expression_statement, external_declaration, function_definition; non terminal assignment_operator, parameter_list, unary_expression, iteration_statement, declaration_list; non terminal relational_expression, inclusive_or_expression, declaration; precedence nonassoc ELSE; start with translation_unit; primary_expression ::= IDENTIFIER:ident | CONSTANT:constant | STRING_LITERAL:stringliteral | PARAL expression:e PARAR ; postfix_expression ::= primary_expression:pe | postfix_expression:pe SQUAREDL expression:index SQUAREDR | postfix_expression:pe PARAL PARAR | postfix_expression:pe PARAL expression:e PARAR | postfix_expression:pe POINT IDENTIFIER:id | postfix_expression:pe PTR_OP IDENTIFIER:id | postfix_expression:pe INC_OP:op | postfix_expression:pe DEC_OP:op ; unary_expression ::= postfix_expression:pe | INC_OP:op unary_expression:ue | DEC_OP:op unary_expression:ue | unary_operator:uo cast_expression:ce | SIZEOF unary_expression:ue | SIZEOF PARAL type_name:tn PARAR ; unary_operator ::=ADRESS | MUL:op | PLUS:op | MINUS:op | TILDE | NOT:op ; cast_expression ::=unary_expression:ue | PARAL type_name:tn PARAR cast_expression:ce ; multiplicative_expression ::= cast_expression:ce | multiplicative_expression:me MUL:op cast_expression:ce | multiplicative_expression:me DIVIDE:op cast_expression:ce | multiplicative_expression:me MODULUS:op cast_expression:ce ; additive_expression ::= multiplicative_expression:me | additive_expression:ae PLUS:op multiplicative_expression:me | additive_expression:ae MINUS:op multiplicative_expression:me ; shift_expression ::= additive_expression:ae | shift_expression:se LEFT_OP additive_expression:ae | shift_expression:se RIGHT_OP additive_expression:ae ; relational_expression ::= shift_expression:se | relational_expression:re LESS:op shift_expression:se | relational_expression:re GREATER:op shift_expression:se | relational_expression:re LE_OP:op shift_expression:se | relational_expression:re GE_OP:op shift_expression:se ; equality_expression ::= relational_expression:re | equality_expression:ee EQ_OP:op relational_expression:re | equality_expression:ee NE_OP:op relational_expression:re ; and_expression ::= equality_expression:ee | and_expression:ae ADRESS equality_expression:ee ; exclusive_or_expression ::= and_expression:ae | exclusive_or_expression:eoe XOR and_expression:ae ; inclusive_or_expression ::= exclusive_or_expression:eoe | inclusive_or_expression:ioe OR exclusive_or_expression:eoe ; logical_and_expression ::= inclusive_or_expression:ioe | logical_and_expression:lae AND_OP:op inclusive_or_expression:ioe ; logical_or_expression ::= logical_and_expression:lae | logical_or_expression:loe OR_OP:op logical_and_expression:lae ; conditional_expression ::= logical_or_expression:loe | logical_or_expression:loe COND expression:e COLON conditional_expression:ce ; assignment_expression ::= conditional_expression:ce | unary_expression:ue assignment_operator:aop assignment_expression:ae ; assignment_operator ::=ASSIGN | MUL_ASSIGN | DIV_ASSIGN | MOD_ASSIGN | ADD_ASSIGN | SUB_ASSIGN | LEFT_ASSIGN | RIGHT_ASSIGN | AND_ASSIGN | XOR_ASSIGN | OR_ASSIGN ; expression ::= assignment_expression:ae | expression:e COMMA assignment_expression:ae ; constant_expression ::=conditional_expression:ce ; declaration ::=declaration_specifiers:ds SEMI | declaration_specifiers:ds init_declarator_list:idl {: if (ds.toString().indexOf(">typedef<")>0) { for (XMLElement e: ((XMLElement)idl).selectById("identifier")) Parser.addType(((Terminal)e).value().toString()); } :} SEMI ; declaration_specifiers ::=storage_class_specifier:scc | storage_class_specifier:scc declaration_specifiers:ds | type_specifier:ts | type_specifier:ts declaration_specifiers:ds | type_qualifier:tq | type_qualifier:tq declaration_specifiers:ds ; init_declarator_list ::=init_declarator:id | init_declarator_list:idl COMMA init_declarator:id ; init_declarator ::=declarator:d | declarator:d ASSIGN initializer:i ; storage_class_specifier ::= TYPEDEF:id | EXTERN:id | STATIC:id | AUTO:id | REGISTER:id ; type_specifier ::=VOID:type | CHAR:type | SHORT:type | INT:type | LONG:type | FLOAT:type | DOUBLE:type | SIGNED:type | UNSIGNED:type | struct_or_union_specifier:su | enum_specifier:es | TYPE_NAME:type ; struct_or_union_specifier ::= STRUCT:s IDENTIFIER:id CURLYL struct_declaration_list:sdl CURLYR | STRUCT:s CURLYL struct_declaration_list:sdl CURLYR | STRUCT:s IDENTIFIER:id | UNION:u IDENTIFIER:id CURLYL struct_declaration_list:sdl CURLYR | UNION:u CURLYL struct_declaration_list:sdl CURLYR | UNION:u IDENTIFIER:id ; struct_declaration_list ::=struct_declaration:s | struct_declaration_list:sl struct_declaration:s ; struct_declaration ::=specifier_qualifier_list:sq struct_declarator_list:sd SEMI ; specifier_qualifier_list ::=type_specifier:ts specifier_qualifier_list:sq | type_specifier:ts | type_qualifier:tq specifier_qualifier_list:sq | type_qualifier:tq ; struct_declarator_list ::=struct_declarator:s | struct_declarator_list:sl COMMA struct_declarator:s ; struct_declarator ::=declarator:d | COLON constant_expression:ce | declarator:d COLON constant_expression:ce ; enum_specifier ::=ENUM CURLYL enumerator_list:el CURLYR | ENUM IDENTIFIER:id CURLYL enumerator_list:el CURLYR | ENUM IDENTIFIER:id ; enumerator_list ::=enumerator:e | enumerator_list:el COMMA enumerator:e ; enumerator ::=IDENTIFIER:id | IDENTIFIER:id ASSIGN constant_expression:ce ; type_qualifier ::=CONST:id | VOLATILE:id ; declarator ::=pointer:p direct_declarator:direct | direct_declarator:direct ; direct_declarator ::=IDENTIFIER:identifier | PARAL declarator:d PARAR | direct_declarator:dd SQUAREDL constant_expression:ce SQUAREDR | direct_declarator:dd SQUAREDL SQUAREDR | direct_declarator:dd PARAL parameter_type_list:ptl PARAR | direct_declarator:dd PARAL identifier_list:il PARAR | direct_declarator:dd PARAL PARAR ; pointer ::=MUL:id | MUL:id type_qualifier_list:tql | MUL:id pointer:p | MUL:id type_qualifier_list:tql pointer:p ; type_qualifier_list ::=type_qualifier:tq | type_qualifier_list:tql type_qualifier:tq ; parameter_type_list ::=parameter_list:pl | parameter_list:pl COMMA ELLIPSIS:id ; parameter_list ::=parameter_declaration:pd | parameter_list:pl COMMA parameter_declaration:pd ; parameter_declaration ::=declaration_specifiers:ds declarator:d | declaration_specifiers:ds abstract_declarator:ad | declaration_specifiers:ds ; identifier_list ::=IDENTIFIER:id | identifier_list:idl COMMA IDENTIFIER:id ; type_name ::=specifier_qualifier_list:sl | specifier_qualifier_list:sl abstract_declarator:ad ; abstract_declarator ::=pointer:p | direct_abstract_declarator:dad | pointer:p direct_abstract_declarator:d ; direct_abstract_declarator ::=PARAL:id abstract_declarator:ad PARAR | SQUAREDL:id SQUAREDR | SQUAREDL:id constant_expression:ce SQUAREDR | direct_abstract_declarator:dad SQUAREDL:id SQUAREDR | direct_abstract_declarator:dad SQUAREDL:id constant_expression:ce SQUAREDR | PARAL:id PARAR | PARAL:id parameter_type_list:ptl PARAR | direct_abstract_declarator:dad PARAL:id PARAR | direct_abstract_declarator:dad PARAL:id parameter_type_list:ptl PARAR ; initializer ::=assignment_expression:ae | CURLYL initializer_list:il CURLYR | CURLYL initializer_list:il COMMA CURLYR ; initializer_list ::=initializer:i | initializer_list:il COMMA initializer:i ; statement ::=labeled_statement:ls | {: Parser.newScope(); :} compound_statement:cs {: Parser.deleteScope(); :} | expression_statement:es | selection_statement:ss | iteration_statement:is | jump_statement:js ; labeled_statement ::=IDENTIFIER:id COLON statement:s | CASE constant_expression:ce COLON statement:s | DEFAULT COLON statement:s ; compound_statement ::=CURLYL CURLYR | CURLYL statement_list:sl CURLYR | CURLYL declaration_list:dl CURLYR | CURLYL declaration_list:dl statement_list:sl CURLYR ; declaration_list ::=declaration:d | declaration_list:dl declaration:d ; statement_list ::=statement:s | statement_list:sl statement:s ; expression_statement ::=SEMI | expression:e SEMI ; selection_statement ::=IF PARAL expression:e PARAR statement:s | IF PARAL expression:e PARAR statement:s1 ELSE statement:s2 | SWITCH PARAL expression:e PARAR statement:s ; iteration_statement ::=WHILE PARAL expression:e PARAR statement:s | DO statement:s WHILE PARAL expression:e PARAR SEMI | FOR PARAL expression_statement:es1 expression_statement:es2 PARAR statement:s | FOR PARAL expression_statement:es1 expression_statement:es2 expression:e PARAR statement:s ; jump_statement ::=GOTO IDENTIFIER:id SEMI | CONTINUE SEMI | BREAK SEMI | RETURN SEMI | RETURN expression:e SEMI ; translation_unit ::=external_declaration:ed | translation_unit:tu external_declaration:ed ; external_declaration ::=function_definition:fd | declaration:d ; function_definition ::=declaration_specifiers:ds declarator:d declaration_list:dl {: Parser.newScope(); :} compound_statement:cs {: Parser.deleteScope(); :} | declaration_specifiers:ds declarator:d {: Parser.newScope(); :} compound_statement:cs {: Parser.deleteScope(); :} | declarator:d declaration_list:dl {: Parser.newScope(); :} compound_statement:cs {: Parser.deleteScope(); :} | declarator:d {: Parser.newScope(); :} compound_statement:cs {: Parser.deleteScope(); :} ;