CUP 0.11b

CUP stands for Construction of Useful Parsers and is an LALR parser generator for Java. It was developed by C. Scott Ananian, Frank Flannery, Dan Wang, Andrew W. Appel and Michael Petter. It implements standard LALR(1) parser generation. As a parser author, you specify the symbols of your grammar (terminal T1, T2; non terminal N1, N2;) as well as the productions (LHS ::= RHS1 | RHS2 ;). If you provide a production alternative with action code ({: RESULT = myfunction(); :}), the parser calls this action code after performing a reduction with that production. You can use these callbacks to assemble an AST (Abstract Syntax Tree) or for arbitrary other purposes. You should also have a look at the scanner generator JFlex, which is particularly well suited for collaboration with CUP.

Download

Select the version of CUP you would like to obtain:

FAQ

How do I program CUP?

CUP generates parsers from specifications that you provide in a special file, whose syntax is quite similar to that of YACC:

/* Simple +/-/* expression language; parser evaluates constant expressions on the fly*/
import java_cup.runtime.*;

/* Terminals (tokens returned by the scanner). */
terminal            SEMI, PLUS, MINUS, TIMES, UMINUS, LPAREN, RPAREN;
terminal Integer    NUMBER;        // our scanner provides numbers as integers

/* Non terminals */
non terminal            expr_list;
non terminal Integer    expr;      // used to store evaluated subexpressions

/* Precedences */
precedence left PLUS, MINUS;
precedence left TIMES;
precedence left UMINUS;

/* The grammar rules */
expr_list ::= expr_list expr:e SEMI         {: System.out.println(e); :}
            | expr:e SEMI                   {: System.out.println(e); :}
            ;
expr      ::= expr:e1 PLUS  expr:e2         {: RESULT = e1+e2;       :}
            | expr:e1 MINUS expr:e2         {: RESULT = e1-e2;       :}
            | expr:e1 TIMES expr:e2         {: RESULT = e1*e2;       :}
            | MINUS expr:e %prec UMINUS     {: RESULT = -e;          :}
            | LPAREN expr:e RPAREN          {: RESULT = e;           :}
            | NUMBER:n                      {: RESULT = n;           :}
            ;

This will produce two files, parser.java and sym.java, which can be included in arbitrary Java projects.
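
To get a feeling for the values the action code in the specification above computes, here is a minimal plain-Java sketch. It is not the LALR parser that CUP generates, but a hand-written recursive-descent evaluator that mirrors the same precedence declarations (PLUS/MINUS lowest, then TIMES, then unary minus). The class name ExprSketch and the method evalAll are made up for this illustration and are not part of CUP.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written stand-in (NOT CUP output) that evaluates input such as "1+2*3;". */
final class ExprSketch {
    private final String src;
    private int pos = 0;

    private ExprSketch(String src) { this.src = src; }

    /**
     * Mirrors expr_list: evaluates each SEMI-terminated expression in order.
     * Unlike the grammar, this sketch accepts empty input and returns an empty list.
     */
    static List<Integer> evalAll(String input) {
        ExprSketch p = new ExprSketch(input);
        List<Integer> results = new ArrayList<>();
        p.skipSpaces();
        while (p.pos < p.src.length()) {
            results.add(p.sum());
            p.expect(';');
        }
        return results;
    }

    // Lowest precedence level: PLUS and MINUS, left-associative.
    private int sum() {
        int value = product();
        while (peek() == '+' || peek() == '-') {
            char op = next();
            int rhs = product();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // Middle level: TIMES, left-associative.
    private int product() {
        int value = unary();
        while (peek() == '*') {
            next();
            value *= unary();
        }
        return value;
    }

    // Highest level: unary minus (UMINUS), parentheses and NUMBER.
    private int unary() {
        if (peek() == '-') { next(); return -unary(); }
        if (peek() == '(') {
            next();
            int value = sum();
            expect(')');
            return value;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("number expected at " + pos);
        int value = Integer.parseInt(src.substring(start, pos));
        skipSpaces();
        return value;
    }

    // Current character, or '\0' at end of input.
    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    // Consumes the current character and any whitespace after it.
    private char next() { char c = src.charAt(pos++); skipSpaces(); return c; }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("'" + c + "' expected at " + pos);
        next();
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        // Prints one list entry per expression, as the println actions would print one line each.
        System.out.println(evalAll("1+2*3; -4*(2-5); 10-3-2;"));
    }
}
```

With CUP, the same precedence handling comes from the three precedence declarations instead of from the nesting of methods, which keeps the grammar rules themselves flat.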

I do not have fancy callback actions. Do I really have to code all that redundant syntax tree creation myself?

Since the 20140703 release, CUP has an answer for you: use the -xmlactions parameter, and possibly -genericlabels, during generation. This makes CUP generate a parser that produces an XML representation of its parse tree. Consider this example:

/* Minijava Grammar */
import java_cup.runtime.ComplexSymbolFactory;
import java_cup.runtime.ScannerBuffer;
import java_cup.runtime.XMLElement;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

import javax.xml.transform.*;
import javax.xml.transform.stream.*;

import java.io.*;
parser code {:
  public Parser(Lexer lex, ComplexSymbolFactory sf) {
    super(lex,sf);
  }
  public static void main(String[] args) throws Exception {
      // initialize the symbol factory
      ComplexSymbolFactory csf = new ComplexSymbolFactory();
      // create a buffering scanner wrapper
      ScannerBuffer lexer = new ScannerBuffer(new Lexer(new BufferedReader(new FileReader(args[0])),csf));
      // start parsing
      Parser p = new Parser(lexer,csf);
      XMLElement e = (XMLElement)p.parse().value;
      // create XML output file 
      XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
      XMLStreamWriter sw = outFactory.createXMLStreamWriter(new FileOutputStream(args[1]));
      // dump XML output to the file
      XMLElement.dump(lexer,sw,e,"expr","stmt");

      // transform the parse tree into an AST and a rendered HTML version
      Transformer transformer = TransformerFactory.newInstance()
	    .newTransformer(new StreamSource(new File("tree.xsl")));
      Source text = new StreamSource(new File(args[1]));
      transformer.transform(text, new StreamResult(new File("output.html")));
  }
:};



/* Terminals */
terminal         SEMICOLON, COMMA, LPAR, RPAR, BEGIN, END, IF, ELSE, WHILE, READ, WRITE, BUNOP, ASSIGN;

terminal Integer TYPE, BINOP, UNOP, COMP, BBINOP, INTCONST;
terminal String  IDENT,STRINGCONST;
terminal Boolean BOOLCONST;

non terminal program, decllist,decl,stmtlist,identlist,stmt,expr,cond;

precedence left ELSE, UNOP, BINOP, BUNOP, BBINOP;

program   ::=  decllist stmtlist
    ;
decllist  ::=  decl decllist
    |
    ;
stmtlist  ::= stmtlist stmt
    |
        ;
decl ::= TYPE IDENT identlist  SEMICOLON
    ;
identlist  ::= identlist COMMA IDENT
    |
    ;
stmt ::= SEMICOLON
    | BEGIN stmtlist END
    | IDENT ASSIGN expr SEMICOLON
    | IDENT ASSIGN READ LPAR RPAR SEMICOLON
    | IDENT ASSIGN READ LPAR STRINGCONST RPAR SEMICOLON
    | WRITE LPAR expr RPAR SEMICOLON
    | WRITE LPAR STRINGCONST RPAR SEMICOLON
    | IF LPAR cond RPAR stmt
    | IF LPAR cond RPAR stmt ELSE stmt
    | WHILE LPAR cond RPAR stmt
    ;
cond ::= BOOLCONST
    | LPAR cond RPAR
    | expr COMP expr
    | BUNOP cond
    | cond BBINOP cond
    ;
expr ::= IDENT
    | INTCONST
    | LPAR expr RPAR
    | BINOP expr
    | expr BINOP expr
    ;

Runs of the generated parser create parse trees in XML format that can be further processed on any platform you like, be it Java, .NET, XSLT or XQuery. Output looks like the one in this XML file. Syntax tree transformation then essentially boils down to XML transformation: flattening recursive list productions and collapsing chain productions are 3-liners in XSLT. With the help of a little XSLT magic, we get a nice visualization of the parse tree via HTML/CSS. Let us consider the input from here:

 int x, y;
 y = 10;
 x = read();
 while (1-y <= -x) {
     y = x * (-y + 5);
 }

Together with a suitable JFlex-based scanner, the parser from above will process this input into the following abstract syntax tree:

Assigning labels to the symbols manually, instead of relying on the ones CUP generates by default, gives you an even more compact and abstract representation of your input files. An XSL transformation that removes all terminals carrying no data may already give you the desired level of abstraction.
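
The kind of XSLT post-processing mentioned above (flattening a recursive list production and dropping dataless terminals) could be sketched as follows, using only the XSLT support that ships with the JDK. Note that the element and attribute names used here (nonterminal, terminal, id) are assumptions made for this illustration; check the XML your generated parser actually emits and adapt the match patterns accordingly. The class name FlattenSketch is likewise hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class FlattenSketch {
    // Identity transform plus two overriding templates.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // A stmtlist nested directly inside a stmtlist is replaced by its children.
      + "<xsl:template match=\"nonterminal[@id='stmtlist']/nonterminal[@id='stmtlist']\">"
      + "<xsl:apply-templates select='node()'/>"
      + "</xsl:template>"
      // Terminals without any content carry no data and are dropped.
      + "<xsl:template match='terminal[not(node())]'/>"
      + "</xsl:stylesheet>";

    // Hypothetical parse tree for "y ; x ;" in an ASSUMED XML shape (not verified CUP output).
    static final String SAMPLE =
        "<nonterminal id='stmtlist'>"
      +   "<nonterminal id='stmtlist'>"
      +     "<nonterminal id='stmtlist'/>"
      +     "<nonterminal id='stmt'><terminal id='IDENT'>y</terminal>"
      +       "<terminal id='SEMICOLON'/></nonterminal>"
      +   "</nonterminal>"
      +   "<nonterminal id='stmt'><terminal id='IDENT'>x</terminal>"
      +     "<terminal id='SEMICOLON'/></nonterminal>"
      + "</nonterminal>";

    /** Applies the stylesheet above to the given XML text and returns the result. */
    static String flatten(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        // Prints a single stmtlist element whose direct children are the two stmt elements.
        System.out.println(flatten(SAMPLE));
    }
}
```

The same stylesheet text could be kept in a separate .xsl file and applied with the Transformer calls already shown in the parser code of the Minijava example.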

How do I integrate CUP into my project?

CUP comes with an Ant task, so you can integrate it with any Java environment that supports build.xml-based project files. CUP needs to generate sources, which have to be integrated into the main project sources before compilation can succeed. This can be achieved by introducing a target for generated sources on which the main compilation target depends. Additionally, you should clean up your generated sources whenever the project itself is cleaned. Basic support for this is sketched in the following build script, which also shows how the scanner generator JFlex is integrated into the same build.

<project name="Compiler" default="compile" basedir=".">
  <property name="cup"      location="src/cup"      />
  <property name="jflex"    location="src/jflex"    />
  <property name="java"     location="src/java"     />
  <property name="classes"  location="bin"          />
  <property name="lib"      location="lib"          />
  <property name="tools"    location="tools"        />

  <taskdef  name="jflex" classname="JFlex.anttask.JFlexTask"   classpath="${tools}/JFlex.jar"   />
  <taskdef  name="cup"   classname="java_cup.anttask.CUPTask"  classpath="${tools}/java-cup-11b.jar"  />

  <target name="generate">
    <jflex  file="${jflex}/Scanner.jflex" destdir="${java}" />
    <cup srcfile="${cup}/Parser.cup"      destdir="${java}"
          parser="Parser"                 interface="true" locations="false" />
  </target>


  <path id="libraries"> <files includes="${lib}/java-cup-11b-runtime.jar" /> </path>

  <target name="compile" depends="generate">
    <javac srcdir="${java}" destdir="${classes}" > <classpath refid="libraries" /> </javac>
  </target>

  <target name="clean">
    <delete file="${java}/Parser.java" />
    <delete file="${java}/sym.java" />
    <delete file="${java}/Scanner.java" />
    <delete dir="${classes}" />
  </target>
</project>

We have also prepared a minimal project to be used as a starting point for your own project; you can get it from here!

Seriously, I still have a lot of questions

You can find more detailed information in CUP's manual:

Take me to the documentation!

Can I use CUP in my project?

As long as the copyright notice appears in the distributed product, you can use CUP:

I found an error - what do I do?

Drop a patch to Michael Petter - if it is interesting enough, he will integrate it into the main sources.

Older versions

CUP is nearly 20 years old now, and many versions have been released over that time. You can find the old versions here:
Browse old versions

Source code

Get the source code: svn checkout https://www2.in.tum.de/repos/cup/develop