ICEä and ICE/Tä:
Tools to Assist in Compiler
Design and Implementation
Truman Parks Boyer, Mohsen
Chitsaz
Department of Computer Science
tboyer@frostburg.edu / MChitsaz@frostburg.edu
Abstract
ICE (Intermediate Code Engine) and ICE/T
(ICE/Translator) are compiler back ends that execute on a Java Virtual Machine
(JVM). Both are intended to facilitate
rapid prototyping of a project compiler, and can execute on any platform that
supplies a JVM. They provide a virtual
target machine that is comprehensible, robust, and easy to use; designed with
the novice compiler writer in mind. ICE
is a quadruple interpreter that executes ICE code directly, and includes an
assembler, which a builder can use to side-step most symbol management issues. ICE/T is a translator that accepts ICE
assembly code as input, and generates an equivalent Java class file as output. These tools are compiler back ends that
reach as far “forward” as possible, allowing the student to complete a working
compiler more quickly. As a result, the
overall compiler implementation project becomes easier to understand and manage. This paper advocates the use of these tools
in compiler implementation courses.
Introduction
Far too often, compiler implementation courses
fall short of completing a working compiler.
As a result, the student misses some of the insights that can be had by
tinkering with a working model. Much of
their understanding of compiler functionality is only “book learned” and
therefore likely to be forgotten more easily.
Knowledge retention is far more likely when the student has actually
faced and overcome design issues and challenges.
Compiler implementation is a time-consuming
endeavor. For the compiler course, this
problem is exacerbated by a semester time frame normally imposed. One of the more challenging aspects of
teaching a compiler course is the necessity of teaching students to understand the
instruction set architecture of a target machine. This problem is compounded by the fact that
undergraduates will have had only limited exposure to assembly language
programming. This
is often alleviated to some extent by the use of a virtual machine as a target,
but virtual machines are often stack-based [1,2]. Many
virtual machines also use a complex input file design, and are difficult to
install and operate.
Most often, undergraduate compiler courses emphasize
the “front end” issues of lexical and syntactic analysis. The “back end” subjects of intermediate code
generation, optimization, and object module creation are generally thought to
be less significant to a student’s understanding of the compiler building
process [3,4] and also there may not be enough time to cover the back end in a
semester. This paper seeks to explain
how a well-documented, simplified back end can be used to augment the
understanding of these subjects, and facilitate the completion of a working
compiler. This might even allow the
students extra time to tackle more advanced compiler features.
We have been using ICE for the past two years
with success in our compiler course. All of the software discussed in this article
has been implemented in Java and it may be executed in stand-alone mode on a
Java Virtual Machine (JVM). Of course,
the project compiler can be written in any language. Web-based applet versions are being
developed at the time of this writing. Interested
readers can obtain more current information and the complete documentation of
the ICE machine by accessing the Frostburg University Web Site at
http://faculty.frostburg.edu/chitsaz/ice.htm
ICE Machine Description and Design Rationale
The ICE machine will be described in terms of
its instruction set architecture. One
of the primary goals of the ICE machine was to be easy to understand and use,
so some attempt was made to analyze skill sets of the intended student audience. Since ICE was not bound by the constraints
of physical hardware, it was decided that it would use a single addressing
mode, and the instruction set would be entirely memory-to-memory. To avoid the problem of code/data
collisions, code would be loaded into a code-only memory area, and data would
occupy all of “RAM”. All data types are
stored in 32 bit words, and I/O operations are accomplished through simple
“system calls”. Specialized system
calls are provided that support character string I/O for both Pascal and C-type
strings. Figure 1 shows the ICE
execution scheme.

Figure
1: ICE
Execution Scheme
Perhaps student understanding could be greatly
enhanced by rapidly producing a prototype compiler. It might have a very limited grammar at
first. Functionality can be improved
upon later. One goal of the ICE project
was to see how far forward a back end could reach in order to facilitate rapid
prototyping. A suggested approach to
teaching compiler implementation using ICE would be to first implement the
simple I/O and arithmetic statements of the grammar, with one statement per line.
The statement structure should be designed so that each line of source
code can be translated into exactly one line of ICE assembly code. Since this prototype compiler would merely
translate source code “line by line” into ICE assembly instructions, the
student could leave symbolic names intact and allow the assembler to resolve
addresses. This prototype compiler
would create a foundation for further development, as well as familiarize the
student with the basic workings of the ICE machine. A second phase would be to build rest of the
grammar. This would include a statement
type for every assembly instruction in the ICE repertoire, except for
sub-programming instructions. This
grammar might also provide support for multiple data types. By the time this second phase is
accomplished, the student should be quite comfortable with ICE, and be gaining
an understanding of how high-level language constructs are built from machine
instructions. The next improvement
would be to add support for subprograms, and the student would then have a full
understanding of the workings of ICE, and a thorough knowledge of how to map
machine instructions to a high-level language.
As the compiler course introduces more complex parsing concepts,
improvements to the compiler would include:
-drop the “one statement per line” restriction
-tightening up the grammar by consolidating
statement types
-adding statement types that generate more
than one assembly instruction
By this time, the student would have a
functioning compiler for a fairly robust language, but the compiler would not
do its own symbol management. Final
improvements might be to add a symbol table mechanism and have the compiler
generate ICE machine code directly.
ICE/TRANSLATOR (ICE/T)
A further development goal for the ICE project was to
develop an optional back-end that would translate ICE machine code directly
into Java byte codes. Final output
would be a Java class file, thus allowing the target compiler to generate code
that could be executed on any platform with a JVM. Using ICE as a tool for compiler development
and testing, ICE/T would be the final phase of the project. ICE/T could conceivably be used to implement
any language on any platform. Figure 2
outlines the ICE execution scheme.
Figure 2: ICE/T Execution
Scheme 
To design a compiler that targets the Java Virtual
Machine as a back end without any intermediate steps, would require detailed
knowledge of the Java class file format.
ICE/T was designed to eliminate any such requirement. ICE machine
code can be translated directly to class file format without any Java-specific
knowledge.
A Compiler Project using ICE
An example compiler project is discussed in moderate
detail here. Complete details are
provided on the website listed above. The
philosophy of this development scheme is that ICE reaches as far “forward” as
it can, and the target grammar should initially reach as far “backward” as it
can. The grammar would initially be
just one step above assembly language, and be gradually evolved into a more
high level language. Language features
would at first be limited to simple I/O and arithmetic. Later, logic and control structures could be
included. Subprogramming facilities
would be added if time permits.
·
The preliminary
grammar is designed to be translated on a line-for-line basis, from source code
to ICE assembly code. It uses only a
single 32 bit integer data type. Only
one parameter should be used with Read
and write statements. The Write
statement also allows an integer literal.
Arithmetic statements all take the form A = B op C , where op is +,
-, *, or / and division is integer mode,
note that only one operation is allowed per statement. A stop statement ends program execution. This development phase will get the students
comfortable with the ICE execution scheme and instruction set, as well as an
understanding of the overall workings of a compiler.
·
The next phase
provides I/O statements for integer, character, and boolean data types. Note that the ICE machine stores all data
types as 32-bit words, and provides service call instructions that permit any
memory word to be inputted or outputted as any data type. Arithmetic is still done in an integer mode
no matter what data types the operands might actually be. This phase will introduce the students to
the I/O modes that ICE supports.
·
The third phase
introduces declaration statements for the data types: integer,
boolean, and character. The specialized read and write statements for each data
type are eliminated, forcing the student compiler to track data types. I/O lists are allowed, this removes the
“line-for-line” translation restriction.
Arithmetic assignment statements now include multiple operations per
line. In both arithmetic and I/O lists,
both variables and literals may be included.
·
Grammar adds
support for character string I/O, and support for arrays and array elements
within arithmetic and I/O lists.
·
Compound boolean
assignment statements were included with and,
or, not, and xor operations supported.
·
Conditional
expressions and control flow instructions were added.
·
Next, support for
subprograms was included. This might be
implemented in several sub-phases.
·
By this time, a
nearly complete programming language was supported, so the remaining step was
to add a full symbol table to the compiler, and have it generate ICE quadruples
directly.
References
1. Diehl,
S. and Hartel, P. H. and Sestoft, P.
Abstract Machines for Programming Language Implementation
2000, Abstract Available
on Internet http://eprints.ecs.soton.ac.uk/archive/00000491/ [Checked 30 Nov 02]
2. Dr. Dobb's Journal. About
VM. CMP Media LLC,
2000, Available on
Internet http://www.ddj.com/documents/s=875/ddj0065d/whatisvm.htm [Checked 30
Nov 02]
3.
Wirth,
Niklaus Compiler Construction. Addison-Wesley, 1996
4.
Aho,
A.V., and Ullman, J.D. Principles of
Compiler Design. Addison-Wesley, 1979
5. Varian, M. VM and the VM Community: Past, Present, and Future. 1997, Available on Internet
http://pucc.princeton.edu:80/~melinda/25paper.pdf [Checked 30 Nov 02]
6. Hugue, M. Computer Instruction Set Architectures Overview. 2001, Available on
Internet http://www.cs.umd.edu/class/fall1998/cmsc411/projects/jungle/IV.htm
[Checked 30 Nov 02]
7. Philip Bagley. Improving problem-oriented language by stratifying it. The Computer
Journal Vol. 4, Issue 3, March 1961