You're seeing this message because you're using an older version of Internet Explorer that is unsupported on our website. Please use these links to upgrade to a modern web browser that fully supports our website and protects your computer from security risks.
ICEä and ICE/Tä:
Tools to Assist in Compiler Design and Implementation
Truman Parks Boyer, Mohsen Chitsaz
Department of Computer Science
ICE (Intermediate Code Engine) and ICE/T (ICE/Translator) are compiler back ends that execute on a Java Virtual Machine (JVM). Both are intended to facilitate rapid prototyping of a project compiler, and can execute on any platform that supplies a JVM. They provide a virtual target machine that is comprehensible, robust, and easy to use; designed with the novice compiler writer in mind. ICE is a quadruple interpreter that executes ICE code directly, and includes an assembler, which a builder can use to side-step most symbol management issues. ICE/T is a translator that accepts ICE assembly code as input, and generates an equivalent Java class file as output. These tools are compiler back ends that reach as far “forward” as possible, allowing the student to complete a working compiler more quickly. As a result, the overall compiler implementation project becomes easier to understand and manage. This paper advocates the use of these tools in compiler implementation courses.
Far too often, compiler implementation courses fall short of completing a working compiler. As a result, the student misses some of the insights that can be had by tinkering with a working model. Much of their understanding of compiler functionality is only “book learned” and therefore likely to be forgotten more easily. Knowledge retention is far more likely when the student has actually faced and overcome design issues and challenges.
Compiler implementation is a time-consuming endeavor. For the compiler course, this problem is exacerbated by a semester time frame normally imposed. One of the more challenging aspects of teaching a compiler course is the necessity of teaching students to understand the instruction set architecture of a target machine. This problem is compounded by the fact that undergraduates will have had only limited exposure to assembly language programming. This is often alleviated to some extent by the use of a virtual machine as a target, but virtual machines are often stack-based [1,2]. Many virtual machines also use a complex input file design, and are difficult to install and operate.
Most often, undergraduate compiler courses emphasize the “front end” issues of lexical and syntactic analysis. The “back end” subjects of intermediate code generation, optimization, and object module creation are generally thought to be less significant to a student’s understanding of the compiler building process [3,4] and also there may not be enough time to cover the back end in a semester. This paper seeks to explain how a well-documented, simplified back end can be used to augment the understanding of these subjects, and facilitate the completion of a working compiler. This might even allow the students extra time to tackle more advanced compiler features.
We have been using ICE for the past two years with success in our compiler course. All of the software discussed in this article has been implemented in Java and it may be executed in stand-alone mode on a Java Virtual Machine (JVM). Of course, the project compiler can be written in any language. Web-based applet versions are being developed at the time of this writing. Interested readers can obtain more current information and the complete documentation of the ICE machine by accessing the Frostburg University Web Site at http://faculty.frostburg.edu/chitsaz/ice.htm
ICE Machine Description and Design Rationale
The ICE machine will be described in terms of its instruction set architecture. One of the primary goals of the ICE machine was to be easy to understand and use, so some attempt was made to analyze skill sets of the intended student audience. Since ICE was not bound by the constraints of physical hardware, it was decided that it would use a single addressing mode, and the instruction set would be entirely memory-to-memory. To avoid the problem of code/data collisions, code would be loaded into a code-only memory area, and data would occupy all of “RAM”. All data types are stored in 32 bit words, and I/O operations are accomplished through simple “system calls”. Specialized system calls are provided that support character string I/O for both Pascal and C-type strings. Figure 1 shows the ICE execution scheme.
Figure 1: ICE Execution Scheme
Perhaps student understanding could be greatly enhanced by rapidly producing a prototype compiler. It might have a very limited grammar at first. Functionality can be improved upon later. One goal of the ICE project was to see how far forward a back end could reach in order to facilitate rapid prototyping. A suggested approach to teaching compiler implementation using ICE would be to first implement the simple I/O and arithmetic statements of the grammar, with one statement per line. The statement structure should be designed so that each line of source code can be translated into exactly one line of ICE assembly code. Since this prototype compiler would merely translate source code “line by line” into ICE assembly instructions, the student could leave symbolic names intact and allow the assembler to resolve addresses. This prototype compiler would create a foundation for further development, as well as familiarize the student with the basic workings of the ICE machine. A second phase would be to build rest of the grammar. This would include a statement type for every assembly instruction in the ICE repertoire, except for sub-programming instructions. This grammar might also provide support for multiple data types. By the time this second phase is accomplished, the student should be quite comfortable with ICE, and be gaining an understanding of how high-level language constructs are built from machine instructions. The next improvement would be to add support for subprograms, and the student would then have a full understanding of the workings of ICE, and a thorough knowledge of how to map machine instructions to a high-level language. As the compiler course introduces more complex parsing concepts, improvements to the compiler would include:
-drop the “one statement per line” restriction
-tightening up the grammar by consolidating statement types
-adding statement types that generate more than one assembly instruction
By this time, the student would have a functioning compiler for a fairly robust language, but the compiler would not do its own symbol management. Final improvements might be to add a symbol table mechanism and have the compiler generate ICE machine code directly.
A further development goal for the ICE project was to develop an optional back-end that would translate ICE machine code directly into Java byte codes. Final output would be a Java class file, thus allowing the target compiler to generate code that could be executed on any platform with a JVM. Using ICE as a tool for compiler development and testing, ICE/T would be the final phase of the project. ICE/T could conceivably be used to implement any language on any platform. Figure 2 outlines the ICE execution scheme.
Figure 2: ICE/T Execution
To design a compiler that targets the Java Virtual Machine as a back end without any intermediate steps, would require detailed knowledge of the Java class file format. ICE/T was designed to eliminate any such requirement. ICE machine code can be translated directly to class file format without any Java-specific knowledge.
A Compiler Project using ICE
An example compiler project is discussed in moderate detail here. Complete details are provided on the website listed above. The philosophy of this development scheme is that ICE reaches as far “forward” as it can, and the target grammar should initially reach as far “backward” as it can. The grammar would initially be just one step above assembly language, and be gradually evolved into a more high level language. Language features would at first be limited to simple I/O and arithmetic. Later, logic and control structures could be included. Subprogramming facilities would be added if time permits.
· The preliminary grammar is designed to be translated on a line-for-line basis, from source code to ICE assembly code. It uses only a single 32 bit integer data type. Only one parameter should be used with Read and write statements. The Write statement also allows an integer literal. Arithmetic statements all take the form A = B op C , where op is +, -, *, or / and division is integer mode, note that only one operation is allowed per statement. A stop statement ends program execution. This development phase will get the students comfortable with the ICE execution scheme and instruction set, as well as an understanding of the overall workings of a compiler.
· The next phase provides I/O statements for integer, character, and boolean data types. Note that the ICE machine stores all data types as 32-bit words, and provides service call instructions that permit any memory word to be inputted or outputted as any data type. Arithmetic is still done in an integer mode no matter what data types the operands might actually be. This phase will introduce the students to the I/O modes that ICE supports.
· The third phase introduces declaration statements for the data types: integer, boolean, and character. The specialized read and write statements for each data type are eliminated, forcing the student compiler to track data types. I/O lists are allowed, this removes the “line-for-line” translation restriction. Arithmetic assignment statements now include multiple operations per line. In both arithmetic and I/O lists, both variables and literals may be included.
· Grammar adds support for character string I/O, and support for arrays and array elements within arithmetic and I/O lists.
· Compound boolean assignment statements were included with and, or, not, and xor operations supported.
· Conditional expressions and control flow instructions were added.
· Next, support for subprograms was included. This might be implemented in several sub-phases.
· By this time, a nearly complete programming language was supported, so the remaining step was to add a full symbol table to the compiler, and have it generate ICE quadruples directly.
1. Diehl, S. and Hartel, P. H. and Sestoft, P. Abstract Machines for Programming Language Implementation
2000, Abstract Available on Internet http://eprints.ecs.soton.ac.uk/archive/00000491/ [Checked 30 Nov 02]
2. Dr. Dobb's Journal. About
VM. CMP Media LLC,
2000, Available on Internet http://www.ddj.com/documents/s=875/ddj0065d/whatisvm.htm [Checked 30 Nov 02]
3. Wirth, Niklaus Compiler Construction. Addison-Wesley, 1996
4. Aho, A.V., and Ullman, J.D. Principles of Compiler Design. Addison-Wesley, 1979
5. Varian, M. VM and the VM Community: Past, Present, and Future. 1997, Available on Internet http://pucc.princeton.edu:80/~melinda/25paper.pdf [Checked 30 Nov 02]
6. Hugue, M. Computer Instruction Set Architectures Overview. 2001, Available on Internet http://www.cs.umd.edu/class/fall1998/cmsc411/projects/jungle/IV.htm [Checked 30 Nov 02]
7. Philip Bagley. Improving problem-oriented language by stratifying it. The Computer Journal Vol. 4, Issue 3, March 1961