Compilers and Compiler Generators
an introduction with C++
© P.D. Terry, Rhodes University, 1996

p.terry@ru.ac.za

This is a set of Adobe PDF® files of the text of my book "Compilers and Compiler Generators - an
introduction with C++", published in 1997 by International Thomson Computer Press. The original
edition is now out of print, and the copyright has reverted to me.

The book is also available in other formats. The latest versions of the distribution and details of
how to download up-to-date compressed versions of the text and its supporting software and
courseware can be found at http://www.scifac.ru.ac.za/compilers/

The text of the book is Copyright © PD Terry. Although you are free to make use of the material
for academic purposes, the material may not be redistributed without my knowledge or permission.

File List
The 18 chapters of the book are filed as chap01.pdf through chap18.pdf
The 4 appendices to the book are filed as appa.pdf through appd.pdf
The original appendix A of the book is filed as appa0.pdf
The contents of the book is filed as contents.pdf
The preface of the book is filed as preface.pdf
An index for the book is filed as index.pdf. Currently (January 2000) the page numbers refer
to an A4 version in PCL® format available at
http://www.scifac.ru.ac.za/compilers/longpcl.zip. However, software tools like GhostView
may be used to search the files for specific text.
The bibliography for the book is filed as biblio.pdf

Change List
18-October-1999 - Pre-release
12-November-1999 - First official on-line release
16-January-2000 - First release of PostScript version (incorporates minor corrections to
chapter 12)
17-January-2000 - First release of PDF version
Compilers and Compiler Generators © P.D. Terry, 2000

PREFACE
This book has been written to support a practically oriented course in programming language
translation for senior undergraduates in Computer Science. More specifically, it is aimed at students
who are probably quite competent in the art of imperative programming (for example, in C++,
Pascal, or Modula-2), but whose mathematics may be a little weak; students who require only a
solid introduction to the subject, so as to provide them with insight into areas of language design
and implementation, rather than a deluge of theory which they will probably never use again;
students who will enjoy fairly extensive case studies of translators for the sorts of languages with
which they are most familiar; students who need to be made aware of compiler writing tools, and to
come to appreciate and know how to use them. It will hopefully also appeal to a certain class of
hobbyist who wishes to know more about how translators work.

The reader is expected to have a good knowledge of programming in an imperative language and,
preferably, a knowledge of data structures. The book is practically oriented, and the reader who
cannot read and write code will have difficulty following quite a lot of the discussion. However, it
is difficult to imagine that students taking courses in compiler construction will not have that sort of
background!

There are several excellent books already extant in this field. What is intended to distinguish this
one from the others is that it attempts to mix theory and practice in a disciplined way, introducing
the use of attribute grammars and compiler writing tools, at the same time giving a highly practical
and pragmatic development of translators of only moderate size, yet large enough to provide
considerable challenge in the many exercises that are suggested.

Overview

The book starts with a fairly simple overview of the translation process, of the constituent parts of a
compiler, and of the concepts of porting and bootstrapping compilers. This is followed by a chapter
on machine architecture and machine emulation, as later case studies make extensive use of code
generation for emulated machines, a very common strategy in introductory courses. The next
chapter introduces the student to the notions of regular expressions, grammars, BNF and EBNF,
and the value of being able to specify languages concisely and accurately.
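To give a concrete feel for such notation (this tiny grammar is my own illustration, not one of the book's case studies), an EBNF description of simple arithmetic expressions might read:

```
Expression = Term { ( "+" | "-" ) Term } .
Term       = Factor { ( "*" | "/" ) Factor } .
Factor     = digit | "(" Expression ")" .
digit      = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" .
```

Four short rules capture precedence, grouping and repetition completely, something that would take a paragraph of careful prose to state unambiguously.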

Two chapters follow that discuss simple features of assembler language, accompanied by the
development of an assembler/interpreter system which allows not only for very simple assembly,
but also for conditional assembly, macro-assembly, error detection, and so on. Complete code for
such an assembler is presented in a highly modularized form, but with deliberate scope left for
extensions, ranging from the trivial to the extensive.

Three chapters follow on formal syntax theory, parsing, and the manual construction of scanners
and parsers. The usual classifications of grammars and restrictions on practical grammars are
discussed in some detail. The material on parsing is kept to a fairly simple level, but with a
thorough discussion of the necessary conditions for LL(1) parsing. The parsing method treated in
most detail is the method of recursive descent, as is found in many Pascal compilers; LR parsing is
only briefly discussed.
The next chapter is on syntax-directed translation, and stresses to the reader the importance and
usefulness of being able to start from a context-free grammar, adding attributes and actions that
allow for the manual or mechanical construction of a program that will handle the system that it
defines. Obvious applications come from the field of translators, but applications in other areas
such as simple database design are also used and suggested.

The next two chapters give a thorough introduction to the use of Coco/R, a compiler generator
based on L-attributed grammars. Besides a discussion of Cocol, the specification language for this
tool, several in-depth case studies are presented, and the reader is given some indication of how
parser generators are themselves constructed.
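To give a feel for the notation (a fragment in the general style of Cocol, reconstructed from memory rather than taken from the book; Chapter 12 is the authoritative reference), attributes are passed between angle brackets and semantic actions appear between (. and .) delimiters:

```
Expression<int &v>          (. int r; .)
= Term<v>
  { '+' Term<r>             (. v += r; .)
  } .
```

The grammar rule and the code that gives it meaning sit side by side, which is what makes attributed grammars such a compact way of specifying a translator.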

The next two chapters discuss the construction of a recursive descent compiler for a simple
Pascal-like source language, using both hand-crafted and machine-generated techniques. The
compiler produces pseudo-code for a hypothetical stack-based computer (for which an interpreter
was developed in an earlier chapter). "On the fly" code generation is discussed, as well as the use of
intermediate tree construction.

The last chapters extend the simple language (and its compiler) to allow for procedures and
functions, demonstrate the usual stack-frame approach to storage management, and go on to discuss
the implementation of simple concurrent programming. At all times the student can see how these
are handled by the compiler/interpreter system, which slowly grows in complexity and usefulness
until the final product enables the development of quite sophisticated programs.

The text abounds with suggestions for further exploration, and includes references to more
advanced texts where these can be followed up. Wherever it seems appropriate the opportunity is
taken to make the reader more aware of the strong and weak points in topical imperative languages.
Examples are drawn from several languages, such as Pascal, Modula-2, Oberon, C, C++, Edison
and Ada.

Support software

An earlier version of this text, published by Addison-Wesley in 1986, used Pascal throughout as a
development tool. By that stage Modula-2 had emerged as a language far better suited to serious
programming. A number of discerning teachers and programmers adopted it enthusiastically, and
the material in the present book was originally and successfully developed in Modula-2. More
recently, and especially in the USA, one has witnessed the spectacular rise in popularity of C++,
and, to reflect this trend, C++ has been adopted as the main language used in the present text.
Although offering much of value to skilled practitioners, C++ is a complex language. As the aim of
the text is not to focus on intricate C++ programming, but on compiler construction, the supporting
software has been written to be as clear and as simple as possible. Besides the C++ code, complete
source for all the case studies has also been provided on an accompanying IBM-PC compatible
diskette in Turbo Pascal and Modula-2, so that readers who are proficient programmers in those
languages but only have a reading knowledge of C++ should be able to use the material very
successfully.

Appendix A gives instructions for unpacking the software provided on the diskette and installing it
on a reader’s computer. In the same appendix will be found the addresses of various sites on the
Internet where this software (and other freely available compiler construction software) can be
found in various formats. The software provided on the diskette includes
Emulators for the two virtual machines described in Chapter 4 (one of these is a simple
accumulator-based machine, the other is a simple stack-based machine).

The one- and two-pass assemblers for the accumulator based machine, discussed in Chapter 6.

A macro assembler for the accumulator-based machine, discussed in Chapter 7.

Three executable versions of the Coco/R compiler generator used in the text and described in
detail in Chapter 12, along with the frame files that it needs. (The three versions produce
Turbo Pascal, Modula-2 or C/C++ compilers)

Complete source code for hand-crafted versions of each of the versions of the Clang compiler
that is developed in a layered way in Chapters 14 through 18. This highly modularized code
comes with an "on the fly" code generator, and also with an alternative code generator that
builds and then walks a tree representation of the intermediate code.

Cocol grammars and support modules for the numerous case studies throughout the book that
use Coco/R. These include grammars for each of the versions of the Clang compiler.

A program for investigating the construction of minimal perfect hash functions (as discussed
in Chapter 14).

A simple demonstration of an LR parser (as discussed in Chapter 10).

Use as a course text

The book can be used for courses of various lengths. By choosing a selection of topics it could be
used on courses as short as 5-6 weeks (say 15-20 hours of lectures and 6 lab sessions). It could also
be used to support longer and more intensive courses. In our university, selected parts of the
material have been successfully used for several years in a course of about 35-40 hours of lectures
with strictly controlled and structured, related laboratory work, given to students in a pre-Honours
year. During that time the course has evolved significantly, from one in which theory and formal
specification played a very low-key part, to the present stage where students have come to appreciate the
use of specification and syntax-directed compiler-writing systems as very powerful and useful tools
in their armoury.

It is hoped that instructors can select material from the text so as to suit courses tailored to their
own interests, and to their students’ capabilities. The core of the theoretical material is to be found
in Chapters 1, 2, 5, 8, 9, 10 and 11, and it is suggested that this material should form part of any
course based on the book. Restricting the selection of material to those chapters would deny the
student the very important opportunity to see the material in practice, and at least a partial selection
of the material in the practically oriented chapters should be studied. However, that part of the
material in Chapter 4 on the accumulator-based machine, and Chapters 6 and 7 on writing
assemblers for this machine could be omitted without any loss of continuity. The development of
the small Clang compiler in Chapters 14 through 18 is handled in a way that allows for the later
sections of Chapter 15, and for Chapters 16 through 18 to be omitted if time is short. A very wide
variety of laboratory exercises can be selected from those suggested as exercises, providing the
students with both a challenge, and a feeling of satisfaction when they rise to meet that challenge.
Several of these exercises are based on the idea of developing a small compiler for a language
similar to the one discussed in detail in the text. Development of such a compiler could rely entirely
on traditional hand-crafted techniques, or could rely entirely on a tool-based approach (both
approaches have been successfully used at our university). If a hand-crafted approach were used,
Chapters 12 and 13 could be omitted; Chapter 12 is largely a reference manual in any event, and
could be left to the students to study for themselves as the need arose. Similarly, Chapter 3 falls into
the category of background reading.

At our university we have also used an extended version of the Clang compiler as developed in the
text (one incorporating several of the extensions suggested as exercises) as a system for students to
study concurrent programming per se, and although it is a little limited, it is more than adequate for
the purpose. We have also used a slightly extended version of the assembler program very
successfully as our primary tool for introducing students to the craft of programming at the
assembler level.

Limitations

It is, perhaps, worth a slight digression to point out some things which the book does not claim to
be, and to justify some of the decisions made in the selection of material.

In the first place, while it is hoped that it will serve as a useful foundation for students who are
already considerably more advanced, a primary aim has been to make the material as accessible as
possible to students with a fairly limited background, to enhance the background, and to make them
somewhat more critical of it. In many cases this background is still Pascal based; increasingly it is
tending to become C++ based. Both of these languages have become rather large and complex, and
I have found that many students have a very superficial idea of how they really fit together. After a
course such as this one, many of the pieces of the language jigsaw fit together rather better.

When introducing the use of compiler writing tools, one might follow the many authors who
espouse the classic lex/yacc approach. However, there are now a number of excellent LL(1) based
tools, and these have the advantage that the code which is produced is close to that which might be
hand-crafted; at the same time, recursive descent parsing, besides being fairly intuitive, is powerful
enough to handle very usable languages.

That the languages used in case studies and their translators are relative toys cannot be denied. The
Clang language of later chapters, for example, supports only integer variables and simple
one-dimensional arrays of these, and has concurrent features allowing little beyond the simulation
of some simple textbook examples. The text is not intended to be a comprehensive treatise on
systems programming in general, just on certain selected topics in that area, and so very little is said
about native machine code generation and optimization, linkers and loaders, the interaction and
relationship with an operating system, and so on. These decisions were all taken deliberately, to
keep the material readily understandable and as machine-independent as possible. The systems may
be toys, but they are very usable toys! Of course the book is then open to the criticism that many of
the more difficult topics in translation (such as code generation and optimization) are effectively
not covered at all, and that the student may be deluded into thinking that these areas do not exist.
This is not entirely true; the careful reader will find most of these topics mentioned somewhere.

Good teachers will always want to put something of their own into a course, regardless of the
quality of the prescribed textbook. I have found that a useful (though at times highly dangerous)
technique is deliberately not to give the best solutions to a problem in a class discussion, with the
optimistic aim that students can be persuaded to "discover" them for themselves, and even gain a
sense of achievement in so doing. When applied to a book the technique is particularly dangerous,
but I have tried to exploit it on several occasions, even though it may give the impression that the
author is ignorant.

Another dangerous strategy is to give too much away, especially in a book like this aimed at
courses where, so far as I am aware, the traditional approach requires that students make far more
of the design decisions for themselves than my approach seems to allow them. Many of the books
in the field do not show enough of how something is actually done: the bridge between what they
give and what the student is required to produce is in excess of what is reasonable for a course
which is only part of a general curriculum. I have tried to compensate by suggesting what I hope is
a very wide range of searching exercises. The solutions to some of these are well known, and
available in the literature. Again, the decision to omit explicit references was deliberate (perhaps
dangerously so). Teachers often have to find some way of persuading the students to search the
literature for themselves, and this is not done by simply opening the journal at the right page for
them.

Acknowledgements
I am conscious of my gratitude to many people for their help and inspiration while this book has
been developed.

Like many others, I am grateful to Niklaus Wirth, whose programming languages and whose
writings on the subject of compiler construction and language design refute the modern trend
towards ever-increasing complexity in these areas, and serve as outstanding models of the way in
which progress should be made.

This project could not have been completed without the help of Hanspeter Mössenböck (author of
the original Coco/R compiler generator) and Francisco Arzu (who ported it to C++), who not only
commented on parts of the text, but also willingly gave permission for their software to be
distributed with the book. My thanks are similarly due to Richard Cichelli for granting permission
to distribute (with the software for Chapter 14) a program based on one he wrote for computing
minimal perfect hash functions, and to Christopher Cockburn for permission to include his
description of tonic sol-fa (used in Chapter 13).

I am grateful to Volker Pohlers for help with the port of Coco/R to Turbo Pascal, and to Dave
Gillespie for developing p2c, a most useful program for converting Modula-2 and Pascal code to
C/C++.

I am deeply indebted to my colleagues Peter Clayton, George Wells and Peter Wentworth for many
hours of discussion and fruitful suggestions. John Washbrook carefully reviewed the manuscript,
and made many useful suggestions for its improvement. Shaun Bangay patiently provided
incomparable technical support in the installation and maintenance of my hardware and software,
and rescued me from more than one disaster when things went wrong. To Rhodes University I am
indebted for the use of computer facilities, and for granting me leave to complete the writing of the
book. And, of course, several generations of students have contributed in intangible ways by their
reaction to my courses.

The development of the software in this book relied heavily on the use of electronic mail, and I am
grateful to Randy Bush, compiler writer and network guru extraordinaire, for his friendship, and for
his help in making the Internet a reality in developing countries in Africa and elsewhere.
But, as always, the greatest debt is owed to my wife Sally and my children David and Helen, for
their love and support through the many hours when they must have wondered where my priorities
lay.

Pat Terry
Rhodes University
Grahamstown

Trademarks
Ada is a trademark of the US Department of Defense.
Apple II is a trademark of Apple Corporation.
Borland C++, Turbo C++, Turbo Pascal and Delphi are trademarks of Borland
International Corporation.
GNU C Compiler is a trademark of the Free Software Foundation.
IBM and IBM PC are trademarks of International Business Machines Corporation.
Intel is a registered trademark of Intel Corporation.
MC68000 and MC68020 are trademarks of Motorola Corporation.
MIPS is a trademark of MIPS Computer Systems.
Microsoft, MS and MS-DOS are registered trademarks and Windows is a trademark of
Microsoft Corporation.
SPARC is a trademark of Sun Microsystems.
Stony Brook Software and QuickMod are trademarks of Gogesch Micro Systems, Inc.
occam and Transputer are trademarks of Inmos.
UCSD Pascal and UCSD p-System are trademarks of the Regents of the University of
California.
UNIX is a registered trademark of AT&T Bell Laboratories.
Z80 is a trademark of Zilog Corporation.
COMPILERS AND COMPILER
GENERATORS
an introduction with C++
© P.D. Terry, Rhodes University, 1996

e-mail p.terry@ru.ac.za

The PostScript® edition of this book was derived from the on-line versions available at
http://www.scifac.ru.ac.za/compilers/, a WWW site that is occasionally updated, and which
contains the latest versions of the various editions of the book, with details of how to download
compressed versions of the text and its supporting software and courseware.

The original edition of this book, published originally by International Thomson, is now out of
print, but has a home page at http://cs.ru.ac.za/homes/cspt/compbook.htm. In preparing the on-line
edition, the opportunity was taken to correct the few typographical mistakes that crept into the first
printing, and to create a few hyperlinks to where the source files can be found.

Feel free to read and use this book for study or teaching, but please respect my copyright and do not
distribute it further without my consent. If you do make use of it I would appreciate hearing from
you.

CONTENTS
Preface

Acknowledgements

1 Introduction
1.1 Objectives
1.2 Systems programs and translators
1.3 The relationship between high-level languages and translators

2 Translator classification and structure


2.1 T-diagrams
2.2 Classes of translator
2.3 Phases in translation
2.4 Multi-stage translators
2.5 Interpreters, interpretive compilers, and emulators

3 Compiler construction and bootstrapping


3.1 Using a high-level host language
3.2 Porting a high-level translator
3.3 Bootstrapping
3.4 Self-compiling compilers
3.5 The half bootstrap
3.6 Bootstrapping from a portable interpretive compiler
3.7 A P-code assembler

4 Machine emulation
4.1 Simple machine architecture
4.2 Addressing modes
4.3 Case study 1 - a single-accumulator machine
4.4 Case study 2 - a stack-oriented computer

5 Language specification
5.1 Syntax, semantics, and pragmatics
5.2 Languages, symbols, alphabets and strings
5.3 Regular expressions
5.4 Grammars and productions
5.5 Classic BNF notation for productions
5.6 Simple examples
5.7 Phrase structure and lexical structure
5.8 ε-productions
5.9 Extensions to BNF
5.10 Syntax diagrams
5.11 Formal treatment of semantics

6 Simple assemblers
6.1 A simple ASSEMBLER language
6.2 One- and two-pass assemblers, and symbol tables
6.3 Towards the construction of an assembler
6.4 Two-pass assembly
6.5 One-pass assembly

7 Advanced assembler features


7.1 Error detection
7.2 Simple expressions as addresses
7.3 Improved symbol table handling - hash tables
7.4 Macro-processing facilities
7.5 Conditional assembly
7.6 Relocatable code
7.7 Further projects

8 Grammars and their classification


8.1 Equivalent grammars
8.2 Case study - equivalent grammars for describing expressions
8.3 Some simple restrictions on grammars
8.4 Ambiguous grammars
8.5 Context sensitivity
8.6 The Chomsky hierarchy
8.7 Case study - Clang

9 Deterministic top-down parsing


9.1 Deterministic top-down parsing
9.2 Restrictions on grammars so as to allow LL(1) parsing
9.3 The effect of the LL(1) conditions on language design

10 Parser and scanner construction


10.1 Construction of simple recursive descent parsers
10.2 Case studies
10.3 Syntax error detection and recovery
10.4 Construction of simple scanners
10.5 Case studies
10.6 LR parsing
10.7 Automated construction of scanners and parsers

11 Syntax-directed translation
11.1 Embedding semantic actions into syntax rules
11.2 Attribute grammars
11.3 Synthesized and inherited attributes
11.4 Classes of attribute grammars
11.5 Case study - a small student database

12 Using Coco/R - overview


12.1 Installing and running Coco/R
12.2 Case study - a simple adding machine
12.3 Scanner specification
12.4 Parser specification
12.5 The driver program

13 Using Coco/R - Case studies


13.1 Case study - Understanding C declarations
13.2 Case study - Generating one-address code from expressions
13.3 Case study - Generating one-address code from an AST
13.4 Case study - How do parser generators work?
13.5 Project suggestions

14 A simple compiler - the front end


14.1 Overall compiler structure
14.2 Source handling
14.3 Error reporting
14.4 Lexical analysis
14.5 Syntax analysis
14.6 Error handling and constraint analysis
14.7 The symbol table handler
14.8 Other aspects of symbol table management - further types

15 A simple compiler - the back end


15.1 The code generation interface
15.2 Code generation for a simple stack machine
15.3 Other aspects of code generation

16 Simple block structure


16.1 Parameterless procedures
16.2 Storage management

17 Parameters and functions


17.1 Syntax and semantics
17.2 Symbol table support for context sensitive features
17.3 Actual parameters and stack frames
17.4 Hypothetical stack machine support for parameter passing
17.5 Context sensitivity and LL(1) conflict resolution
17.6 Semantic analysis and code generation
17.7 Language design issues

18 Concurrent programming
18.1 Fundamental concepts
18.2 Parallel processes, exclusion and synchronization
18.3 A semaphore-based system - syntax, semantics, and code generation
18.4 Run-time implementation

Appendix A: Software resources for this book

Appendix B: Source code for the Clang compiler/interpreter

Appendix C: Cocol grammar for the Clang compiler/interpreter

Appendix D: Source code for a macro assembler

Bibliography

Index
Compilers and Compiler Generators © P.D. Terry, 2000

1 INTRODUCTION

1.1 Objectives
The use of computer languages is an essential link in the chain between human and computer. In
this text we hope to make the reader more aware of some aspects of

Imperative programming languages - their syntactic and semantic features; the ways of
specifying syntax and semantics; problem areas and ambiguities; the power and usefulness of
various features of a language.

Translators for programming languages - the various classes of translator (assemblers,


compilers, interpreters); implementation of translators.

Compiler generators - tools that are available to help automate the construction of translators
for programming languages.

This book is a complete revision of an earlier one published by Addison-Wesley (Terry, 1986). It
has been written so as not to be too theoretical, but to relate easily to languages which the reader
already knows or can readily understand, like Pascal, Modula-2, C or C++. The reader is expected
to have a good background in one of those languages, access to a good implementation of it, and,
preferably, some background in assembly language programming and simple machine architecture.
We shall rely quite heavily on this background, especially on the understanding the reader should
have of the meaning of various programming constructs.

Significant parts of the text concern themselves with case studies of actual translators for simple
languages. Other important parts of the text are to be found in the many exercises and suggestions
for further study and experimentation on the part of the reader. In short, the emphasis is on "doing"
rather than just "reading", and the reader who does not attempt the exercises will miss many, if not
most, of the finer points.

The primary language used in the implementation of our case studies is C++ (Stroustrup, 1990).
Machine readable source code for all these case studies is to be found on the IBM-PC compatible
diskette that is included with the book. As well as C++ versions of this code, we have provided
equivalent source in Modula-2 and Turbo Pascal, two other languages that are eminently suitable
for use in a course of this nature. Indeed, for clarity, some of the discussion is presented in a
pseudo-code that often resembles Modula-2 rather more than it does C++. It is only fair to warn the
reader that the code extracts in the book are often just that - extracts - and that there are many
instances where identifiers are used whose meaning may not be immediately apparent from their
local context. The conscientious reader will have to expend some effort in browsing the code.
Complete source for an assembler and interpreter appears in the appendices, but the discussion
often revolves around simplified versions of these programs that are found in their entirety only on
the diskette.
1.2 Systems programs and translators
Users of modern computing systems can be divided into two broad categories. There are those who
never develop their own programs, but simply use ones developed by others. Then there are those
who are concerned as much with the development of programs as with their subsequent use. This
latter group - of whom we as computer scientists form a part - is fortunate in that program
development is usually aided by the use of high-level languages for expressing algorithms, the use
of interactive editors for program entry and modification, and the use of sophisticated job control
languages or graphical user interfaces for control of execution. Programmers armed with such tools
have a very different picture of computer systems from those who are presented with the hardware
alone, since the use of compilers, editors and operating systems - a class of tools known generally
as systems programs - removes from humans the burden of developing their systems at the
machine level. That is not to claim that the use of such tools removes all burdens, or all possibilities
for error, as the reader will be well aware.

Well within living memory, much program development was done in machine language - indeed,
some of it, of necessity, still is - and perhaps some readers have even tried this for themselves when
experimenting with microprocessors. Just a brief exposure to programs written as almost
meaningless collections of binary or hexadecimal digits is usually enough to make one grateful for
the presence of high-level languages, clumsy and irritating though some of their features may be.

However, in order for high-level languages to be usable, one must be able to convert programs
written in them into the binary or hexadecimal digits and bitstrings that a machine will understand.
At an early stage it was realized that if constraints were put on the syntax of a high-level language
the translation process became one that could be automated. This led to the development of
translators or compilers - programs which accept (as data) a textual representation of an algorithm
expressed in a source language, and which produce (as primary output) a representation of the
same algorithm expressed in another language, the object or target language.

Beginners often fail to distinguish between the compilation (compile-time) and execution (run-time)
phases in developing and using programs written in high-level languages. This is an easy trap to fall
into, since the translation (compilation) is often hidden from sight, or invoked with a special
function key from within an integrated development environment that may possess many other
magic function keys. Furthermore, beginners are often taught programming with this distinction
deliberately blurred, their teachers offering explanations such as "when a computer executes a read
statement it reads a number from the input data into a variable". This hides several low-level
operations from the beginner. The underlying implications of file handling, character conversion,
and storage allocation are glibly ignored - as indeed is the necessity for the computer to be
programmed to understand the word read in the first place. Anyone who has attempted to program
input/output (I/O) operations directly in assembler languages will know that many of them are
non-trivial to implement.

A translator, being a program in its own right, must itself be written in a computer language, known
as its host or implementation language. Today it is rare to find translators that have been
developed from scratch in machine language. Clearly the first translators had to be written in this
way, and at the outset of translator development for any new system one has to come to terms with
the machine language and machine architecture for that system. Even so, translators for new
machines are now invariably developed in high-level languages, often using the techniques of
cross-compilation and bootstrapping that will be discussed in more detail later.

The first major translators written may well have been the Fortran compilers developed by Backus
and his colleagues at IBM in the 1950’s, although machine code development aids were in
existence by then. The first Fortran compiler is estimated to have taken about 18 person-years of
effort. It is interesting to note that one of the primary concerns of the team was to develop a system
that could produce object code whose efficiency of execution would compare favourably with that
which expert human machine coders could achieve. An automatic translation process can rarely
produce code as optimal as can be written by a really skilled user of machine language, and to this
day important components of systems are often developed at (or very near to) machine level, in the
interests of saving time or space.

Translator programs themselves are never completely portable (although parts of them may be), and
they usually depend to some extent on other systems programs that the user has at his or her
disposal. In particular, input/output and file management on modern computer systems are usually
controlled by the operating system. This is a program or suite of programs and routines whose job
it is to control the execution of other programs so as best to share resources such as printers,
plotters, disk files and tapes, often making use of sophisticated techniques such as parallel
processing, multiprogramming and so on. For many years the development of operating systems
required the use of programming languages that remained closer to the machine code level than did
languages suitable for scientific or commercial programming. More recently a number of successful
higher level languages have been developed with the express purpose of catering for the design of
operating systems and real-time control. The most obvious example of such a language is C,
developed originally for the implementation of the UNIX operating system, and now widely used in
all areas of computing.

1.3 The relationship between high-level languages and translators


The reader will rapidly become aware that the design and implementation of translators is a subject
that may be developed from many possible angles and approaches. The same is true for the design
of programming languages.

Computer languages are generally classed as being "high-level" (like Pascal, Fortran, Ada,
Modula-2, Oberon, C or C++) or "low-level" (like ASSEMBLER). High-level languages may
further be classified as "imperative" (like all of those just mentioned), or "functional" (like Lisp,
Scheme, ML, or Haskell), or "logic" (like Prolog).

High-level languages are claimed to possess several advantages over low-level ones:

Readability: A good high-level language will allow programs to be written that in some ways
resemble a quasi-English description of the underlying algorithms. If care is taken, the coding
may be done in a way that is essentially self-documenting, a highly desirable property when
one considers that many programs are written once, but possibly studied by humans many
times thereafter.

Portability: High-level languages, being essentially machine independent, hold out the
promise of being used to develop portable software. This is software that can, in principle
(and even occasionally in practice), run unchanged on a variety of different machines -
provided only that the source code is recompiled as it moves from machine to machine.

To achieve machine independence, high-level languages may deny access to low-level
features, and are sometimes spurned by programmers who have to develop low-level machine
dependent systems. However, some languages, like C and Modula-2, were specifically
designed to allow access to these features from within the context of high-level constructs.
Structure and object orientation: There is general agreement that the structured programming
movement of the 1960’s and the object-oriented movement of the 1990’s have resulted in a
great improvement in the quality and reliability of code. High-level languages can be
designed so as to encourage or even subtly enforce these programming paradigms.

Generality: Most high-level languages allow the writing of a wide variety of programs, thus
relieving the programmer of the need to become expert in many diverse languages.

Brevity: Programs expressed in high-level languages are often considerably shorter (in terms
of their number of source lines) than their low-level equivalents.

Error checking: Being human, a programmer is likely to make many mistakes in the
development of a computer program. Many high-level languages - or at least their
implementations - can, and often do, enforce a great deal of error checking both at
compile-time and at run-time. For this they are, of course, often criticized by programmers
who have to develop time-critical code, or who want their programs to abort as quickly as
possible.

These advantages sometimes appear to be over-rated, or at any rate, hard to reconcile with reality.
For example, readability is usually within the confines of a rather stilted style, and some beginners
are disillusioned when they find just how unnatural a high-level language is. Similarly, the
generality of many languages is confined to relatively narrow areas, and programmers are often
dismayed when they find areas (like string handling in standard Pascal) which seem to be very
poorly handled. The explanation is often to be found in the close coupling between the development
of high-level languages and of their translators. When one examines successful languages, one finds
numerous examples of compromise, dictated largely by the need to accommodate language ideas to
rather uncompromising, if not unsuitable, machine architectures. To a lesser extent, compromise is
also dictated by the quirks of the interface to established operating systems on machines. Finally,
some appealing language features turn out to be either impossibly difficult to implement, or too
expensive to justify in terms of the machine resources needed. It may not immediately be apparent
that the design of Pascal (and of several of its successors such as Modula-2 and Oberon) was
governed partly by a desire to make it easy to compile. It is a tribute to its designer that, in spite of
the limitations which this desire naturally introduced, Pascal became so popular, the model for so
many other languages and extensions, and encouraged the development of superfast compilers such
as are found in Borland’s Turbo Pascal and Delphi systems.

The design of a programming language requires a high degree of skill and judgement. There is
evidence to show that one’s language is not only useful for expressing one’s ideas. Because
language is also used to formulate and develop ideas, one’s knowledge of language largely
determines how and, indeed, what one can think. In the case of programming languages, there has
been much controversy over this. For example, in languages like Fortran - for long the lingua
franca of the scientific computing community - recursive algorithms were "difficult" to use (not
impossible, just difficult!), with the result that many programmers brought up on Fortran found
recursion strange and difficult, even something to be avoided at all costs. It is true that recursive
algorithms are sometimes "inefficient", and that compilers for languages which allow recursion
may exacerbate this; on the other hand it is also true that some algorithms are more simply
explained in a recursive way than in one which depends on explicit repetition (the best examples
probably being those associated with tree manipulation).

There are two divergent schools of thought as to how programming languages should be designed.
The one, typified by the Wirth school, stresses that languages should be small and understandable,
and that much time should be spent in consideration of what tempting features might be omitted
without crippling the language as a vehicle for system development. The other, beloved of
languages designed by committees with the desire to please everyone, packs a language full of
every conceivable potentially useful feature. Both schools claim success. The Wirth school has
given us Pascal, Modula-2 and Oberon, all of which have had an enormous effect on the thinking of
computer scientists. The other approach has given us Ada, C and C++, which are far more difficult
to master well and extremely complicated to implement correctly, but which claim spectacular
successes in the marketplace.

Other aspects of language design that contribute to success include the following:

Orthogonality: Good languages tend to have a small number of well thought out features that
can be combined in a logical way to supply more powerful building blocks. Ideally these
features should not interfere with one another, and should not be hedged about by a host of
inconsistencies, exceptional cases and arbitrary restrictions. Most languages have blemishes -
for example, in Wirth’s original Pascal a function could only return a scalar value, not one of
any structured type. Many potentially attractive extensions to well-established languages
prove to be extremely vulnerable to unfortunate oversights in this regard.

Familiar notation: Most computers are "binary" in nature. Blessed with ten toes on which to
check out their number-crunching programs, humans may be somewhat relieved that
high-level languages usually make decimal arithmetic the rule, rather than the exception, and
provide for mathematical operations in a notation consistent with standard mathematics.
When new languages are proposed, these often take the form of derivatives or dialects of
well-established ones, so that programmers can be tempted to migrate to the new language
and still feel largely at home - this was the route taken in developing C++ from C, Java from
C++, and Oberon from Modula-2, for example.

Besides meeting the ones mentioned above, a successful modern high-level language will have
been designed to meet the following additional criteria:

Clearly defined: It must be clearly described, for the benefit of both the user and the compiler
writer.

Quickly translated: It should admit quick translation, so that program development time when
using the language is not excessive.

Modularity: It is desirable that programs can be developed in the language as a collection of
separately compiled modules, with appropriate mechanisms for ensuring self-consistency
between these modules.

Efficient: It should permit the generation of efficient object code.

Widely available: It should be possible to provide translators for all the major machines and
for all the major operating systems.

The importance of a clear language description or specification cannot be over-emphasized. This
must apply, firstly, to the so-called syntax of the language - that is, it must specify accurately what
form a source program may assume. It must apply, secondly, to the so-called static semantics of
the language - for example, it must be clear what constraints must be placed on the use of entities of
differing types, or the scope that various identifiers have across the program text. Finally, the
specification must also apply to the dynamic semantics of programs that satisfy the syntactic and
static semantic rules - that is, it must be capable of predicting the effect any program expressed in
that language will have when it is executed.

Programming language description is extremely difficult to do accurately, especially if it is
attempted through the medium of potentially confusing languages like English. There is an
increasing trend towards the use of formalism for this purpose, some of which will be illustrated in
later chapters. Formal methods have the advantage of precision, since they make use of the clearly
defined notations of mathematics. To offset this, they may be somewhat daunting to programmers
weak in mathematics, and do not necessarily have the advantage of being very concise - for
example, the informal description of Modula-2 (albeit slightly ambiguous in places) took only some
35 pages (Wirth, 1985), while a formal description prepared by an ISO committee runs to over 700
pages.

Formal specifications have the added advantage that, in principle, and to a growing degree in
practice, they may be used to help automate the implementation of translators for the language.
Indeed, it is increasingly rare to find modern compilers that have been implemented without the
help of so-called compiler generators. These are programs that take a formal description of the
syntax and semantics of a programming language as input, and produce major parts of a compiler
for that language as output. We shall illustrate the use of compiler generators at appropriate points
in our discussion, although we shall also show how compilers may be crafted by hand.

Exercises

1.1 Make a list of as many translators as you can think of that can be found on your computer
system.

1.2 Make a list of as many other systems programs (and their functions) as you can think of that can
be found on your computer system.

1.3 Make a list of existing features in your favourite (or least favourite) programming language that
you find irksome. Make a similar list of features that you would like to have seen added. Then
examine your lists and consider which of the features are probably related to the difficulty of
implementation.

Further reading

As we proceed, we hope to make the reader more aware of some of the points raised in this section.
Language design is a difficult area, and much has been, and continues to be, written on the topic.
The reader might like to refer to the books by Tremblay and Sorenson (1985), Watson (1989), and
Watt (1991) for readable summaries of the subject, and to the papers by Wirth (1974, 1976a,
1988a), Kernighan (1981), Welsh, Sneeringer and Hoare (1977), and Cailliau (1982). Interesting
background on several well-known languages can be found in ACM SIGPLAN Notices for August
1978 and March 1993 (Lee and Sammet, 1978, 1993), two special issues of that journal devoted to
the history of programming language development. Stroustrup (1993) gives a fascinating exposition
of the development of C++, arguably the most widely used language at the present time. The terms
"static semantics" and "dynamic semantics" are not used by all authors; for a discussion on this
point see the paper by Meek (1990).

2 TRANSLATOR CLASSIFICATION AND STRUCTURE


In this chapter we provide the reader with an overview of the inner structure of translators, and
some idea of how they are classified.

A translator may formally be defined as a function, whose domain is a source language, and whose
range is contained in an object or target language.

A little experience with translators will reveal that it is rarely considered part of the translator’s
function to execute the algorithm expressed by the source, merely to change its representation from
one form to another. In fact, at least three languages are involved in the development of translators:
the source language to be translated, the object or target language to be generated, and the host
language to be used for implementing the translator. If the translation takes place in several stages,
there may even be other, intermediate, languages. Most of these - and, indeed, the host language
and object languages themselves - usually remain hidden from a user of the source language.

2.1 T-diagrams
A useful notation for describing a computer program, particularly a translator, uses so-called
T-diagrams, examples of which are shown in Figure 2.1.

We shall use the notation "M-code" to stand for "machine code" in these diagrams. Translation
itself is represented by standing the T on a machine, and placing the source program and object
program on the left and right arms, as depicted in Figure 2.2.
We can also regard this particular combination as depicting an abstract machine (sometimes called
a virtual machine), whose aim in life is to convert Turbo Pascal source programs into their 8086
machine code equivalents.

T-diagrams were first introduced by Bratman (1961). They were further refined by Earley and
Sturgis (1970), and are also used in the books by Bennett (1990), Watt (1993), and Aho, Sethi and
Ullman (1986).

2.2 Classes of translator


It is common to distinguish between several well-established classes of translator:

The term assembler is usually associated with those translators that map low-level language
instructions into machine code which can then be executed directly. Individual source
language statements usually map one-for-one to machine-level instructions.

The term macro-assembler is also associated with those translators that map low-level
language instructions into machine code, and is a variation on the above. Most source
language statements map one-for-one into their target language equivalents, but some macro
statements map into a sequence of machine-level instructions - effectively providing a text
replacement facility, and thereby extending the assembly language to suit the user. (This is
not to be confused with the use of procedures or other subprograms to "extend" high-level
languages, because the method of implementation is usually very different.)

The term compiler is usually associated with those translators that map high-level language
instructions into machine code which can then be executed directly. Individual source
language statements usually map into many machine-level instructions.

The term pre-processor is usually associated with those translators that map a superset of a
high-level language into the original high-level language, or that perform simple text
substitutions before translation takes place. The best-known pre-processor is probably that
which forms an integral part of implementations of the language C, and which provides many
of the features that contribute to the widely-held perception that C is the only really portable
language.

The term high-level translator is often associated with those translators that map one
high-level language into another high-level language - usually one for which sophisticated
compilers already exist on a range of machines. Such translators are particularly useful as
components of a two-stage compiling system, or in assisting with the bootstrapping
techniques to be discussed shortly.

The terms decompiler and disassembler refer to translators which attempt to take object
code at a low level and regenerate source code at a higher level. While this can be done quite
successfully for the production of assembler level code, it is much more difficult when one
tries to recreate source code originally written in, say, Pascal.

Many translators generate code for their host machines. These are called self-resident translators.
Others, known as cross-translators, generate code for machines other than the host machine.
Cross-translators are often used in connection with microcomputers, especially in embedded
systems, which may themselves be too small to allow self-resident translators to operate
satisfactorily. Of course, cross-translation introduces additional problems in connection with
transferring the object code from the donor machine to the machine that is to execute the translated
program, and can lead to delays and frustration in program development.

The output of some translators is absolute machine code, left loaded at fixed locations in a machine
ready for immediate execution. Other translators, known as load-and-go translators, may even
initiate execution of this code. However, a great many translators do not produce fixed-address
machine code. Rather, they produce something closely akin to it, known as semicompiled or
binary symbolic or relocatable form. A frequent use for this is in the development of composite
libraries of special purpose routines, possibly originating from a mixture of source languages.
Routines compiled in this way are linked together by programs called linkage editors or linkers,
which may be regarded almost as providing the final stage for a multi-stage translator. Languages
that encourage the separate compilation of parts of a program - like Modula-2 and C++ - depend
critically on the existence of such linkers, as the reader is doubtless aware. For developing really
large software projects such systems are invaluable, although for the sort of "throw away" programs
on which most students cut their teeth, they can initially appear to be a nuisance, because of the
overheads of managing several files, and of the time taken to link their contents together.

T-diagrams can be combined to show the interdependence of translators, loaders and so on. For
example, the FST Modula-2 system makes use of a compiler and linker as shown in Figure 2.3.

Exercises

2.1 Make a list of as many translators as you can think of that can be found on your system.

2.2 Which of the translators known to you are of the load-and-go type?

2.3 Do you know whether any of the translators you use produce relocatable code? Is this of a
standard form? Do you know the names of the linkage editors or loaders used on your system?
2.4 Are there any pre-processors on your system? What are they used for?

2.3 Phases in translation


Translators are highly complex programs, and it is unreasonable to consider the translation process
as occurring in a single step. It is usual to regard it as divided into a series of phases. The simplest
breakdown recognizes that there is an analytic phase, in which the source program is analysed to
determine whether it meets the syntactic and static semantic constraints imposed by the language.
This is followed by a synthetic phase in which the corresponding object code is generated in the
target language. The components of the translator that handle these two major phases are said to
comprise the front end and the back end of the compiler. The front end is largely independent of
the target machine, the back end depends very heavily on the target machine. Within this structure
we can recognize smaller components or phases, as shown in Figure 2.4.

The character handler is the section that communicates with the outside world, through the
operating system, to read in the characters that make up the source text. As character sets and file
handling vary from system to system, this phase is often machine or operating system dependent.

The lexical analyser or scanner is the section that fuses characters of the source text into groups
that logically make up the tokens of the language - symbols like identifiers, strings, numeric
constants, keywords like while and if, operators like <=, and so on. Some of these symbols are
very simply represented on the output from the scanner, some need to be associated with various
properties such as their names or values.

Lexical analysis is sometimes easy, and at other times not. For example, the Modula-2 statement
WHILE A > 3 * B DO A := A - 1 END

easily decodes into tokens


WHILE keyword
A identifier name A
> operator comparison
3 constant literal value 3
* operator multiplication
B identifier name B
DO keyword
A identifier name A
:= operator assignment
A identifier name A
- operator subtraction
1 constant literal value 1
END keyword

as we read it from left to right, but the Fortran statement


10 DO 20 I = 1 . 30

is more deceptive. Readers familiar with Fortran might see it as decoding into
10 label
DO keyword
20 statement label
I INTEGER identifier
= assignment operator
1 INTEGER constant literal
, separator
30 INTEGER constant literal

while those who enjoy perversity might like to see it as it really is:
10 label
DO20I REAL identifier
= assignment operator
1.30 REAL constant literal

One has to look quite hard to distinguish the period from the "expected" comma. (Spaces are
irrelevant in Fortran; one would, of course, be perverse to use identifiers with unnecessary and
highly suggestive spaces in them.) While languages like Pascal, Modula-2 and C++ have been
cleverly designed so that lexical analysis can be clearly separated from the rest of the analysis, the
same is obviously not true of Fortran and other languages that do not have reserved keywords.

The syntax analyser or parser groups the tokens produced by the scanner into syntactic structures
- which it does by parsing expressions and statements. (This is analogous to a human analysing a
sentence to find components like "subject", "object" and "dependent clauses"). Often the parser is
combined with the contextual constraint analyser, whose job it is to determine that the
components of the syntactic structures satisfy such things as scope rules and type rules within the
context of the structure being analysed. For example, in Modula-2 the syntax of a while statement is
sometimes described as
WHILE Expression DO StatementSequence END

It is reasonable to think of a statement in the above form with any type of Expression as being
syntactically correct, but as being devoid of real meaning unless the value of the Expression is
constrained (in this context) to be of the Boolean type. No program really has any meaning until it
is executed dynamically. However, it is possible with strongly typed languages to predict at
compile-time that some source programs can have no sensible meaning (that is, statically, before an
attempt is made to execute the program dynamically). Semantics is a term used to describe
"meaning", and so the constraint analyser is often called the static semantic analyser, or simply
the semantic analyser.

The output of the syntax analyser and semantic analyser phases is sometimes expressed in the form
of a decorated abstract syntax tree (AST). This is a very useful representation, as it can be used in
clever ways to optimize code generation at a later stage.
Whereas the concrete syntax of many programming languages incorporates many keywords and
tokens, the abstract syntax is rather simpler, retaining only those components of the language
needed to capture the real content and (ultimately) meaning of the program. For example, whereas
the concrete syntax of a while statement requires the presence of WHILE, DO and END as shown
above, the essential components of the while statement are simply the (Boolean) Expression and the
statements comprising the StatementSequence.

Thus the Modula-2 statement


WHILE (1 < P) AND (P < 9) DO P := P + Q END

or its C++ equivalent


while (1 < P && P < 9) P = P + Q;

are both depicted by the common AST shown in Figure 2.5.

An abstract syntax tree on its own is devoid of some semantic detail; the semantic analyser has the
task of adding "type" and other contextual information to the various nodes (hence the term
"decorated" tree).

Sometimes, as for example in the case of most Pascal compilers, the construction of such a tree is
not explicit, but remains implicit in the recursive calls to procedures that perform the syntax and
semantic analysis.

Of course, it is also possible to construct concrete syntax trees. The Modula-2 form of the statement
WHILE (1 < P) AND (P < 9) DO P := P + Q END

could be depicted in full and tedious detail by the tree shown in Figure 2.6. The reader may have to
make reference to Modula-2 syntax diagrams and the knowledge of Modula-2 precedence rules to
understand why the tree looks so complicated.
The phases just discussed are all analytic in nature. The ones that follow are more synthetic. The
first of these might be an intermediate code generator, which, in practice, may also be integrated
with earlier phases, or omitted altogether in the case of some very simple translators. It uses the
data structures produced by the earlier phases to generate a form of code, perhaps in the form of
simple code skeletons or macros, or ASSEMBLER or even high-level code for processing by an
external assembler or separate compiler. The major difference between intermediate code and
actual machine code is that intermediate code need not specify in detail such things as the exact
machine registers to be used, the exact addresses to be referred to, and so on.

Our example statement


WHILE (1 < P) AND (P < 9) DO P := P + Q END

might produce intermediate code equivalent to


L0 if 1 < P goto L1
goto L3
L1 if P < 9 goto L2
goto L3
L2 P := P + Q
goto L0
L3 continue

Then again, it might produce something like


L0 T1 := 1 < P
T2 := P < 9
if T1 and T2 goto L1
goto L2
L1 P := P + Q
goto L0
L2 continue

depending on whether the implementors of the translator use the so-called sequential conjunction or
short-circuit approach to handling compound Boolean expressions (as in the first case) or the
so-called Boolean operator approach. The reader will recall that Modula-2 and C++ require the
short-circuit approach. However, the very similar language Pascal did not specify that one approach
be preferred above the other.
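
The practical difference between the two approaches is easy to demonstrate in C++ itself, where && short-circuits, while a program can mimic the Boolean operator approach by evaluating both operands unconditionally before combining them. The sketch below (the function names are our own) simply counts how many operand evaluations each strategy performs:

```cpp
#include <cassert>

// Count operand evaluations to expose the difference between the two
// strategies (names invented for this illustration).
int evaluations = 0;

bool lessThan(int a, int b) { ++evaluations; return a < b; }

// Short-circuit (sequential conjunction): C++'s && skips the second
// test whenever the first is already false.
bool shortCircuit(int p) { return lessThan(1, p) && lessThan(p, 9); }

// Boolean-operator approach: both tests are evaluated unconditionally,
// as in the second intermediate-code fragment above, then combined.
bool boolOperator(int p) {
    bool t1 = lessThan(1, p);
    bool t2 = lessThan(p, 9);
    return t1 && t2;
}
```

With p = 0 the first test fails, so shortCircuit performs one evaluation where boolOperator performs two. The distinction becomes critical when evaluating the second operand can have side effects or fault - the classic example being a search loop whose condition guards an array subscript in its second operand.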

A code optimizer may optionally be provided, in an attempt to improve the intermediate code in
the interests of speed or space or both. To use the same example as before, obvious optimization
would lead to code equivalent to
L0 if 1 >= P goto L1
if P >= 9 goto L1
P := P + Q
goto L0
L1 continue

The most important phase in the back end is the responsibility of the code generator. In a real
compiler this phase takes the output from the previous phase and produces the object code, by
deciding on the memory locations for data, generating code to access such locations, selecting
registers for intermediate calculations and indexing, and so on. Clearly this is a phase which calls
for much skill and attention to detail, if the finished product is to be at all efficient. Some translators
go on to a further phase by incorporating a so-called peephole optimizer in which attempts are
made to reduce unnecessary operations still further by examining short sequences of generated code
in closer detail.
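
As a tiny illustration of the idea (not the technique used by any of the compilers quoted below), a peephole pass over symbolic code might scan adjacent instructions and delete a jump whose target is the very next line, since control falls through to it anyway:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy peephole pass over symbolic code: remove "goto Ln" when the very
// next line is the label "Ln:", since the jump is then redundant.
std::vector<std::string> peephole(const std::vector<std::string>& code) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (code[i].rfind("goto ", 0) == 0 && i + 1 < code.size()
            && code[i + 1] == code[i].substr(5) + ":")
            continue;                    // jump falls through anyway - drop it
        out.push_back(code[i]);
    }
    return out;
}
```

A production peephole optimizer works on real machine instructions and recognizes many more patterns (redundant loads and stores, jumps to jumps, and so on), but the window-scanning structure is the same.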

Below we list the actual code generated by various MS-DOS compilers for this statement. It is
readily apparent that the code generation phases in these compilers are markedly different. Such
differences can have a profound effect on program size and execution speed.
Borland C++ 3.1 (47 bytes) Turbo Pascal (46 bytes)
(with no short circuit evaluation)

CS:A0 BBB702 MOV BX,02B7 CS:09 833E3E0009 CMP WORD PTR[003E],9
CS:A3 C746FE5100 MOV WORD PTR[BP-2],0051 CS:0E 7C04 JL 14
CS:A8 EB07 JMP B1 CS:10 B000 MOV AL,0
CS:AA 8BC3 MOV AX,BX CS:12 EB02 JMP 16
CS:AC 0346FE ADD AX,[BP-2] CS:14 B001 MOV AL,1
CS:AF 8BD8 MOV BX,AX CS:16 8AD0 MOV DL,AL
CS:B1 83FB01 CMP BX,1 CS:18 833E3E0001 CMP WORD PTR[003E],1
CS:B4 7E05 JLE BB CS:1D 7F04 JG 23
CS:B6 B80100 MOV AX,1 CS:1F B000 MOV AL,0
CS:B9 EB02 JMP BD CS:21 EB02 JMP 25
CS:BB 33C0 XOR AX,AX CS:23 B001 MOV AL,01
CS:BD 50 PUSH AX CS:25 22C2 AND AL,DL
CS:BE 83FB09 CMP BX,9 CS:27 08C0 OR AL,AL
CS:C1 7D05 JGE C8 CS:29 740C JZ 37
CS:C3 B80100 MOV AX,1 CS:2B A13E00 MOV AX,[003E]
CS:C6 EB02 JMP CA CS:2E 03064000 ADD AX,[0040]
CS:C8 33C0 XOR AX,AX CS:32 A33E00 MOV [003E],AX
CS:CA 5A POP DX CS:35 EBD2 JMP 9
CS:CB 85D0 TEST DX,AX
CS:CD 75DB JNZ AA

JPI TopSpeed Modula-2 (29 bytes) Stony Brook QuickMod (24 bytes)

CS:19 2E CS: CS:69 BB2D00 MOV BX,2D
CS:1A 8E1E2700 MOV DS,[0027] CS:6C B90200 MOV CX,2
CS:1E 833E000001 CMP WORD PTR[0000],1 CS:6F E90200 JMP 74
CS:23 7E11 JLE 36 CS:72 01D9 ADD CX,BX
CS:25 833E000009 CMP WORD PTR[0000],9 CS:74 83F901 CMP CX,1
CS:2A 7D0A JGE 36 CS:77 7F03 JG 7C
CS:2C 8B0E0200 MOV CX,[0002] CS:79 E90500 JMP 81
CS:30 010E0000 ADD [0000],CX CS:7C 83F909 CMP CX,9
CS:34 EBE3 JMP 19 CS:7F 7CF1 JL 72

A translator inevitably makes use of a complex data structure, known as the symbol table, in which
it keeps track of the names used by the program, and associated properties for these, such as their
type, and their storage requirements (in the case of variables), or their values (in the case of
constants).
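
A minimal symbol table along these lines might be sketched in C++ as a map from names to attribute records. The field names here are invented for illustration; real tables must also handle scopes, overloading and much more:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical symbol-table entry holding the kinds of properties the
// text mentions (field names are our own).
struct Entry {
    std::string type;     // e.g. "INTEGER", "BOOLEAN"
    bool isConstant;      // constants record a value, variables an offset
    int valueOrOffset;    // value (for a constant) or storage offset (variable)
};

class SymbolTable {
    std::map<std::string, Entry> entries;
public:
    // Returns false if the name was already declared.
    bool declare(const std::string& name, const Entry& e) {
        return entries.emplace(name, e).second;
    }
    // Returns nullptr if the name is unknown - an "undeclared identifier".
    const Entry* lookup(const std::string& name) const {
        auto it = entries.find(name);
        return it == entries.end() ? nullptr : &it->second;
    }
};
```

The two failure modes - redeclaration and lookup of an undeclared name - are exactly the conditions the contextual constraint analyser reports as semantic errors.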
As is well known, users of high-level languages are apt to make many errors in the development of
even quite simple programs. Thus the various phases of a compiler, especially the earlier ones, also
communicate with an error handler and error reporter which are invoked when errors are
detected. It is desirable that compilation of erroneous programs be continued, if possible, so that the
user can clean several errors out of the source before recompiling. This raises very interesting
issues regarding the design of error recovery and error correction techniques. (We speak of error
recovery when the translation process attempts to carry on after detecting an error, and of error
correction or error repair when it attempts to correct the error from context - usually a contentious
subject, as the correction may be nothing like what the programmer originally had in mind.)

Error detection at compile-time in the source code must not be confused with error detection at
run-time when executing the object code. Many code generators are responsible for adding
error-checking code to the object program (to check that subscripts for arrays stay in bounds, for
example). This may be quite rudimentary, or it may involve adding considerable code and data
structures for use with sophisticated debugging systems. Such ancillary code can drastically reduce
the efficiency of a program, and some compilers allow it to be suppressed.

Sometimes mistakes in a program that are detected at compile-time are known as errors, and errors
that show up at run-time are known as exceptions, but there is no universally agreed terminology
for this.

Figure 2.4 seems to imply that compilers work serially, and that each phase communicates with the
next by means of a suitable intermediate language, but in practice the distinction between the
various phases often becomes a little blurred. Moreover, many compilers are actually constructed
around a central parser as the dominant component, with a structure rather more like the one in
Figure 2.7.

Exercises

2.5 What sort of problems can you foresee a Fortran compiler having in analysing statements
beginning
IF ( I(J) - I(K) ) ........
CALL IF (4 , ...........
IF (3 .EQ. MAX) GOTO ......
100 FORMAT(X3H)=(I5)

2.6 What sort of code would you have produced had you been coding a statement like "WHILE (1 <
P) AND (P < 9) DO P := P + Q END" into your favourite ASSEMBLER language?

2.7 Draw the concrete syntax tree for the C++ version of the while statement used for illustration in
this section.

2.8 Are there any reasons why short-circuit evaluation should be preferred over the Boolean
operator approach? Can you think of any algorithms that would depend critically on which
approach was adopted?

2.9 Write down a few other high-level constructs and try to imagine what sort of
ASSEMBLER-like machine code a compiler would produce for them.

2.10 What do you suppose makes it relatively easy to compile Pascal? Can you think of any aspects
of Pascal which could prove really difficult?

2.11 We have used two undefined terms which at first seem interchangeable, namely "separate" and
"independent" compilation. See if you can discover what the differences are.

2.12 Many development systems - in particular debuggers - allow a user to examine the object code
produced by a compiler. If you have access to one of these, try writing a few very simple (single
statement) programs, and look at the sort of object code that is generated for them.

2.4 Multi-stage translators


Besides being conceptually divided into phases, translators are often divided into passes, in each of
which several phases may be combined or interleaved. Traditionally, a pass reads the source
program, or output from a previous pass, makes some transformations, and then writes output to an
intermediate file, whence it may be rescanned on a subsequent pass.

These passes may be handled by different integrated parts of a single compiler, or they may be
handled by running two or more separate programs. They may communicate by using their own
specialized forms of intermediate language, they may communicate by making use of internal data
structures (rather than files), or they may make several passes over the same original source code.

The number of passes used depends on a variety of factors. Certain languages require at least two
passes to be made if code is to be generated easily - for example, those where declaration of
identifiers may occur after the first reference to the identifier, or where properties associated with
an identifier cannot be readily deduced from the context in which it first appears. A multi-pass
compiler can often save space. Although modern computers are usually blessed with far more
memory than their predecessors of only a few years back, multiple passes may be an important
consideration if one wishes to translate complicated languages within the confines of small systems.
Multi-pass compilers may also allow for better provision of code optimization, error reporting and
error handling. Lastly, they lend themselves to team development, with different members of the
team assuming responsibility for different passes. However, multi-pass compilers are usually
slower than single-pass ones, and their probable need to keep track of several files makes them
slightly awkward to write and to use. Compromises at the design stage often result in languages that
are well suited to single-pass compilation.

In practice, considerable use is made of two-stage translators in which the first stage is a high-level
translator that converts the source program into ASSEMBLER, or even into some other relatively
high-level language for which an efficient translator already exists. The compilation process would
then be depicted as in Figure 2.8 - our example shows a Modula-3 program being prepared for
execution on a machine that has a Modula-3 to C converter:

It is increasingly common to find compilers for high-level languages that have been implemented
using C, and which themselves produce C code as output. The success of these is based on the
premises that "all modern computers come equipped with a C compiler" and "source code written in
C is truly portable". Neither premise is, unfortunately, completely true. However, compilers written
in this way are as close to achieving the dream of themselves being portable as any that exist at the
present time. The way in which such compilers may be used is discussed further in Chapter 3.

Exercises

2.13 Try to find out which of the compilers you have used are single-pass, and which are
multi-pass, and for the latter, find out how many passes are involved. Which produce relocatable
code needing further processing by linkers or linkage editors?

2.14 Do any of the compilers in use on your system produce ASSEMBLER, C or other such code
during the compilation process? Can you foresee any particular problems that users might
experience in using such compilers?

2.15 One of several compilers that translates from Modula-2 to C is called mtc, and is freely
available from several ftp sites. If you are a Modula-2 programmer, obtain a copy, and experiment
with it.

2.16 An excellent compiler that translates Pascal to C is called p2c, and is widely available for Unix
systems from several ftp sites. If you are a Pascal programmer, obtain a copy, and experiment with
it.

2.17 Can you foresee any practical difficulties in using C as an intermediate language?

2.5 Interpreters, interpretive compilers, and emulators


Compilers of the sort that we have been discussing have a few properties that may not immediately
be apparent. Firstly, they usually aim to produce object code that can run at the full speed of the
target machine. Secondly, they are usually arranged to compile an entire section of code before any
of it can be executed.
In some interactive environments the need arises for systems that can execute part of an application
without preparing all of it, or ones that allow the user to vary his or her course of action on the fly.
Typical scenarios involve the use of spreadsheets, on-line databases, or batch files or shell scripts
for operating systems. With such systems it may be feasible (or even desirable) to exchange some
of the advantages of speed of execution for the advantage of procuring results on demand.

Systems like these are often constructed so as to make use of an interpreter. An interpreter is a
translator that effectively accepts a source program and executes it directly, without, seemingly,
producing any object code first. It does this by fetching the source program instructions one by one,
analysing them one by one, and then "executing" them one by one. Clearly, a scheme like this, if it
is to be successful, places some quite severe constraints on the nature of the source program.
Complex program structures such as nested procedures or compound statements do not lend
themselves easily to such treatment. On the other hand, one-line queries made of a data base, or
simple manipulations of a row or column of a spreadsheet, can be handled very effectively.

This idea is taken quite a lot further in the development of some translators for high-level
languages, known as interpretive compilers. Such translators produce (as output) intermediate
code which is intrinsically simple enough to satisfy the constraints imposed by a practical
interpreter, even though it may still be quite a long way from the machine code of the system on
which it is desired to execute the original program. Rather than continue translation to the level of
machine code, an alternative approach that may perform acceptably well is to use the intermediate
code as part of the input to a specially written interpreter. This in turn "executes" the original
algorithm, by simulating a virtual machine for which the intermediate code effectively is the
machine code. The distinction between the machine code and pseudo-code approaches to execution
is summarized in Figure 2.9.
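
The fetch-analyse-execute cycle just described can be sketched as a tiny interpreter for an invented stack-based pseudo-machine (the opcodes are our own, loosely in the spirit of the P-code mentioned later; the earlier while loop's condition is simplified to a single test, P < 9, to keep the listing short):

```cpp
#include <cassert>
#include <vector>

// An invented stack-based pseudo-machine (opcode names are our own).
enum class Op { PushConst, LoadVar, StoreVar, Add, Less, JumpIfFalse, Jump, Halt };
struct Instr { Op op; int arg = 0; };

// Fetch-decode-execute loop: the essence of an interpreter or emulator.
void run(const std::vector<Instr>& code, std::vector<int>& vars) {
    std::vector<int> stack;
    int pc = 0;
    while (true) {
        Instr i = code[pc++];                       // fetch
        switch (i.op) {                             // analyse (decode) ...
            case Op::PushConst: stack.push_back(i.arg); break;   // ... execute
            case Op::LoadVar:   stack.push_back(vars[i.arg]); break;
            case Op::StoreVar:  vars[i.arg] = stack.back(); stack.pop_back(); break;
            case Op::Add: { int b = stack.back(); stack.pop_back();
                            stack.back() += b; break; }
            case Op::Less: { int b = stack.back(); stack.pop_back();
                             int a = stack.back(); stack.pop_back();
                             stack.push_back(a < b); break; }
            case Op::JumpIfFalse: { int t = stack.back(); stack.pop_back();
                                    if (!t) pc = i.arg; break; }
            case Op::Jump: pc = i.arg; break;
            case Op::Halt: return;
        }
    }
}
```

Encoding WHILE P < 9 DO P := P + Q END for this machine, with P and Q held in a small variable store, and running it drives P from 1 up to 9 in steps of Q = 2 - each pseudo-instruction being fetched, decoded and executed in turn by the loop above.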

We may depict the process used in an interpretive compiler running under MS-DOS for a toy
language like Clang, the one illustrated in later chapters, in T-diagram form (see Figure 2.10).

It is not necessary to confine interpreters merely to work with intermediate output from a translator.
More generally, of course, even a real machine can be viewed as a highly specialized interpreter -
one that executes the machine level instructions by fetching, analysing, and then interpreting them
one by one. In a real machine this all happens "in hardware", and hence very quickly. By carrying
on this train of thought, the reader should be able to see that a program could be written to allow
one real machine to emulate any other real machine, albeit perhaps slowly, simply by writing an
interpreter - or, as it is more usually called, an emulator - for the second machine.
For example, we might develop an emulator that runs on a Sun SPARC machine and makes it
appear to be an IBM PC (or the other way around). Once we have done this, we are (in principle) in
a position to execute any software developed for an IBM PC on the Sun SPARC machine -
effectively the PC software becomes portable!

The T-diagram notation is easily extended to handle the concept of such virtual machines. For
example, running Turbo Pascal on our Sun SPARC machine could be depicted by Figure 2.11.

The interpreter/emulator approach is widely used in the design and development both of new
machines themselves, and the software that is to run on those machines.

An interpretive approach may have several points in its favour:

It is far easier to generate hypothetical machine code (which can be tailored towards the
quirks of the original source language) than real machine code (which has to deal with the
uncompromising quirks of real machines).

A compiler written to produce (as output) well-defined pseudo-machine code capable of easy
interpretation on a range of machines can be made highly portable, especially if it is written in
a host language that is widely available (such as ANSI C), or even if it is made available
already implemented in its own pseudo-code.

It can more easily be made "user friendly" than can the native code approach. Since the
interpreter works closer to the source code than does a fully translated program, error
messages and other debugging aids may readily be related to this source.

A whole range of languages may quickly be implemented in a useful form on a wide range of
different machines relatively easily. This is done by producing intermediate code to a
well-defined standard, for which a relatively efficient interpreter should be easy to implement
on any particular real machine.

It proves to be useful in connection with cross-translators such as were mentioned earlier. The
code produced by such translators can sometimes be tested more effectively by simulated
execution on the donor machine, rather than after transfer to the target machine - the delays
inherent in the transfer from one machine to the other may be balanced by the degradation of
execution time in an interpretive simulation.

Lastly, intermediate languages are often very compact, allowing large programs to be
handled, even on relatively small machines. The success of the once very widely used UCSD
Pascal and UCSD p-System stands as an example of what can be done in this respect.
For all these advantages, interpretive systems carry fairly obvious overheads in execution speed,
because execution of intermediate code effectively carries with it the cost of virtual translation into
machine code each time a hypothetical machine instruction is obeyed.

One of the best known of the early portable interpretive compilers was the one developed at
Zürich and known as the "Pascal-P" compiler (Nori et al., 1981). This was supplied in a kit of three
components:

The first component was the source form of a Pascal compiler, written in a very complete
subset of the language, known as Pascal-P. The aim of this compiler was to translate Pascal-P
source programs into a well-defined and well-documented intermediate language, known as
P-code, which was the "machine code" for a hypothetical stack-based computer, known as the
P-machine.

The second component was a compiled version of the first - the P-codes that would be
produced by the Pascal-P compiler, were it to compile itself.

Lastly, the kit contained an interpreter for the P-code language, supplied as a Pascal
algorithm.

The interpreter served primarily as a model for writing a similar program for the target machine, to
allow it to emulate the hypothetical P-machine. As we shall see in a later chapter, emulators are
relatively easy to develop - even, if necessary, in ASSEMBLER - so that this stage was usually
fairly painlessly achieved. Once one had loaded the interpreter - that is to say, the version of it
tailored to a local real machine - into a real machine, one was in a position to "execute" P-code, and
in particular the P-code of the P-compiler. The compilation and execution of a user program could
then be achieved in a manner depicted in Figure 2.12.

Exercises

2.18 Try to find out which of the translators you have used are interpreters, rather than full
compilers.

2.19 If you have access to both a native-code compiler and an interpreter for a programming
language known to you, attempt to measure the loss in efficiency when the interpreter is used to run
a large program (perhaps one that does substantial number-crunching).
Compilers and Compiler Generators © P.D. Terry, 2000

3 COMPILER CONSTRUCTION AND BOOTSTRAPPING


By now the reader may have realized that developing translators is a decidedly non-trivial exercise.
If one is faced with the task of writing a full-blown translator for a fairly complex source language,
or an emulator for a new virtual machine, or an interpreter for a low-level intermediate language,
one would probably prefer not to implement it all in machine code.

Fortunately one rarely has to contemplate such a radical step. Translator systems are now widely
available and well understood. A fairly obvious strategy when a translator is required for an old
language on a new machine, or a new language on an old machine (or even a new language on a
new machine), is to make use of existing compilers on either machine, and to do the development in
a high level language. This chapter provides a few examples that should make this clearer.

3.1 Using a high-level host language


If, as is increasingly common, one’s dream machine M is supplied with the machine coded version
of a compiler for a well-established language like C, then the production of a compiler for one’s
dream language X is achievable by writing the new compiler, say XtoM, in C and compiling the
source (XtoM.C) with the C compiler (CtoM.M) running directly on M (see Figure 3.1). This
produces the object version (XtoM.M) which can then be executed on M.

Even though development in C is much easier than development in machine code, the process is
still complex. As was mentioned earlier, it may be possible to develop a large part of the compiler
source using compiler generator tools - assuming, of course, that these are already available either
in executable form, or as C source that can itself be compiled easily. The hardest part of the
development is probably that associated with the back end, since this is intensely machine
dependent. If one has access to the source code of a compiler like CtoM one may be able to use this
to good avail. Although commercial compilers are rarely released in source form, source code is
available for many compilers produced at academic institutions or as components of the GNU
project carried out under the auspices of the Free Software Foundation.

3.2 Porting a high level translator


The process of modifying an existing compiler to work on a new machine is often known as
porting the compiler. In some cases this process may be almost trivially easy. Consider, for
example, the fairly common scenario where a compiler XtoC for a popular language X has been
implemented in C on machine A by writing a high-level translator to convert programs written in X
to C, and where it is desired to use language X on a machine M that, like A, has already been
blessed with a C compiler of its own. To construct a two-stage compiler for use on either machine,
all one needs to do, in principle, is to install the source code for XtoC on machine M and recompile
it.

Such an operation is conveniently represented in terms of T-diagrams chained together. Figure 3.2(a) shows the compilation of the X to C compiler, and Figure 3.2(b) shows the two-stage
compilation process needed to compile programs written in X to M-code.

The portability of a compiler like XtoC.C is almost guaranteed, provided that it is itself written in
"portable" C. Unfortunately, or as Mr. Murphy would put it, "interchangeable parts don’t" (more
explicitly, "portable C isn’t"). Some time may have to be spent in modifying the source code of
XtoC.C before it is acceptable as input to CtoM.M, although it is to be hoped that the developers of
XtoC.C will have used only standard C in their work, and used pre-processor directives that allow
for easy adaptation to other systems.

If there is an initial strong motivation for making a compiler portable to other systems it is, indeed,
often written so as to produce high-level code as output. More often, of course, the original
implementation of a language is written as a self-resident translator with the aim of directly
producing machine code for the current host system.

3.3 Bootstrapping
All this may seem to be skirting around a really nasty issue - how might the first high-level
language have been implemented? In ASSEMBLER? But then how was the assembler for
ASSEMBLER produced?

A full assembler is itself a major piece of software, albeit rather simple when compared with a
compiler for a really high level language, as we shall see. It is, however, quite common to define
one language as a subset of another, so that subset 1 is contained in subset 2 which in turn is
contained in subset 3, and so on.
One might first write an assembler for subset 1 of ASSEMBLER in machine code, perhaps on a
load-and-go basis (more likely one writes in ASSEMBLER, and then hand translates it into
machine code). This subset assembler program might, perhaps, do very little other than convert
mnemonic opcodes into binary form. One might then write an assembler for subset 2 of
ASSEMBLER in subset 1 of ASSEMBLER, and so on.

This process, by which a simple language is used to translate a more complicated program, which
in turn may handle an even more complicated program and so on, is known as bootstrapping, by
analogy with the idea that it might be possible to lift oneself off the ground by tugging at one’s
boot-straps.

3.4 Self-compiling compilers


Once one has a working system, one can start using it to improve itself. Many compilers for popular
languages were first written in another implementation language, as implied in section 3.1, and then
rewritten in their own source language. The rewrite gives source for a compiler that can then be
compiled with the compiler written in the original implementation language. This is illustrated in
Figure 3.3.

Clearly, writing a compiler by hand not once, but twice, is a non-trivial operation, unless the
original implementation language is close to the source language. This is not uncommon: Oberon
compilers could be implemented in Modula-2; Modula-2 compilers, in turn, were first implemented
in Pascal (all three are fairly similar), and C++ compilers were first implemented in C.

Developing a self-compiling compiler has four distinct points to recommend it. Firstly, it
constitutes a non-trivial test of the viability of the language being compiled. Secondly, once it has
been done, further development can be done without recourse to other translator systems. Thirdly,
any improvements that can be made to its back end manifest themselves both as improvements to
the object code it produces for general programs and as improvements to the compiler itself. Lastly,
it provides a fairly exhaustive self-consistency check, for if the compiler is used to compile its own
source code, it should, of course, be able to reproduce its own object code (see Figure 3.4).

Furthermore, given a working compiler for a high-level language it is then very easy to produce
compilers for specialized dialects of that language.
3.5 The half bootstrap
Compilers written to produce object code for a particular machine are not intrinsically portable.
However, they are often used to assist in a porting operation. For example, by the time that the first
Pascal compiler was required for ICL machines, the Pascal compiler available in Zürich (where
Pascal had first been implemented on CDC mainframes) existed in two forms (Figure 3.5).

The first stage of the transportation process involved changing PasToCDC.Pas to generate ICL
machine code - thus producing a cross compiler. Since PasToCDC.Pas had been written in a
high-level language, this was not too difficult to do, and resulted in the compiler PasToICL.Pas.

Of course this compiler could not yet run on any machine at all. It was first compiled using
PasToCDC.CDC, on the CDC machine (see Figure 3.6(a)). This gave a cross-compiler that could
run on CDC machines, but still not, of course, on ICL machines. One further compilation of
PasToICL.Pas, using the cross-compiler PasToICL.CDC on the CDC machine, produced the final
result, PasToICL.ICL (Figure 3.6(b)).
The final product (PasToICL.ICL) was then transported on magnetic tape to the ICL machine, and
loaded quite easily. Having obtained a working system, the ICL team could (and did) continue
development of the system in Pascal itself.

This porting operation was an example of what is known as a half bootstrap system. The work of
transportation is essentially done entirely on the donor machine, without the need for any translator
in the target machine, but a crucial part of the original compiler (the back end, or code generator)
has to be rewritten in the process. Clearly the method is hazardous - any flaws or oversights in
writing PasToICL.Pas could have spelled disaster. Such problems can be reduced by minimizing
changes made to the original compiler. Another technique is to write an emulator for the target
machine that runs on the donor machine, so that the final compiler can be tested on the donor
machine before being transferred to the target machine.

3.6 Bootstrapping from a portable interpretive compiler


Because of the inherent difficulty of the half bootstrap for porting compilers, a variation on the full
bootstrap method described above for assemblers has often been successfully used in the case of
Pascal and other similar high-level languages. Here most of the development takes place on the
target machine, after a lot of preliminary work has been done on the donor machine to produce an
interpretive compiler that is almost portable. It will be helpful to illustrate with the well-known
example of the Pascal-P implementation kit mentioned in section 2.5.

Users of this kit typically commenced operations by implementing an interpreter for the P-machine.
The bootstrap process was then initiated by developing a compiler (PasPtoM.PasP) to translate
Pascal-P source programs to the local machine code. This compiler could be written in Pascal-P
source, development being guided by the source of the Pascal-P to P-code compiler supplied as part
of the kit. This new compiler was then compiled with the interpretive compiler (PasPtoP.P) from
the kit (Figure 3.7(a)) and the source of the Pascal to M-code compiler was then compiled by this
new compiler, interpreted once again by the P-machine, to give the final product, PasPtoM.M
(Figure 3.7(b)).

The Zürich P-code interpretive compiler could be, and indeed was, used as a highly portable
development system. It was employed to remarkable effect in developing the UCSD Pascal system,
which was the first serious attempt to implement Pascal on microcomputers. The UCSD Pascal
team went on to provide the framework for an entire operating system, editors and other utilities -
all written in Pascal, and all compiled into a well-defined P-code object code. Simply by providing
an alternative interpreter one could move the whole system to a new microcomputer system
virtually unchanged.

3.7 A P-code assembler


There is, of course, yet another way in which a portable interpretive compiler kit might be used.
One might commence by writing a P-code to M-code assembler, probably a relatively simple task.
Once this has been produced one would have the assembler depicted in Figure 3.8.

The P-codes for the P-code compiler would then be assembled by this system to give another cross
compiler (Figure 3.9(a)), and the same P-code/M-code assembler could then be used as a back-end
to the cross compiler (Figure 3.9(b)).

Exercises

3.1 Draw the T-diagram representations for the development of a P-code to M-code assembler,
assuming that you have a C++ compiler available on the target system.

3.2 Later in this text we shall develop an interpretive compiler for a small language called Clang,
using C++ as the host language. Draw T-diagram representations of the various components of the
system as you foresee them.

Further reading

A very clear exposition of bootstrapping is to be found in the book by Watt (1993). The ICL
bootstrap is further described by Welsh and Quinn (1972). Other early insights into bootstrapping
are to be found in papers by Lecarme and Peyrolle-Thomas (1973), by Nori et al. (1981), and
Cornelius, Lowman and Robson (1984).
Compilers and Compiler Generators © P.D. Terry, 2000

4 MACHINE EMULATION
In Chapter 2 we discussed the use of emulation or interpretation as a tool for programming
language translation. In this chapter we aim to discuss hypothetical machine languages and the
emulation of hypothetical machines for these languages in more detail. Modern computers are
among the most complex machines ever designed by the human mind. However, this is a text on
programming language translation and not on electronic engineering, and our restricted discussion
will focus only on rather primitive object languages suited to the simple translators to be discussed
in later chapters.

4.1 Simple machine architecture


Many CPU (central processor unit) chips used in modern computers have one or more internal
registers or accumulators, which may be regarded as highly local memory where simple
arithmetic and logical operations may be performed, and between which local data transfers may
take place. These registers may be restricted to the capacity of a single byte (8 bits), or, as is typical
of most modern processors, they may come in a variety of small multiples of bytes or machine
words.

One fundamental internal register is the instruction register (IR), through which moves the
bitstrings (bytes) representing the fundamental machine-level instructions that the processor can
obey. These instructions tend to be extremely simple - operations such as "clear a register" or
"move a byte from one register to another" being the typical order of complexity. Some of these
instructions may be completely defined by a single byte value. Others may need two or more bytes
for a complete definition. Of these multi-byte instructions, the first usually denotes an operation,
and the rest relate either to a value to be operated upon, or to the address of a location in memory at
which can be found the value to be operated upon.

The simplest processors have only a few data registers, and are very limited in what they can
actually do with their contents, and so processors invariably make provision for interfacing to the
memory of the computer, and allow transfers to take place along so-called bus lines between the
internal registers and the far greater number of external memory locations. When information is to
be transferred to or from memory, the CPU places the appropriate address information on the
address bus, and then transmits or receives the data itself on the data bus. This is illustrated in
Figure 4.1.
The memory may simplistically be viewed as a one-dimensional array of byte values, analogous to
what might be described in high-level language terms by declarations like the following
TYPE
ADDRESS = CARDINAL [0 .. MemSize - 1];
BYTES = CARDINAL [0 .. 255];
VAR
Mem : ARRAY ADDRESS OF BYTES;

in Modula-2, or, in C++ (which does not provide for the subrange types so useful in this regard)
typedef unsigned char BYTES;
BYTES Mem[MemSize];

Since the memory is used to store not only "data" but also "instructions", another important internal
register in a processor, the so-called program counter or instruction pointer (denoted by PC or
IP), is used to keep track of the address in memory of the next instruction to be fed to the
processor’s instruction register (IR).

Perhaps it will be helpful to think of the processor itself in high-level terms:


TYPE
  PROCESSOR = RECORD
    IR,
    R1, R2, R3 : BYTES;
    PC : ADDRESS;
  END;
VAR
  CPU : PROCESSOR;

in Modula-2, or, in C++

struct processor {
  BYTES IR;
  BYTES R1, R2, R3;
  unsigned PC;
};
processor cpu;

The operation of the machine is repeatedly to fetch a byte at a time from memory (along the data
bus), place it in the IR, and then execute the operation which this byte represents. Multi-byte
instructions may require the fetching of further bytes before the instruction itself can be decoded
fully by the CPU, of course. After the instruction denoted by the contents of IR has been executed,
the value of PC will have been changed to point to the next instruction to be fetched. This
fetch-execute cycle may be described by the following algorithm:
BEGIN
CPU.PC := initialValue; (* address of first code instruction *)
LOOP
CPU.IR := Mem[CPU.PC]; (* fetch *)
Increment(CPU.PC); (* bump PC in anticipation *)
Execute(CPU.IR); (* affecting other registers, memory, PC *)
(* handle machine interrupts if necessary *)
END
END.

Normally the value of PC alters by small steps (since instructions are usually stored in memory in
sequence); execution of branch instructions may, however, have a rather more dramatic effect. So
might the occurrence of hardware interrupts, although we shall not discuss interrupt handling
further.

A program for such a machine consists, in the last resort, of a long string of byte values. Were these
to be written on paper (as binary, decimal, or hexadecimal values), they would appear pretty
meaningless to the human reader. We might, for example, find a section of program reading
25 45 21 34 34 30 45

Although it may not be obvious, this might be equivalent to a high-level statement like
Price := 2 * Price + MarkUp;

Machine-level programming is usually performed by associating mnemonics with the recognizable
operations, like HLT for "halt" or ADD for "add to register". The above code is far more
comprehensible when written (with commentary) as
LDA 45 ; load accumulator with value stored in memory location 45
SHL ; shift accumulator one bit left (multiply by 2)
ADI 34 ; add 34 to the accumulator
STA 45 ; store the value in the accumulator at memory location 45

Programs written in an assembly language - which have first to be assembled before they can be
executed - usually make use of other named entities, for example
MarkUp EQU 34 ; CONST MarkUp = 34;
LDA Price ; CPU.A := Price;
SHL ; CPU.A := 2 * CPU.A;
ADI MarkUp ; CPU.A := CPU.A + 34;
STA Price ; Price := CPU.A;

When we use code fragments such as these for illustration we shall make frequent use of
commentary showing an equivalent fragment written in a high-level language. Commentary follows
the semicolon on each line, a common convention in assembler languages.

4.2 Addressing modes


As the examples given earlier suggest, programs prepared at or near the machine level frequently
consist of a sequence of simple instructions, each involving a machine-level operation and one or
more parameters.

An example of a simple operation expressed in a high-level language might be


AmountDue := Price + Tax;

Some machines and assembler languages provide for such operations in terms of so-called
three-address code, in which an operation - denoted by a mnemonic usually called the opcode - is
followed by two operands and a destination. In general this takes the form
operation destination, operand 1, operand 2

for example
ADD AmountDue, Price, Tax

We may also express this in a general sense as a function call


destination := operation(operand 1, operand 2)

which helps to stress the important idea that the operands really denote "values", while the
destination denotes a processor register, or an address in memory where the result is to be stored.

In many cases this generality is restricted (that is, the machine suffers from non-orthogonality in
design). Typically the value of one operand is required to be the value originally stored at the
destination. This corresponds to high-level statements like
Price := Price * InflationFactor;

and is mirrored at the low-level by so-called two-address code of the general form
operation destination, operand
for example
MUL Price, InflationFactor

In passing, we should point out an obvious connection between some of the assignment operations
in C++ and two-address code. In C++ the above assignment would probably have been written
Price *= InflationFactor;

which, while less transparent to a Modula-2 programmer, is surely a hint to a C++ compiler to
generate code of this form. (Perhaps this example may help you understand why C++ is regarded by
some as the world’s finest assembly language!)

In many real machines even general two-address code is not found at the machine level. One of
destination and operand might be restricted to denoting a machine register (the other one might
denote a machine register, or a constant, or a machine address). This is often called one and a half
address code, and is exemplified by
MOV R1, Value ; CPU.R1 := Value
ADD Answer, R1 ; Answer := Answer + CPU.R1
MOV Result, R2 ; Result := CPU.R2

Finally, in so-called accumulator machines we may be restricted to one-address code, where the
destination is always a machine register (except for those operations that copy (store) the contents
of a machine register into memory). In some assembler languages such instructions may still appear
to be of the two-address form, as above. Alternatively they might be written in terms of opcodes
that have the register implicit in the mnemonic, for example
LDA Value ; CPU.A := Value
ADA Answer ; CPU.A := CPU.A + Answer
STB Result ; Result := CPU.B

Although many of these examples might give the impression that the corresponding machine level
operations require multiple bytes for their representation, this is not necessarily true. For example,
operations that only involve machine registers, exemplified by
MOV R1, R2 ; CPU.R1 := CPU.R2
LDA B ; CPU.A := CPU.B
TAX ; CPU.X := CPU.A

might require only a single byte - as would be most obvious in an assembler language that used the
third representation. The assembly of such programs is eased considerably by a simple and
self-consistent notation for the source code, a subject that we shall consider further in a later
chapter.

In those instructions that do involve the manipulation of values other than those in the machine
registers alone, multi-byte instructions are usually required. The first byte typically specifies the
operation itself (and possibly the register or registers that are involved), while the remaining bytes
specify the other values (or the memory addresses of the other values) involved. In such
instructions there are several ways in which the ancillary bytes might be used. This variety gives
rise to what are known as different addressing modes for the processor, and whose purpose it is to
provide an effective address to be used in an instruction. Exactly which modes are available varies
tremendously from processor to processor, and we can mention only a few representative examples
here. The various possibilities may be distinguished in some assembler languages by the use of
different mnemonics for what at first sight appear to be closely related operations. In other
assembler languages the distinction may be drawn by different syntactic forms used to specify the
registers, addresses or values. One may even find different assembler languages for a common
processor.

In inherent addressing the operand is implicit in the opcode itself, and often the instruction is
contained in a single byte. For example, to clear a machine register named A we might have
CLA or CLR A ; CPU.A := 0

Again we stress that, though the second form seems to have two components, it does not always
imply the use of two bytes of code at the machine level.

In immediate addressing the ancillary bytes for an instruction typically give the actual value that
is to be combined with a value in a register. Examples might be
ADI 34 or ADD A, #34 ; CPU.A := CPU.A + 34

In these two addressing modes the use of the word "address" is almost misleading, as the value of
the ancillary bytes may often have nothing to do with a memory address at all. In the modes now to
be discussed the connection with memory addresses is far more obvious.

In direct or absolute addressing the ancillary bytes typically specify the memory address of the
value that is to be retrieved or combined with the value in a register, or specify where a register
value is to be stored. Examples are
LDA 34 or MOV A, 34 ; CPU.A := Mem[34]
STA 45 MOV 45, A ; Mem[45] := CPU.A
ADD 38 ADD A, 38 ; CPU.A := CPU.A + Mem[38]

Beginners frequently confuse immediate and direct addressing, a situation not improved by the fact
that there is no consistency in notation between different assembler languages, and there may even
be a variety of ways of expressing a particular addressing mode. For example, for the Intel 80x86
processors as used in the IBM-PC and compatibles, low-level code is written in a two-address form
similar to that shown above - but the immediate mode is denoted without needing a special symbol
like #, while the direct mode may have the address in brackets:
ADD AX, 34 ; CPU.AX := CPU.AX + 34 Immediate
MOV AX, [34] ; CPU.AX := Mem[34] Direct

In register-indexed addressing one of the operands in an instruction specifies both an address and
also an index register, whose value at the time of execution may be thought of as specifying the
subscript to an array stored from that address
LDX 34 or MOV A, 34[X] ; CPU.A := Mem[34 + CPU.X]
STX 45 MOV 45[X], A ; Mem[45+CPU.X] := CPU.A
ADX 38 ADD A, 38[X] ; CPU.A := CPU.A + Mem[38+CPU.X]

In register-indirect addressing one of the operands in an instruction specifies a register whose
value at the time of execution gives the effective address where the value of the operand is to be
found. This relates to the concept of pointers as used in Modula-2, Pascal and C++.
MOV R1, @R2 ; CPU.R1 := Mem[CPU.R2]
MOV AX, [BX] ; CPU.AX := Mem[CPU.BX]

Not all the registers in a machine can necessarily be used in these ways. Indeed, some machines
have rather awkward restrictions in this regard.

Some processors allow for very powerful variations on indexed and indirect addressing modes. For
example, in memory-indexed addressing, a single operand may specify two memory addresses -
the first of which gives the address of the first element of an array, and the second of which gives
the address of a variable whose value will be used as a subscript to the array.
MOV R1, 400[100] ; CPU.R1 := Mem[400 + Mem[100]]

Similarly, in memory-indirect addressing one of the operands in an instruction specifies a
memory address at which will be found a value that forms the effective address where another
operand is to be found.
MOV R1, @100 ; CPU.R1 := Mem[Mem[100]]

This mode is not as commonly found as the others; where it does occur it directly corresponds to
the use of pointer variables in languages that support them. Code like
TYPE
  ARROW = POINTER TO CARDINAL;
VAR
  Arrow : ARROW;
  Target : CARDINAL;
BEGIN
  Target := Arrow^;

in Modula-2, or, in C++

typedef int *ARROW;
ARROW Arrow;
int Target;
Target = *Arrow;

might translate to equivalent code in assembler like


MOV AX, @Arrow
MOV Target, AX

or even
MOV Target, @Arrow

where, once again, we can see an immediate correspondence between the syntax in C++ and the
corresponding assembler.

Finally, in relative addressing an operand specifies an amount by which the current program count
register PC must be incremented or decremented to find the actual address of interest. This is
chiefly found in "branching" instructions, rather than in those that move data between various
registers and/or locations in memory.

Further reading

Most books on assembler level programming have far deeper discussions of the subject of
addressing modes than we have presented. Two very readable accounts are to be found in the books
by Wakerly (1981) and MacCabe (1993). A deeper discussion of machine architectures is to be
found in the book by Hennessy and Patterson (1990).

4.3 Case study 1 - A single-accumulator machine


Although sophisticated processors may have several registers, their basic principles - especially as
they apply to emulation - may be illustrated by the following model of a single-accumulator
processor and computer, very similar to one suggested by Wakerly (1981). Here we shall take
things to extremes and presume the existence of a system with all registers only 1 byte (8 bits)
wide.
4.3.1 Machine architecture

Diagrammatically we might represent this machine as in Figure 4.2.

The symbols in this diagram refer to the following components of the machine

ALU is the arithmetic logic unit, where arithmetic and logical operations are actually
performed.

A is the 8-bit accumulator, a register for doing arithmetic or logical operations.

SP is an 8-bit stack pointer, a register that points to an area in memory that may be
utilized as a stack.

X is an 8-bit index register, which is used in indexing areas of memory which
conceptually form data arrays.

Z, P, C are single bit condition flags or status registers, which are set "true" when an
operation causes a register to change to a zero value, or to a positive value, or to
propagate a carry, respectively.

IR is the 8-bit instruction register, in which is held the byte value of the instruction
currently being executed.

PC is the 8-bit program counter, which contains the address in memory of the
instruction that is next to be executed.

EAR is the effective address register, which contains the address of the byte of data
which is being manipulated by the current instruction.

The programmer’s model of this sort of machine is somewhat simpler - it consists of a number of
"variables" (in the C++ or Modula-2 sense), each of which is one byte in capacity. Some of these
correspond to processor registers, while the others form the random access read/write (RAM)
memory, of which we have assumed there to be 256 bytes, addressed by the values 0 through 255.
In this memory, as usual, will be stored both the data and the instructions for the program under
execution. The processor, its registers, and the associated RAM memory can be thought of as
though they were described by declarations like
TYPE
  BYTES = CARDINAL [0 .. 255];
  PROCESSOR = RECORD
    A, SP, X, IR, PC : BYTES;
    Z, P, C : BOOLEAN;
  END;
  STATUS = (running, finished, nodata, baddata, badop);
VAR
  CPU : PROCESSOR;
  Mem : ARRAY BYTES OF BYTES;
  PS : STATUS;

in Modula-2, or, in C++

typedef unsigned char bytes;
struct processor {
  bytes a, sp, x, ir, pc;
  bool z, p, c;
};
typedef enum { running, finished, nodata, baddata, badop } status;
processor cpu;
bytes mem[256];
status ps;

where the concept of the processor status PS has been introduced in terms of an enumeration that
defines the states in which an emulator might find itself.

4.3.2 Instruction set

Some machine operations are described by a single byte. Others require two bytes, and have the
format
Byte 1 Opcode
Byte 2 Address field

The set of machine code functions available is quite small. Those marked * affect the P and Z flags,
and those marked + affect the C flag. An informal description of their semantics follows:

Mnemonic  Hex opcode  Decimal  Function

NOP 00h 0 No operation (this might be used to set a break point in an emulator)
CLA 01h 1 Clear accumulator A
CLC + 02h 2 Clear carry bit C
CLX 03h 3 Clear index register X
CMC + 04h 4 Complement carry bit C
INC * 05h 5 Increment accumulator A by 1
DEC * 06h 6 Decrement accumulator A by 1
INX * 07h 7 Increment index register X by 1
DEX * 08h 8 Decrement index register X by 1
TAX 09h 9 Transfer accumulator A to index register X
INI * 0Ah 10 Load accumulator A with integer read from input in decimal
INH * 0Bh 11 Load accumulator A with integer read from input in hexadecimal
INB * 0Ch 12 Load accumulator A with integer read from input in binary
INA * 0Dh 13 Load accumulator A with ASCII value read from input (a single character)
OTI 0Eh 14 Write value of accumulator A to output as a signed decimal number
OTC 0Fh 15 Write value of accumulator A to output as an unsigned decimal number
OTH 10h 16 Write value of accumulator A to output as an unsigned hexadecimal number
OTB 11h 17 Write value of accumulator A to output as an unsigned binary number
OTA 12h 18 Write value of accumulator A to output as a single character
PSH 13h 19 Decrement SP and push value of accumulator A onto stack
POP * 14h 20 Pop stack into accumulator A and increment SP
SHL + * 15h 21 Shift accumulator A one bit left
SHR + * 16h 22 Shift accumulator A one bit right
RET 17h 23 Return from subroutine (return address popped from stack)
HLT 18h 24 Halt program execution

The above are all single-byte instructions. The following are all double-byte instructions.

LDA B * 19h 25 Load accumulator A directly with contents of location whose address is
given as B
LDX B * 1Ah 26 Load accumulator A with contents of location whose address is given as B,
indexed by the value of X (that is, an address computed as the value of B + X)
LDI B * 1Bh 27 Load accumulator A with the immediate value B
LSP B 1Ch 28 Load stack pointer SP with contents of location whose address is given as B
LSI B 1Dh 29 Load stack pointer SP immediately with the value B
STA B 1Eh 30 Store accumulator A on the location whose address is given as B
STX B 1Fh 31 Store accumulator A on the location whose address is given as B, indexed
by the value of X
ADD B + * 20h 32 Add to accumulator A the contents of the location whose address is given as B
ADX B + * 21h 33 Add to accumulator A the contents of the location whose address is given as
B, indexed by the value of X
ADI B + * 22h 34 Add the immediate value B to accumulator A
ADC B + * 23h 35 Add to accumulator A the value of the carry bit C plus the contents of the
location whose address is given as B
ACX B + * 24h 36 Add to accumulator A the value of the carry bit C plus the contents of the
location whose address is given as B, indexed by the value of X
ACI B + * 25h 37 Add the immediate value B plus the value of the carry bit C to accumulator A
SUB B + * 26h 38 Subtract from accumulator A the contents of the location whose address is
given as B
SBX B + * 27h 39 Subtract from accumulator A the contents of the location whose address is
given as B, indexed by the value of X
SBI B + * 28h 40 Subtract the immediate value B from accumulator A
SBC B + * 29h 41 Subtract from accumulator A the value of the carry bit C plus the contents
of the location whose address is given as B
SCX B + * 2Ah 42 Subtract from accumulator A the value of the carry bit C plus the contents
of the location whose address is given as B, indexed by the value of X
SCI B + * 2Bh 43 Subtract the immediate value B plus the value of the carry bit C from
accumulator A
CMP B + * 2Ch 44 Compare accumulator A with the contents of the location whose address is
given as B
CPX B + * 2Dh 45 Compare accumulator A with the contents of the location whose address is
given as B, indexed by the value of X
CPI B + * 2Eh 46 Compare accumulator A directly with the value B

These comparisons are done by virtual subtraction of the operand from A, and setting the flags P
and Z as appropriate

ANA B + * 2Fh 47 Bitwise AND accumulator A with the contents of the location whose address
is given as B
ANX B + * 30h 48 Bitwise AND accumulator A with the contents of the location whose address
is given as B, indexed by the value of X
ANI B + * 31h 49 Bitwise AND accumulator A with the immediate value B
ORA B + * 32h 50 Bitwise OR accumulator A with the contents of the location whose address
is given as B
ORX B + * 33h 51 Bitwise OR accumulator A with the contents of the location whose address
is given as B, indexed by the value of X
ORI B + * 34h 52 Bitwise OR accumulator A with the immediate value B

BRN B 35h 53 Branch to the address given as B


BZE B 36h 54 Branch to the address given as B if the Z condition flag is set
BNZ B 37h 55 Branch to the address given as B if the Z condition flag is unset
BPZ B 38h 56 Branch to the address given as B if the P condition flag is set
BNG B 39h 57 Branch to the address given as B if the P condition flag is unset
BCC B 3Ah 58 Branch to the address given as B if the C condition flag is unset
BCS B 3Bh 59 Branch to the address given as B if the C condition flag is set

JSR B 3Ch 60 Call subroutine whose address is B, pushing return address onto the stack

Most of the operations listed above are typical of those found in real machines. Notable exceptions
are provided by the I/O (input/output) operations. Most real machines have extremely primitive
facilities for doing anything like this directly, but for the purposes of this discussion we shall cheat
somewhat and assume that our machine has several very powerful single-byte opcodes for handling
I/O. (Actually this is not cheating too much, for some macro-assemblers allow instructions like this
which are converted into procedure calls into part of an underlying operating system, stored perhaps
in a ROM BIOS).
A careful examination of the machine and its instruction set will show some features that are
typical of real machines. Although there are three data registers, A, X and SP, two of them (X and
SP) can only be used in very specialized ways. For example, it is possible to transfer a value from A
to X, but not vice versa, and while it is possible to load a value into SP it is not possible to examine
the value of SP at a later stage. The logical operations affect the carry bit (they all unset it), but,
surprisingly, the INC and DEC operations do not.

It is this model upon which we shall build an emulator in section 4.3.4. In a sense the formal
semantics of these opcodes are then embodied directly in the operational semantics of the machine
(or pseudo-machine) responsible for executing them.

Exercises

4.1 Which addressing mode is used in each of the operations defined above? Which addressing
modes are not represented?

4.2 Many 8-bit microprocessors have 2-byte (16-bit) index registers, and one, two, and three-byte
instructions (and even longer). What peculiar or restrictive features does our machine possess,
compared to such processors?

4.3 As we have already commented, informal descriptions in English, as we have above, are not as
precise as semantics that are formulated mathematically. Compare the informal description of the
INC operation with the following:

INC * 05h 5 A := (A + 1) mod 256; Z := A = 0; P := A IN {0 ... 127}

Try to express the semantics of each of the other machine instructions in a similar way.

4.3.3 A specimen program

Some examples of code for this machine may help the reader’s understanding. Consider the
problem of reading a number and then counting the number of non-zero bits in its binary
representation.

Example 4.1

The listing below shows a program to solve this problem coded in an ASSEMBLER language
based on the mnemonics given previously, as it might be listed by an assembler program, showing
the hexadecimal representation of each byte and where it is located in memory.
00 BEG ; Count the bits in a number
00 0A INI ; Read(A)
01 LOOP ; REPEAT
01 16 SHR ; A := A DIV 2
02 3A 0D BCC EVEN ; IF A MOD 2 # 0 THEN
04 1E 13 STA TEMP ; TEMP := A
06 19 14 LDA BITS
08 05 INC
09 1E 14 STA BITS ; BITS := BITS + 1
0B 19 13 LDA TEMP ; A := TEMP
0D 37 01 EVEN BNZ LOOP ; UNTIL A = 0
0F 19 14 LDA BITS ;
11 0E OTI ; Write(BITS)
12 18 HLT ; terminate execution
13 TEMP DS 1 ; VAR TEMP : BYTE
14 00 BITS DC 0 ; BITS : BYTE
15 END

Example 4.2 (absolute byte values)

In a later chapter we shall discuss how this same program can be translated into the following
corresponding absolute format (expressed this time as decimal numbers):
10 22 58 13 30 19 25 20 5 30 20 25 19 55 1 25 20 14 24 0 0

Example 4.3 (mnemonics with absolute address fields)

For the moment, we shall allow ourselves to consider the absolute form as equivalent to a form in
which the mnemonics still appear for the sake of clarity, but where the operands have all been
converted into absolute (decimal) addresses and values:
INI
SHR
BCC 13
STA 19
LDA 20
INC
STA 20
LDA 19
BNZ 1
LDA 20
OTI
HLT
0
0

Exercises

4.4 The machine does not possess an instruction for negating the value in the accumulator. What
code would one have to write to be able to achieve this?

4.5 Similarly, it does not possess instructions for multiplication and division. Is it possible to use
the existing instructions to develop code for doing these operations? If so, how efficiently can they
be done?

4.6 Try to write programs for this machine that will

(a) Find the largest of three numbers.

(b) Find the largest and the smallest of a list of numbers terminated by a zero (which is
not regarded as a member of the list).

(c) Find the average of a list of non-zero numbers, the list being terminated by a zero.

(d) Compute N! for small N. Try using an iterative as well as a recursive approach.

(e) Read a word and then write it backwards. The word is terminated with a period. Try
using an "array", or alternatively, the "stack".

(f) Determine the prime numbers between 0 and 255.

(g) Determine the longest repeated sequence in a sequence of digits terminated with
zero. For example, for data reading 1 2 3 3 3 3 4 5 4 4 4 4 4 4 4 6 5 5 report that "4
appeared 7 times".

(h) Read an input sequence of numbers terminated with zero, and then extract the
embedded monotonically increasing sequence. For example, from 1 2 12 7 4 14 6 23
extract the sequence 1 2 12 14 23.

(i) Read a small array of integers or characters and sort them into order.

(j) Search for and report on the largest byte in the program code itself.

(k) Search for and report on the largest byte currently in memory.

(l) Read a piece of text terminated with a period, and then report on how many times
each letter appeared. To make things interesting, ignore the difference between upper
and lower case.

(m) Repeat some of the above problems using 16-bit arithmetic (storing values as pairs
of bytes, and using the "carry" operations to perform extended arithmetic).

4.7 Based on your experiences with Exercise 4.6, comment on the usefulness, redundancy and any
other features of the code set for the machine.

4.3.4 An emulator for the single-accumulator machine

Although a processor for our machine almost certainly does not exist "in silicon", its action may
easily be simulated "in software". Essentially we need only to write an emulator that models the
fetch-execute cycle of the machine, and we can do this in any suitable language for which we
already have a compiler on a real machine.

Languages like Modula-2 or C++ are highly suited to this purpose. Not only do they have
"bit-twiddling" capabilities for performing operations like "bitwise and", they have the advantage
that one can implement the various phases of translators and emulators as coherent, clearly
separated modules (in Modula-2) or classes (in C++). Extended versions of Pascal, such as Turbo
Pascal, also provide support for such modules in the form of units. C is also very suitable on the
first score, but is less well equipped to deal with clearly separated modules, as the header file
mechanism used in C is less watertight than the mechanisms in the other languages.

In modelling our hypothetical machine in Modula-2 or C++ it will thus be convenient to define an
interface in the usual way by means of a definition module, or by the public interface to a class. (In
this text we shall illustrate code in C++; equivalent code in Modula-2 and Turbo Pascal will be
found on the diskette that accompanies the book.)

The main responsibility of the interface is to declare an emulator routine for interpreting the code
stored in the memory of the machine. For expediency we choose to extend the interface to expose
the values of the operations, and the memory itself, and to provide various other useful facilities
that will help us develop an assembler or compiler for the machine in due course. (In this, and in
other interfaces, "private" members are not shown.)
// machine instructions - order is significant
enum MC_opcodes {
MC_nop, MC_cla, MC_clc, MC_clx, MC_cmc, MC_inc, MC_dec, MC_inx, MC_dex,
MC_tax, MC_ini, MC_inh, MC_inb, MC_ina, MC_oti, MC_otc, MC_oth, MC_otb,
be enough to build cathedrals for all our bishops. Why not the
same money drawn to effect the spiritual conquest? Because
they do not care about it. Then, let us make them; and how?
The first step, of course, must be to care for it ourselves. 'Si vis
me flere, dolendum est primum ipsi tibi.' And what can we
do to bring our English and Scotch to this?—Grumble at them, I
suppose."

On his return from France in September, himself and Father Eugene came to the determination to move away from The Hyde, if a more
convenient site could be procured. The reason of this was chiefly
the unsuitableness of the place to the working of our vocation. It
was too solitary for missionaries, and there was no local work for a
number of priests. Some of the fathers disguise themselves in
secular suits, less unseemly than that in which they once beheld
Father Ignatius, and go in search of a place, but without success.
Father Ignatius gave a mission at this time in Kentish Town, and he
little thought, as he took his walk along the tarred paling in Maiden
Lane, that inside lay the grounds of the future St. Joseph's Retreat.

Towards the end of the year 1852, Father Ignatius accompanies as far as London Bridge a colony of Passionists, whom Dr. O'Connor,
the Bishop of Pittsburg, was bringing out to the United States.
These Passionists have grown in gentem magnam, and the
worthy Bishop, like another Odescalchi, resigned his crosier, and
became a Jesuit.

He concludes this year and begins the next giving retreats. The
scenes of his labours in this department were Somers Town,
Blandford Square (London), our own house, Dudley, and Douay. He
also assisted at a mission in Commercial Road, London, E.

The heaviest part of his work, as a member of The Hyde community, was attending to the parish, which, with the Barnet
Mission, then under our charge, was equal in area to many a
diocese in Catholic countries. Father Ignatius often walked thirty
miles in one day on parochial duty. To give an idea of how he went
through this work, one instance will suffice. On one day he went to
Colney Hatch Lunatic Asylum, and from all the unhappy inmates he
was able to get one confession. Next day he walked to give the
Holy Communion to this single penitent, and walked afterwards to
Barnet before he broke his fast. This must be a distance of at least
fifteen miles.

In May, 1853, he gives a retreat to his old parishioners of West Bromwich, another in Winchester in July, to the nuns in
Wolverhampton in August, and to the people in Oxburgh in
October, and in Southport, Lancashire, in Advent.

The 16th of November this year was a great day for our
congregation. It was the first feast of Blessed Paul of the Cross, our
holy founder. There was a great re-union of the chief fathers of the
order in St. Wilfrid's—the Bishops of Birmingham and Southwark, Dr. Ullathorne and Dr. Grant, assisted at the solemnity. Father
Ignatius was there, of course. Father Paul was beatified on the
28th September, 1852. Our religious had prayed and worked for the
great event, and had now the happiness of seeing him raised to
the altar.

He stays at home a great deal now, as a rector ought to do, except in intervals of missions and retreats; and the lion's share of parish
work falls to him. He sends one of the priests of his community to
France to beg for the house; but he had, in a very short time, to
send him money for his expenses home. He then concludes that he
should himself be considered beggar-in-chief, and accordingly goes
out for a few days to collect alms in London. With his alms, he
collects into the Church a young Puseyite minister, who is now a
zealous priest on the London mission.

Father Ignatius visits the neighbouring ministers, but not as formerly; he simply goes to see his old acquaintances, and if the
conversation could be transferred from compliments and common-place remarks to matters of higher interest, he was not the man to
let the opportunity pass by. Among his old friends in the Anglican
ministry there seemed to have been few for whom he always
cherished so kindly a regard as the Rev. Mr. Harvey, Rector of
Hornsey. That excellent clergyman used to visit Father Ignatius, and
receive visits from him on the most friendly terms to the end.

Thus did he spend his time, until Father Pius, the brother of our
present General, who died in Rome in 1864, came to visit the
province, or branch of the order in England, in 1854. This visit
made a change in Father Ignatius's position.

A number of houses of a religious order are placed under the direction of one superior, who is styled a Provincial. With us the
Provincial has two assistants, who are called Consultors. The
superior of each house is called a Rector, and it is his duty to see
after the spiritual and temporal concerns of his own community. A
rector, therefore, has more home work, by virtue of his office, than
any other superior. A consultor may live in any house of the
province, has no special duty ex officio except to give his advice
to the Provincial when asked, and may be easily spared for any
external employment. This office Father Ignatius used to term as
otium cum dignitate, though the otium he never enjoyed, and
felt rather awkward in the dignitas.

In 1854, he was made first Consultor, and relieved from the drudgery of housekeeping for his brethren. Before leaving The
Hyde for a new field of labour, he went to see his nephew in
Harrow, which was only a few miles from our retreat; but was not
admitted. He took another priest with him, and both were hooted
by the boys. It seems pardonable in a set of wild young schoolboys
to make game of such unfashionable beings as Catholic priests; but
it shows a great want of good breeding in schoolboys who are
afterwards to hold such a high position in English society. This
remark is forced upon us by the fact that none of us ever passed
through Harrow without meeting a somewhat similar reception. A
school of inferior rank might set Harrow an example in this point.
We have passed Roger Cholmley's school in Highgate, time after
time, often in a large body, and have met the boys in threes and
fours, and all together, and never yet heard a single insult. What
makes the difference?

On the 8th of September, 1854, Father Ignatius left The Hyde for
Ireland. He begs this time through the principal towns in Munster,
and says he was very kindly received by all. He preached sermons
during this journey, all on the conversion of England. He gained
more prayers this time than on a former occasion, because his
work came to the people with blessings and indulgences from the
Father of the Faithful. He used to tell an amusing anecdote in
reference to this mission. Somewhere he had preached on the
conversion of England, and recommended the prayers by the
spiritual profit to be derived from them. An old woman accosted
him as he was passing by, and he had just time to hear, "Father, I
say the three Hail Marys every day for England." Father Ignatius
was much pleased, and made inquiries after the old lady, doubtless
intending to constitute her a kind of apostle in the place. She was
brought to see him; he expressed his thanks and pleasure that she
had entered so thoroughly into his views, and asked her would she
try to persuade others to follow her example? "Me get people to
pray for England!" she answered; "I pray myself three times for the
sake of the indulgence, but I curse them 300 times a day for it,
lest they might get any good of my prayers!" He reasoned with her,
to be sure, but did not tell us if the success of his second discourse
was equal to the first.
CHAPTER XIII.
Sanctification Of Ireland.

In a letter written by Father Ignatius in December, 1854, is found the first glimpse of a new idea: the Sanctification of Ireland. This
idea was suggested to him by the faith of the Irish people, and by
their readiness to adopt whatever was for their spiritual profit. His
intending the Sanctification of Ireland as a step towards the
Conversion of England, laid the scheme open to severe criticism. It
was said that England was his final object; that Ireland was to be
used as an instrument for England's benefit; that if his patriotism
were less strong, his sanctity would be greater. If these objections
were satisfactorily answered, they might be given up with a hint
that, "it was a very Irish way to convert England, by preaching in
the bogs of Connaught." The best refutation of these ungenerous
remarks will be, perhaps, a simple statement of what his ideas
were upon the subject. His great desire was that all the world
should be perfect. He used to say Our Lord had not yet had His
triumph in this world, and that it was too bad the devil should still
have the majority. "This must not be," he would say; "I shall never
rest as long as there is a single soul on earth who does not serve
God perfectly." The practical way of arriving at this end was to
begin at home. England had not faith as a nation, so there was no
foundation to build sanctity upon there. England, however, had
great influence as a nation all over the world; she showed great
zeal also in her abortive attempts to convert the heathen. If her
energies could be turned in the right direction, what grand results
might we not anticipate? Another reflection was, England has had
every means of conversion tried upon her; let us now see what
virtue there is in good example. To set this example, and to sow
the seed of the great universal harvest, he would find out the best
Catholic nation in the world, and bring it perfectly up to the
maxims of the Gospel. This nation was Ireland, of course, and it
was near enough to England to let its light shine before her. What
he wished for was, to have every man, woman, and child in
Ireland, take up the idea that they were to be saints. He would
have this caught up with a kind of national move. The practical
working of the idea he embodied in a little book which he wrote
some time afterwards, and preached it wherever he addressed an
Irish congregation. The banishing of three great vices—cursing,
company-keeping, and intemperance—and the practice of daily
meditation, with a frequent approach to the sacraments, were the
means. If Ireland, so he argued, took up this at home, it would
spread to England, the colonies, and to wherever there was an
Irishman all over the world. All these would be shining lights, and if
their neighbours did not choose at once to follow their example, we
could at least point it out as the best proof of our exhortations.
This is a short sketch of the work he now began, and it was a work
his superiors always encouraged, and which he spent his life in
endeavouring to realise.

One objection made against this scheme touched him on a tender point—his love of country. Many Catholics, especially English
converts, thought the words of Ecclesiasticus applicable to England:
"Injuries and wrongs will waste riches: and the house that is very
rich shall be brought to nothing by pride: so the substance of the
proud shall be rooted out."—Eccl. xxi. 5. These were of opinion that
England must be humbled as a nation, and deeply too, before she
could be fit for conversion. This Father Ignatius could not stand. He
writes, in a letter to Mr. Monteith: "As my unicum necessarium
for myself is the salvation and sanctification of my own soul, so my
wishes and designs about England, which, according to the order of
charity, I consider (in opposition to many English Catholics,
especially converts), I ought to love first of all people, are, singly
and only, that she may be brought to God, and in such a way and
under such circumstances, as may enable her to be the greatest
possible blessing to the whole world. I have heard plenty, and
much more than plenty, from English and Irish Catholics (very
seldom, comparatively, from those of the Continent), about the
impossibility of this, except by the thorough crushing of the power
of England. I say to all this, No, no, no! God can convert our
country with her power and her influence unimpaired, and I insist
on people praying for it without imposing conditions on Almighty
God, on whom, if I did impose conditions, it would be in favour of
His showing more, and not less abundant, mercy to a fallen people.
Yet, though I have often said I will not allow Miss This, or Mr. That,
to pronounce sentence on England, still less to wish evil to her
(particularly if it be an English Mr. or Miss who talks), I have always
said that if God sees it fit that the conversion should be through
outward humiliations and scourges, I will welcome the rod, and
thank Him for it, in behalf of my country, as I would in my own
person, in whatever way He might think fit to chastise and humble
me."

He returned to London in the beginning of 1855, to give the retreat to our religious. His next work was a mission, given with Father
Gaudentius in Stockport. After that, he gave a mission with Father
Vincent in Hull; in returning from Hull, he stopped at Lincoln to visit
Mr. Sibthorpe. He spends a week in our London house, and then
gives a retreat by himself in Trelawny. His next mission was in
Dungannon, Ireland, and as soon as he came to England for
another retreat he had to give in Levenshulme to nuns, he takes
advantage of his week's rest to visit Grace Dieu, and have what he
calls "a famous talk" with Count de Montalembert, who was Mr.
Phillipps's guest at the time.

The scene of his labours is again transferred. We find him in July giving a mission at Borris O'Kane, with Father Vincent and Father
Bernard and another immediately after, at Lorrha. At one of these
missions, the crowd about Father Ignatius's confession-chair was
very great, and the people were crushing in close to the confessor's
knees. One woman, especially, of more than ordinary muscular
strength, elbowed back many of those who had taken their places
before she came; she succeeded in getting to the inner circle of
penitents, but so near the person confessing that the good father
gently remonstrated with her. All to no purpose. He spoke again,
but she only came nearer. At length he seized her shawl, rolled it
up in a ball, and flung it over the heads of the crowd; the poor
woman had to relinquish her position, and go for her shawl, and
left Father Ignatius to shrive her less pushing companions. His
fellow missioners were highly amused, and this incident tells
wonderfully for his virtue, for it is almost the only instance we
could ever find of his having done anything like losing his temper
during his life as a Passionist. He gives a retreat in Birr, in
Grantham Abbey, a mission in Newcastle, and another in St.
Augustine's, Liverpool, before the end of the year.

It was his custom, since his first turning seriously to God's service,
to be awake at midnight on New Year's Day, and begin by prayer
for passing the coming year perfectly. He is in St. Anne's, Sutton,
Lancashire, this year. He begins the new year, 1856, by giving a
mission with Father Leonard in our church at Sutton, with a few
sermons at a place called Peasly Cross, an offshoot of the mission
we have there.

We close this chapter by a notion of Father Ignatius's politics. He was neither a Whig, a Tory, nor a Radical. He stood aloof from all
parties, and seldom troubled himself about any. He says in a letter
to a friend who was a well-read politician:—"How many minds we
have speaking in England!—Gladstone, Palmerston, Bright, Phillipps,
yourself, and, perhaps, I should add myself, and how many more
who knows? all with minds following tracks which make them travel
apart from each other. I want to set a road open, in which all may
walk together if they please—at least with one foot, if they must
have their own particular plank for the other."
CHAPTER XIV.
Another Tour On The Continent.

The Provincial once more sent Father Ignatius to beg on the Continent. He tried to do a double work, as he did not like to be
"used up" for begging alone, and the plea of begging would find
him access to those he intended to consult. This second work was
a form into which he cast his ideas for the sanctification of the
world. The way of carrying out these ideas, which has been
detailed, was what he settled down to after long discussion and
many corrections from authority. The pamphlet which he now wrote
had been translated into German by a lady in Münster. In it he
proposes a bringing back of Catholics to the infancy of the Church,
when the faithful laid the price of their possessions at the feet of
the Apostles. He proposed a kind of Theocracy, and the scheme
creates about the same sensation as Utopia, when one reads it.
Like Sir Thomas More, Father Ignatius gives us what he should
consider a perfect state of Christian society; he goes into all the
details of its working, and meets the objections that might arise as
it proceeds. The pamphlet is entitled "Reflectiones Propositionesque pro fidelium Sanctificatione."

On February 14, 1856, he leaves London, and halts in Paris only for
a few hours, on his way to Marseilles. There he sees the
Archbishop, and begs in the town; he returns then to Lyons, where
he has several long conferences with Cardinal de Bonald. We find
him in Paris in a few days, writing circulars to the French bishops,
of whom the Bishop of Nancy seems to have been his greatest
patron. He writes a letter to the Empress, and receives an answer
that the Emperor would admit him to an audience. In a day or two
Father Ignatius stands in the presence of Napoleon III., and it is a
loss that he has not left us the particulars of the conference in
writing, because he often reverted to it in conversation with a great
deal of interest. He found at his lodgings, on returning from a
quête a few days after, 1,000f. sent to him as a donation by the
Emperor.

His good success in the Tuileries gave him a hope of doing great
things among the élite of Parisian society. He is, however, sadly
disappointed, and the next day sets off to Belgium.

Arrived in Tournai, he sends a copy of the French circular to the Belgian bishops. This does not seem to be a petition for alms, as
we find him the same evening travelling in a third-class carriage to
Cologne, without waiting for their Lordships' answers.

During his begging in Cologne, he says mass every morning in St. Colomba's (Columb-Kille's) Church; perhaps the spirit of hospitality
was bequeathed to the clergy of this Church by their Irish patron,
for he appears to have experienced some coldness from the
pfarren of Cologne.

In Münster he is very well received. The Bishop is particularly kind to him, and looks favourably on his Reflectiones; besides that, his
lordship deputes a priest to be his guide in begging. Father Ignatius
notes in his journal that he preached extempore in German to the
Jesuit novices, and that one of the fathers revises and corrects the
German translation of the Reflectiones. The priest deputed for
guide by the Bishop of Münster was called away on business of
importance, and Father Ignatius finds another. This Kaplan "lost his
time smoking," and our good father gave up, and went off by Köln
to Coblentz.

He finds the bishop here very kind, but is allowed to beg only of
the clergy; the Jesuits give him hospitality. A cold reception in
Mantz, and a lukewarm one in Augsburg, hurry him off to Munich.
He submits the Reflectiones to Dr. Döllinger, who corrects them
and gives them his approbation.

From Munich he proceeds to Vienna. A part of this journey, as far as Lintz, had to be performed by an eilwagen or post car. The
driver of this vehicle was a tremendous smoker, and Father Ignatius
did not at all enjoy the fumes of tobacco. He perceived that the
driver forgot the pipe, which he laid down at a hoff on the way,
while slaking his thirst, and never told him of it. He was exulting in
the hope of being able to travel to the next shop for pipes without
inhaling tobacco smoke, when, to his mortification, the driver
perceived his loss, and shouted out like a man in despair, Mein
pfeiffe! Mein pfeiffe!—My pipe! My pipe! To increase his
passenger's disappointment, he actually turned back a full German
league, and then smoked with a vengeance until he came to the
next stage.

Father Ignatius sends a copy of the Reflectiones to Rome, on his arrival in Vienna, and presents it with an address at an assembly of
Bishops that was then being held.

He has audiences with the Emperor and Archduke Maximilian, now Emperor of Mexico, as well as with the Nunzio, and all the
notabilities, clerical and secular, in the city.

Immediately after, somehow, he gets notice to quit from the Superior of a religious community, where he had been staying, and
all the other religious houses refuse to take him in. He was about
to leave Vienna in consequence, as he did not like putting up in an
hotel, when some Italian priests gave him hospitality, and
welcomed him to stop with them as long as he pleased. As a set-off to his disappointment, the Bishop of Transylvania is very kind to
him, and Cardinal Schwartzenberg even begs for him. He met the
Most Rev. Father Jandel, General of the Dominicans, in the
Cardinal's Palace, and showed him the Reflectiones. The good
disciple of St. Thomas examined the document closely, and Father
Ignatius records his opinion, "he gave my paper a kick."
Notwithstanding this sentence, he went on distributing copies every
where; but his tract-distribution was stopped in a few days by a
letter he received from our General.

When he sent the little pamphlet to Rome it was handed for criticism to the Lector (or Professor) of Theology in our retreat,
who was then Father Ignatius Paoli, the present Provincial in
England. The critique was very long and quite unfavourable; it
reached him, backed by a letter from the General, which forbade to
speak about the counsels for the present. He records this sentence
in his journal in these words:—"June 17. A letter from Padre
Ignazio, by the General—Order to stop speaking of the counsels,
&c. Stop her, back her. Deo gratias!" This was a favourite
expression with him whenever a Superior thwarted any of his
projects: it was borrowed from the steamboats that ply on the
Thames, and Father Ignatius considered himself as in the position
of the little boy who echoes the orders of the master to the
engineers below. He used to say, "What a catastrophe might one
expect if the boy undertook to give an order of his own!"

Whilst in Vienna he received a letter from Father Vincent, telling him of our having established a house of the order near Harold's
Cross, Dublin. Father Ignatius accompanied Father Vincent when
they were both in Dublin, before the German tour began, in his
search for a position, and Rathmines was selected. The excellent
parish priest, Monsignor Meagher, had just opened his new church,
and laboured hard to have a religious community in his district. He
therefore seconded the intentions of our people, and in a short
time a house was taken in his parish, and every day cements the
connexion between us and this venerable ecclesiastic. A splendid
edifice has since been built during the Rectorship of Father
Osmond, and chiefly through his exertions.

Father Ignatius went to two or three towns, where the police would
not allow him to beg unless patronised by a native priest, and not
being able to fulfil these conditions he was obliged to desist.

This was Father Ignatius's last visit to Germany; he had been there
five times during his life. The first was a tour of pleasure, all the
rest were for higher objects. He seems to have had a great regard
for the Germans; he considered them related by blood to the
English, and although he himself was of Norman descent, he
appears to have a special liking for the Saxon element in character.
He preferred to see it blended certainly, and would consider a vein
of Celtic or Norman blood an improvement on the Teutonic.

There were other reasons. St. Boniface, the Apostle of Germany, was an Englishman; St. Columbanus and St. Gall might be said to
have laboured more in Germany than in their native Ireland. The
Germans owed something to England, and he wished to have them
make a return. Besides, the Reformation began in Germany, and he
would have the countrymen of Luther and of Cranmer work
together to repair the injuries they had suffered from each other.
This twofold plea was forced upon him by a German periodical,
which advocated the cause of the "Crusade" even so far back as
1838. Father Ignatius also knew how German scholarship was
tinging the intellect of England, and he thought a spread of
devotion would be the best antidote to Rationalism. The reasons for
working in France, which he styled "that generous Catholic nation,"
were somewhat different, but they have been detailed by himself in
those portions of the correspondence respecting his crusade.

He visits Raal, Resburg, Baden, Ratisbonne, and Munich; hence he starts for London. Here he arrives on the 4th of October. He did not
delay, but went straight to Dublin, and stayed for the first time in
Blessed Paul's Retreat, Harold's Cross. This house became his head-quarters for some time, for we find him returning thither after a
mission in Kenilworth, and one in Liverpool, as well as a retreat for
nuns, which closes his labours for the year 1856.
CHAPTER XV.
Father Ignatius In 1857.

Seven years, according to physiologists, make a total change in the human frame, such is the extent of the renewal; and although the
laws of spirit do not follow those of matter, it may be a pleasing
problem to find out how far there is an analogy. The chapter of
1850 was headed like this; let us see if the events of both tell
differently upon Father Ignatius.

The first event he records in the Journal for this year is the
reception of Mrs. O'Neill into the Church. This good lady had then
one son a Passionist; she was what might be called a very strict
and devoted Protestant, although all her children were brought up
Catholics by her husband. She loved the son who first joined our
order very tenderly, and felt his becoming a monk so much that
she would never read one of his letters. The son was ordained
priest in Monte-Argentaro, and the first news he heard after he had
for the first time offered up the Holy Sacrifice, was that his mother
had been received in our retreat in Dublin by Father Ignatius. She
was induced by another son, who lived in Dublin, to attend
benediction, and our Lord gave her the grace of conversion with
His blessing. She is now a fervent Catholic, and another son and a
daughter have since followed the example of their brother. The
mother finds her greatest happiness in what once seemed her
greatest affliction. Such is the power of grace, always leading to joy
through the bitterness of the cross.

The next event is the death of Father Paul Mary of St. Michael. This
saintly Passionist was the Honourable Charles Reginald Packenham,
son of the Earl of Longford. He became a convert when captain in
the Guards, and shortly after joined our Institute. He was the first
rector of Blessed Paul's Retreat, and having edified his brethren by
his humility and religious virtues for nearly six years, the term of
his life as a Passionist, died in the odour of sanctity. He had been
ailing for some time, but still able to do a little in the way of
preaching and confessions. It was advertised that he would preach
in Gardiner Street, Dublin, on Sunday, March 1. He died that day at
one o'clock A.M., and Father Ignatius went to preach in his stead; it
created a sensation when the good father began by asking prayers
for the repose of the soul of him whose place he came to fill.

In a letter Father Ignatius wrote at this time we have his opinion of Father Paul Mary: ".... As to the Passionists, I do not think those
who managed our coming here (to Dublin) which was all done
during my absence in Germany, had any idea of serving England. I
believe the prime instigator of the move was Father Paul Mary, who
was born in Dublin, and was through and through an Irishman in
his affections, though trained in England. He, to the last, had all
the anti-English feelings, which prevail so much through Ireland,
and never would give me the least hope of his being interested for
England. I fall in, notwithstanding that, with all the notions of his
great virtue and holiness which others have; and I think, moreover,
that the best Catholics in Ireland are to be found among those who
have been the most bitterly prejudiced against England. But I think
there is in reserve for them another great step in advance when
they lay down this aversion and turn it into divine charity in a
heroic degree."

Father Ignatius always felt keenly Father Paul Mary's not taking up
his ideas about England with more warmth. When he was on his
death-bed, Father Ignatius spent many hours sitting by him. In one
of their last conversations, Father Ignatius urged his pleas for
England as strongly as he could; when he had done, and was
waiting for the effect, Father Paul said, in a dry, cold manner: "I
don't think Ireland has got anything to thank England for." These
words were perpetually ringing in the ears of Father Ignatius; they
were the last Father Paul ever said on the subject, and the other
used to say: "Oh, I used to enjoy his beautiful conversation so
much, but I never could hear one single kind word for England."

This year a general chapter of our Congregation was held in Rome. This is an important event, and only occurs every six years. It is
here the head superiors are elected, points of rule explained, and
regulations enacted for the better ordering of the different houses
all over the world, according to circumstances of time and place.
The Provincial and the two Consultors of each Province are obliged
to attend. Father Ignatius was therefore called to travel abroad
once more. When in Rome, he employed all the time that was left
from capitular duties in holding conferences with our students, and
trying to get some papers he brought with him approved. Among
others, he brought the paper that was "kicked" by Father Jandel,
and condemned by one of our theologians. The only one in Rome
who approved of it was the Abbate Passaglia. Cardinal Barnabò
listened to all Father Ignatius had to urge in its favour; but did not
approve of it. He had to return without gaining anything this time;
except that the Roman Lector was become his Provincial. In a few
years afterwards, when we read of Passaglia's fall, Father Ignatius
was heard to say: "Passaglia and Döllinger were the only
theologians who approved of my paper. I suppose I need not flatter
myself much upon their imprimatur."

He was remarked to be often abstracted when he had many crosses to bear. One day he was going through Rome with one of
our Religious, and passed by a fountain. He went over and put his
hand so far into one of the jets, that he squirted the water over a
number of poor persons who were basking in the sun a few steps
beneath him. They made a stir, and uttered a few oaths as the
water kept dashing down on them. The companion awoke Father
Ignatius out of his reverie, and so unconscious did he seem of the
disturbance he had unwittingly created, that he passed on without
alluding to it.
On his return home, now as Second Consultor, he is sent to beg
again in Ireland. He makes the circuit of Connaught this time,
taking in, on his journey, Roscommon and Castlerea, where he visits the O'Connor
Don, and passing on to Boyle and Sligo. Here he was received very kindly by the Bishop
and clergy. He had for guide in Sligo, a Johnny Doogan, who seems
to have amused him very much. This good man was chief
respondent at the Rosary, which used to be said every evening in
the church. One night the priest began, "Incline unto my aid, O
Lord." No answer. "Where are you, Johnny Doogan?" asked the
priest. Johnny, who was a little more than distracted in some corner
of the church, replied, as if suddenly awoke: "Here I am, your
Reverence, and 'my tongue shall announce thy praise.'" He next
passes along through Easky and Cullinamore to Ballina. He gives a
retreat to the Sisters of Mercy here, and during it, makes an
excursion to Enniscrone. He went next to Ballycastle, Killala,
Castlebar. Here he went to visit his cousin, Lord Lucan, and is very
kindly received. During the course of conversation, he asked Lord
Lucan if he had not heard of his conversion? "Oh yes," he replied,
"I heard you were wavering some thirty years ago." "But I have not
wavered since," replied Father Ignatius. He then went to Ballinrobe,
Westport, Tuam, Athenry, and back to Dublin, by Mullingar. This
tour took nearly two months. He gives a retreat in the beginning of
September to the nuns of Gorey, and after it, begs through
Wexford, and the southwest portion of Leinster. The only thing
remarkable about these excursions is, that he notes once, "I am
ashamed to think that I have not begged of any poor people to-
day."

In December, 1857, his brother Frederick, Lord Spencer, died. This
brother was Father Ignatius's companion at school, and it is
remarkable that he was the only one of the family who used any
kind of severity towards him. He says, in a letter written at this
time, "I am twelve years an exile from Althorp." Shortly before the
Earl died, he relented, and invited Father Ignatius to stay at the
family seat a few days. The letter joyfully accepting the invitation
was read by the brother on his bed of death. It is only right to
observe that the present Earl has been the kindest of all, and
treated his uncle with distinguished kindness for the few years he
was left to him. He even gave him back the portion of his income
which his father diverted to other uses.

Another letter he wrote in December gives an idea of his spirit of
resignation. It seems a Rev. Mother wrote to him in a state of
alarm that some of the sisters were inclined to go away. Here is a
part of his answer: "I will see what I can do with the sisters who
are in the mood to kick, bite, or run away. If they take to running,
never mind how many go, let them all go, with God bless them,
and thank God they are gone, and we will hope their room will
be worth as much as their company."

Lest the allusion to his exile from Althorp might be taken in a
wrong sense, it is well to give a passage from a letter Father
Ignatius wrote after the death of his brother. "I dare say you have
not heard that just before my brother's death I had written to him
about a case of distress, which he had before been acquainted
with, telling him, at the same time, of what I was about, and
among other things, that I was going to London to open a mission
in Bermondsey on the 10th of January. He sent me £3 for the
person I wrote about, and invited me to stop at Althorp a couple of
nights on my way, not demanding any positive promise about
religion as beforetime, but only saying that he thought I might
come as a private friend without seeing it necessary to hold
spiritual communications with the people in the neighbourhood. I
answered that I would come with pleasure on these terms, and
that even if he had said nothing, prudence would dictate to me to
act as he wished. This was a most interesting prospect to me, after
my twelve years' exile from that home, and I intended to come on
the 7th of January. It was only a day or two before my leaving
Dublin for this journey, that I was shown a notice in the paper of
his death, and the next day had a letter about it from my sister. He
must have received my letter on the very day that he was taken ill.
These are remarkable circumstances. What will Providence bring
out of them?" He felt the death of this brother very much, and was
known to shed tears in abundance when relating the sad news to
some of his friends. He said very sadly, "I gave myself up to three
days' sorrowing for my dear brother Frederick, but I took care to
thank God for the affliction."
CHAPTER XVI.
His "Little Missions."

On the 21st of June, 1858, Father Ignatius began to give short
retreats, which he designated "little missions." This was his work
the remaining six years of his life; anything else we find him doing
was like an exception.

The work proposed in these missions was what has been already
described in the chapter on the sanctification of the Irish people.
He wanted to abolish all their vices, which he reduced to three
capital sins, and sow the seeds of perfect virtue upon the ground of
their deep and fertile faith. Since he took up the notion that Ireland
was called to keep among the nations the title of Island of
Saints, which had once been hers, he could never rest until he
saw it effected. He seems to have been considering for a number
of years the means by which this should be brought about, and he
hit upon a happy thought in 1858.

This thought was the way of impregnating the minds of all the Irish
people with his ideas. He found that missions were most powerful
means of moving people in a body to reconciliation with God, and
an amendment of life. He perceived that the words of the
missionaries were treasured up, and that the advices they gave
were followed with a scrupulous exactness. Missions were the
moving power, but how were they to enter into all the corners of a
kingdom? Missions could only be given in large parishes, and all
priests did not set so high a value upon their importance as those
who asked for them. If he could concentrate the missionary power
into something less solemn, but of like efficacy, and succeed in
carrying that out, he thought it would be just the thing. This train
of deliberation resulted in the "little missions."
A "little mission" is a new mode of renewing fervour; Father
Ignatius was the originator and only worker in it of whom we have
any record. It was half a week of missionary work in every parish—
that is, three days and a half of preaching and hearing confessions.
Two sermons in the day were as much as ever Father Ignatius
gave, and the hours in the confessional were as many as he could
endure.

This kind of work had its difficulties. The whole course of subjects
proper to a mission could not be got through, neither could all the
penitents be heard. Father Ignatius met these objections. "The
eternal truths," as such, he did not introduce. He confined himself
to seven lectures, in which the crying evils, with their antidotes,
were introduced. As far as the confessions were concerned, he
followed the rule of moral theologians that a confessor is
responsible only for the penitent kneeling before him, and not for
those whose confession he has not begun. He heard all he could.

His routine of daily work on these little missions was to get up at
five, and hear confessions all day until midnight, except whilst
saying mass and office, giving his lecture and taking his meals. He
took no recreation whatever, and if he chatted any time after
dinner with the priest, the conversation might be considered a
continuation of his sermon. At a very moderate calculation he must
have spent at least twelve hours a day in the confessional. Some of
these apostolic visits he prolonged to a week when circumstances
required. He gave 245 of these missions from June, 1858, to
September, 1864; he was on his way to the 246th when he died. A
rough calculation will show us that he must have spent about
twenty-two weeks every year in this employment. Let us just think
of forty journeys, in cold and heat, from parish to parish,
sometimes on foot, sometimes on conveyances, which chance put
in his way. Let us follow him when he has strapped his bags upon
his shoulder, after his mass, walking off nine or ten miles, in order
to be in time to begin in another parish that evening. Let us see
the poor man trying to prevent his feeling pain from his sore feet
by walking a little faster, struggling, with umbrella broken, against
rain and wind, dust, a bad road, and a way unknown to add to his
difficulties. He arrives, he lays down his burden, puts on his habit,
takes some dinner, finishes his office, preaches his first discourse,
and sits in the confessional until half-past eleven o'clock. Let us try
to realize what this work must have been, and we shall have an
idea of the six last years of Father Ignatius Spencer's life.

We give a few extracts from his letters, as they will convey an idea
of how he felt and wrought in this great work.

On the 10th of August, 1858, he writes from the convent in Kells,
where he was helping the nuns through their retreat:—

"I have an hour and a half before my next sermon at 7; all the
nuns' confessions are finished, and all my office said; I have
therefore time for a letter. I have not had such an afternoon as
this for many months. The people of this town seem to think
the convent an impregnable fortress, and do not make an
assault upon me in it. If I was just to show myself in the church
I should be quickly surrounded. The reflections which come
upon me this quiet afternoon are not so bright and joyous as
you might expect, perhaps, from the tone of my letter to M
——, but rather of a heavy afflicting character; but all the
better, all the better. This is wholesome, and another stage in
my thoughts brings me to very great satisfaction out of this
heaviness. I do not know whether I shall explain myself to you.
I see myself here so alone, though the people come upon me
so eagerly, so warmly, and, I may say, so lovingly; yet I have
not one on whom I can think as sympathising with me. I see
the necessity of a complete radical change in the spirit of the
people, the necessity, I mean, in order to have some prospect
of giving the cause of truth its victory in England, and making
this Irish people permanently virtuous and happy. This is what I
am preaching from place to place, and aiming at instilling into
the people's minds in the confessional, at dinner-tables, in cars
on the road, as well as in preaching; and, while I aim at it, the
work is bright enough."

Oct. 11, 1860, he writes:

"I can hardly understand how I can go on for any long time
more as I am doing, and not find some capable and willing to
enter into them. Here I am through the 112th parish, with the
same proposals which no one objects to, but no one enters into
nor seems to understand."

May 6, 1861.—

"It seems my lot to be moving about as long as I can move. I
am very happy in the work I am about when I am at it, but I
have always to go through regret and sorrow before moving,
particularly when leaving my home. ... I have now gone through
132 parishes. No movement yet, such as I am aiming at. It
always goes on in the form of most interesting missionary work,
and is a most agreeable way of doing my begging work. I have
been through 123 of these parishes without asking a penny
from any one, but they bring me on an average more than £21
a parish in Ireland. I have worked through eleven parishes in
the diocese of Salford (England) out of that number, and these
do not yield half the fruit of the Irish missions in point of
money, but are otherwise very satisfactory."

In a letter written in December of the same year:

"I am preparing for another year's work like the last, going from
parish to parish through Ireland, collecting for our Order, and at
the same time stirring the people to devote themselves to their
sanctification. They give their money very generously, they
listen kindly to my sermons, and I never have a minute idle in
hearing confessions; but hitherto there is no attention such as I
wish paid to my proposals. I have made these little missions
now in 160 parishes in Ireland, and to eleven Irish
congregations in England. I am, thank God, in as good plight as
ever I was in my life for this kind of work, and this seems to
give a hope that I may at length see the effect of it as I wish,
or the fruit may spring up when I am dead and buried. If death
comes upon me in this way, I will at least rejoice for myself that
I am dying more like our Lord than if I finished my course
crowned with the most brilliant successes; for when He died
people would say He had utterly failed, but He was just then
achieving His victory. Whatever way things take we cannot be
disappointed if we keep faithful to God."

The lovingness described as subsisting between himself and his
dear Irish people gave rise to many incidents, amongst which the
following is rather peculiar. At one place, where he had just
concluded a little mission, the people gathered round him when he
was about to go away. He heard many say, "What will we do when
he is gone?" and several other exclamations betokening their
affliction at having to part from him. He turned round and asked all
he saw to accompany him to the railway station. When they arrived
there he addressed them again in something like these words:
"Now, stand here until you see the train start, and when it is out of
sight, I want you all to say, 'Thank God, he is gone.'"

He met a great many refusals and cold receptions on these
missionary tours, but in general he was very well received. The
exceptions were dear to him, as they were profitable to himself,
and he seldom spoke of them unless there was some special lesson
they were calculated to convey.
CHAPTER XVII.
Father Ignatius At Home.

The work of the little missions kept Father Ignatius very much away
from the community. His visits at home were like meteor flashes,
bright and beautiful, and always made us regret we could not enjoy
his edifying company for a longer time. Those who are much away
on the external duties of the Order find the rule a little severe
when they return; to Father Ignatius it seemed a small heaven of
refreshing satisfaction. His coming home was usually announced to
the community a day or two before, and all were promising
themselves rare treats from his presence amongst them. It was
cheering to see the porter run in, beaming with joy, as he
announced the glad tidings, "Father Ignatius is come." The
exuberance of his own delight as he greeted, first one, and then
another of his companions, added to our own joy. In fact, the day
Father Ignatius came home almost became a holiday by custom.
Those days were; and we feel inclined to tire our readers by
expatiating on them, as if writing brought them back.

Whenever he arrived at one of our houses, and had a day or two
to stay, it was usual for the younger religious, such as novices and
students, to go to him, one by one, for conference. He liked this
very much, and would write to higher Superiors for permission to
turn off to Broadway, for instance, on his way to London, in order
to make acquaintance with the young religious. His counsels had
often a lasting effect; many who were inclined to leave the life they
had chosen remained steadfast, after a conference with him. He did
not give common-place solutions to difficulties, but he had some
peculiar phrase, some quaint axiom, some droll piece of spirituality,
to apply to every little trouble that came before him. He was
specially happy in his fund of anecdote, and could tell one, it was
believed, on any subject that came before him. This extraordinary
gift of conversational power made the Conferences delightful. The
novices, when they assembled in recreation, and gave their
opinions on Father Ignatius, whom many had spoken to for the first
time in their life, nearly all would conclude, "If there ever was a
saint, he's one."

It was amusing to observe how they prepared themselves for
forming their opinion. They all heard of his being a great saint, and
some fancied he would eat nothing at all for one day, and might
attempt a few vegetables on the next. One novice, in particular,
had made up his mind to this, and, to his great surprise, he saw
Father Ignatius eat an extra good breakfast; and, when about to
settle into a rash judgment, he saw the old man preparing to walk
seven miles to a railway station on the strength of his meal.
Another novice thought such a saint would never laugh nor make
anybody else laugh; to his agreeable disappointment, he found that
Father Ignatius brought more cheerfulness into the recreation than
had been there for some time.

In one thing Father Ignatius did not go against anticipation; he was
most exact in the observance of our rules. He would be always the
first in for the midnight office. Many a time the younger portion of
the community used to make arrangements overnight to be in
before him, but it was no use. Once, indeed, a student arrived in
choir before him, and Father Ignatius appeared so crestfallen at
being beaten that the student would never be in before him again,
and might delay on the way if he thought Father Ignatius had not
yet passed. He seemed particularly happy when he could light the
lamps or gas for matins. He was childlike in his obedience. He
would not transgress the most trifling regulation. It was usual with
him to say, "I cannot understand persons who say, 'Oh, I am all
right if I get to Purgatory.' We should be more generous with
Almighty God. I don't intend to go to Purgatory, and if I do, I must
know what for." "But, Father Ignatius," a father would say, "we fall
into so many imperfections that it seems presumption to attempt to
escape scot free." "Well," he would reply, "nothing can send us to
Purgatory but a wilful venial sin, and may the Lord preserve us
from such a thing as that; a religious ought to die before being
guilty of the least wilful fault." We saw from this that he could
scarcely imagine how a religious could do so, or, at least, that he
was very far from the like himself.

One time we were speaking about the Italian way of pronouncing
Latin, which we have adopted; he noticed some imperfections, and
one of the Italian Fathers present remarked a few points in which
Father Ignatius himself failed. One of them was, that he did not
pronounce the letter r strong enough; and another, that he did not
give the letter a its full sound when it came in the middle of a word. For
some time it was observed that he made a most burring sound
when he pronounced an r, and went so far in correcting himself in
the other particular as to sin against prosody. Sometimes he would
forget little rubrics, but if any one told him of a mistake, he was
scarcely ever seen to commit it again.

Whenever he had half an hour to spare he wrote letters. We may
form an idea of his achievements in this point, when he tells us in
the Journal that on two days which remained free to him once he
wrote seventy-eight. A great number of his letters are preserved.
They are very entertaining and instructive; a nice vein of humour
runs through all those he wrote to his familiar friends.

These two letters may be looked upon as the extremes of the
sober and humorous style in his letter-writing:—

"When I used to call on you, you seemed to be tottering, as
one might say, on your last legs. Here you are, after so many
years, without having ever seen health or prosperity, and with
about as much life in you as then, to all appearance. All has
been, all is, and all will be, exactly as it pleases God. This is the
truth, the grand truth, I would almost say the whole and only
truth. There may be, and are, plenty of things besides, which
may be truly affirmed, yet this is the whole of what it concerns
us each to know. For if this is once well understood, of course it
follows that we have but one affair to attend to, that is, to
please God; because then, to a certainty, all the past, present,
and future will be found to be perfectly and absolutely ordered
for our own greatest good. If this one point be well studied, I
think we can steer people easily enough out of all low spirits
and melancholy. Many people can see the hand of God over
them in wonderful mercy in their past history, and so be
brought to a knowledge that their anxieties, and afflictions, and
groans, in those bygone days were unreasonable then. Why do
they not learn to leave off groaning over the present troubles?
Because they do not trust God to manage anything right till
they have examined His work, and understood all about it. But
He will be more honoured if we agree with Him, and approve
of what He does before we see what the good is which is to
come of it. In your case, if we go back to the days when I first
saw you at ——, when your father was in a good way of work,
and you were in health, there was the prospect then, I suppose,
before you of getting well settled in the world; and if all had
continued smooth and prosperous, you might now be a rich
merchant's wife in Birmingham, London, or New York, reckoned
the ornament of a large circle of wealthy friends, &c. But might
there not, perhaps, have been written over you as your motto?
Wo to you rich, for you have received your consolation.
Wo to you that laugh now, for you shall mourn and
weep. You may be disposed to answer, you do not think you
would have been spoiled by prosperity. But if you are more or
less troubled or anxious at being in poverty, sickness, or
adversity, it shows that you would be, just in the same
measure, unable to bear prosperity and health unhurt. Wealth
and prosperity are dangerous to those only who love them and
trust in them. If, when you are in adversity, you are sorry for it,
and wish for prosperity, it shows love for this world's goods,
more or less. And if a person loves them when he has them
not, is it likely he would despise them if he had them? God
saves multitudes by poverty and afflictions in spite of
themselves. The same poverty and afflictions, if the persons
corresponded with God's providence and rejoiced in them,
would make them first-rate saints. The same may be said, with
as great truth, of interior afflictions, scruples, temptations,
darkness, dryness, and the rest of the catalogue of such
miseries. A person who is disquieted and anxious on account of
these, either does not understand that God's gifts are not God,
or if they do understand it, they love the gifts of God
independently of the giver. And so I add that such a one, if he
enjoyed uninterrupted peace and serenity of soul, would stop
very short indeed of the perfection of love to which God intends
to lead him if he will be docile. Now, as to your case, if you are
still alive and still serving God, and desiring to do so better and
better, it is clear that your afflictions, exterior and interior, have
not spoiled or ruined you. And as God loves our peace and
happiness, we may conclude that he would not have kept you
down and low, if it had not been necessary for your good. What
have you to do at last? Begin again to thank, praise, bless,
adore, and glorify God for all the tribulations, past and future,
and he may yet strengthen and preserve you to do abundance
of good, and lay up a great treasure in heaven."

The next letter is to a nun about a book which was supposed to be
lost:—

"The second perpetual calendar has been found. I had no
thought it would; but took my chance to ask, and somebody
had seen it, and it was looked for again and found. It has been
a clumsy bit of business on our part; but it ends right. It gives
another example of the wisdom of a certain young shepherdess
celebrated in the nursery in my early days—

"'Little Bopeep
Has lost her sheep,
And can't tell where to find them;
Leave them alone, and they'll come home,
And bring their tails behind them.'"