Essentials of Compilation: An Incremental Approach in Python, 1st Edition
Jeremy G. Siek
The MIT Press would like to thank the anonymous peer reviewers who provided comments on
drafts of this book. The generous work of academic experts is essential for establishing the
authority and quality of our publications. We acknowledge with gratitude the contributions of
these otherwise uncredited readers.
This book was set in Times LT Std Roman by the author. Printed and bound in the United
States of America.
10 9 8 7 6 5 4 3 2 1
This book is dedicated to Katie, my partner in everything, my children, who grew
up during the writing of this book, and the programming language students at
Indiana University, whose thoughtful questions made this a better book.
Contents

Preface

1 Preliminaries
1.1 Abstract Syntax Trees
1.2 Grammars
1.3 Pattern Matching
1.4 Recursive Functions
1.5 Interpreters
1.6 Example Compiler: A Partial Evaluator

3 Parsing
3.1 Lexical Analysis and Regular Expressions
3.2 Grammars and Parse Trees
3.3 Ambiguous Grammars
3.4 From Parse Trees to Abstract Syntax Trees
3.5 Earley’s Algorithm
3.6 The LALR(1) Algorithm
3.7 Further Reading

4 Register Allocation
4.1 Registers and Calling Conventions
4.2 Liveness Analysis
4.3 Build the Interference Graph

8 Functions
8.1 The LFun Language
8.2 Functions in x86
8.3 Shrink LFun
8.4 Reveal Functions and the LFunRef Language

12 Generics
12.1 Compiling Generics
12.2 Resolve Instantiation
12.3 Erase Generic Types

A Appendix
A.1 x86 Instruction Set Quick Reference

References
Index
Preface
There is a magical moment when a programmer presses the run button and the
software begins to execute. Somehow a program written in a high-level language is
running on a computer that is capable only of shuffling bits. Here we reveal the
wizardry that makes that moment possible. Beginning with the groundbreaking work
of Backus and colleagues in the 1950s, computer scientists developed techniques
for constructing programs called compilers that automatically translate high-level
programs into machine code.
We take you on a journey through constructing your own compiler for a small
but powerful language. Along the way we explain the essential concepts, algorithms,
and data structures that underlie compilers. We develop your understanding of how
programs are mapped onto computer hardware, which is helpful in reasoning about
properties at the junction of hardware and software, such as execution time,
software errors, and security vulnerabilities. For those interested in pursuing compiler
construction as a career, our goal is to provide a stepping-stone to advanced topics
such as just-in-time compilation, program analysis, and program optimization. For
those interested in designing and implementing programming languages, we connect
language design choices to their impact on the compiler and the generated code.
A compiler is typically organized as a sequence of stages that progressively
translate a program to the code that runs on hardware. We take this approach to the
extreme by partitioning our compiler into a large number of nanopasses, each of
which performs a single task. This enables the testing of each pass in isolation and
focuses our attention, making the compiler far easier to understand.
The most familiar approach to describing compilers is to dedicate each chapter
to one pass. The problem with that approach is that it obfuscates how language
features motivate design choices in a compiler. We instead take an incremental
approach in which we build a complete compiler in each chapter, starting with
a small input language that includes only arithmetic and variables. We add new
language features in subsequent chapters, extending the compiler as necessary.
Our choice of language features is designed to elicit fundamental concepts and
algorithms used in compilers.
• We begin with integer arithmetic and local variables in chapters 1 and 2, where
we introduce the fundamental tools of compiler construction: abstract syntax trees
and recursive functions.
• In chapter 3 we learn how to use the Lark parser framework to create a parser
for the language of integer arithmetic and local variables. We learn about the
parsing algorithms inside Lark, including Earley and LALR(1).
• In chapter 4 we apply graph coloring to assign variables to machine registers.
• Chapter 5 adds conditional expressions, which motivates an elegant recursive
algorithm for translating them into conditional goto statements.
• Chapter 6 adds loops. This elicits the need for dataflow analysis in the register
allocator.
• Chapter 7 adds heap-allocated tuples, motivating garbage collection.
• Chapter 8 adds functions as first-class values without lexical scoping, similar to
functions in the C programming language (Kernighan and Ritchie 1988). The
reader learns about the procedure call stack and calling conventions and how
they interact with register allocation and garbage collection. The chapter also
describes how to generate efficient tail calls.
• Chapter 9 adds anonymous functions with lexical scoping, that is, lambda
expressions. The reader learns about closure conversion, in which lambdas are
translated into a combination of functions and tuples.
• Chapter 10 adds dynamic typing. Prior to this point the input languages are
statically typed. The reader extends the statically typed language with an Any
type that serves as a target for compiling the dynamically typed language.
• Chapter 11 uses the Any type introduced in chapter 10 to implement a gradually
typed language in which different regions of a program may be statically or
dynamically typed. The reader implements runtime support for proxies that allow values
to safely move between regions.
• Chapter 12 adds generics with autoboxing, leveraging the Any type and type
casts developed in chapters 10 and 11.
There are many language features that we do not include. Our choices balance the
incidental complexity of a feature versus the fundamental concepts that it exposes.
For example, we include tuples and not records because although they both elicit the
study of heap allocation and garbage collection, records come with more incidental
complexity.
Since 2009, drafts of this book have served as the textbook for sixteen-week
compiler courses for upper-level undergraduates and first-year graduate students at
the University of Colorado and Indiana University. Students come into the course
having learned the basics of programming, data structures and algorithms, and
discrete mathematics. At the beginning of the course, students form groups of two
to four people. The groups complete approximately one chapter every two weeks,
starting with chapter 2 and including chapters according to the students’ interests
while respecting the dependencies between chapters shown in figure 0.1. Chapter 8
(functions) depends on chapter 7 (tuples) only in the implementation of efficient
tail calls. The last two weeks of the course involve a final project in which students
design and implement a compiler extension of their choosing. The last few chapters
can be used in support of these projects. Many chapters include a challenge problem
that we assign to the graduate students.
Figure 0.1
Diagram of chapter dependencies.
For compiler courses at universities on the quarter system (about ten weeks in
length), we recommend completing the course through chapter 7 or chapter 8 and
providing some scaffolding code to the students for each compiler pass. The course
can be adapted to emphasize functional languages by skipping chapter 6 (loops)
and including chapter 9 (lambda). The course can be adapted to dynamically typed
languages by including chapter 10.
This book has been used in compiler courses at California Polytechnic State
University, Portland State University, Rose–Hulman Institute of Technology, University
of Freiburg, University of Massachusetts Lowell, and the University of Vermont.
This edition of the book uses Python both for the implementation of the compiler
and for the input language, so the reader should be proficient with Python. There
are many excellent resources for learning Python (Lutz 2013; Barry 2016; Sweigart
2019; Matthes 2019). The support code for this book is in the GitHub repository at
the following location:
https://github.com/IUCompilerCourse/
The compiler targets x86 assembly language (Intel 2015), so it is helpful but
not necessary for the reader to have taken a computer systems course (Bryant
and O’Hallaron 2010). We introduce the parts of x86-64 assembly language that
are needed in the compiler. We follow the System V calling conventions (Bryant
and O’Hallaron 2005; Matz et al. 2013), so the assembly code that we generate
works with the runtime system (written in C) when it is compiled using the
GNU C compiler (gcc) on Linux and MacOS operating systems on Intel hardware.
On the Windows operating system, gcc uses the Microsoft x64 calling
convention (Microsoft 2018, 2020). So the assembly code that we generate does not work
with the runtime system on Windows. One workaround is to use a virtual machine
with Linux as the guest operating system.
Acknowledgments
Jeremy G. Siek
Bloomington, Indiana
1 Preliminaries
In this chapter we introduce the basic tools needed to implement a compiler.
Programs are typically input by a programmer as text, that is, a sequence of characters.
The program-as-text representation is called concrete syntax. We use concrete syn-
tax to concisely write down and talk about programs. Inside the compiler, we use
abstract syntax trees (ASTs) to represent programs in a way that efficiently
supports the operations that the compiler needs to perform. The process of translating
concrete syntax to abstract syntax is called parsing and is studied in chapter 3. For
now we use the parse function in Python’s ast module to translate from concrete
to abstract syntax.
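For instance, parsing a small arithmetic expression with the standard ast module yields a tree of node objects (a minimal sketch; `mode="eval"` tells the parser to expect a single expression):

```python
import ast

# Translate concrete syntax (text) into an abstract syntax tree.
tree = ast.parse("1 + 2", mode="eval")

# ast.dump renders the tree as nested constructor calls, making
# the node kinds and their subparts visible.
print(ast.dump(tree.body))
```

The printed tree shows a BinOp node with an Add operator and two Constant subparts.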
ASTs can be represented inside the compiler in many different ways, depending
on the programming language used to write the compiler. We use Python classes
and objects to represent ASTs, especially the classes defined in the standard ast
module for the Python source language. We use grammars to define the abstract
syntax of programming languages (section 1.2) and pattern matching to inspect
individual nodes in an AST (section 1.3). We use recursive functions to construct
and deconstruct ASTs (section 1.4). This chapter provides a brief introduction to
these components.
Compilers use abstract syntax trees to represent programs because they often need
to ask questions such as, for a given part of a program, what kind of language feature
is it? What are its subparts? Consider the program on the left and the diagram
of its AST on the right (1.1). This program is an addition operation that has two
subparts, an input operation and a negation. The negation has another subpart, the
integer constant 8. By using a tree to represent the program, we can easily follow
the links to go from one part of a program to its subparts.
input_int() + -8

[AST diagram (1.1): a “+” node whose subparts are input_int() and a “-” node; the “-” node’s subpart is the constant 8.]
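The program in (1.1) can be parsed and its tree inspected with the standard ast module (a sketch; input_int is the book’s own primitive, so here it simply parses as an ordinary function call):

```python
import ast

# Parse the example program as a single expression.
expr = ast.parse("input_int() + -8", mode="eval").body

# The root is an addition with two subparts:
assert isinstance(expr, ast.BinOp) and isinstance(expr.op, ast.Add)
# the call to input_int ...
assert isinstance(expr.left, ast.Call)
# ... and a negation whose own subpart is the constant 8.
assert isinstance(expr.right, ast.UnaryOp)
assert isinstance(expr.right.op, ast.USub)
assert expr.right.operand.value == 8
```

Following the attributes left, right, and operand is exactly the act of following links from a part of the program to its subparts.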