Ilya Sergey (Ed.)

Programming Languages and Systems

31st European Symposium on Programming, ESOP 2022
Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022
Munich, Germany, April 2–7, 2022
Proceedings

Lecture Notes in Computer Science 13240, Advanced Research in Computing and Software Science (ARCoSS)

Founding Editors: Gerhard Goos (Germany) and Juris Hartmanis (USA)

Editor: Ilya Sergey, National University of Singapore, Singapore

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.
ETAPS Foreword
Welcome to the 25th ETAPS! ETAPS 2022 took place in Munich, the beautiful capital
of Bavaria, Germany.
ETAPS 2022 is the 25th instance of the European Joint Conferences on Theory and
Practice of Software. ETAPS is an annual federated conference established in 1998,
and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each
conference has its own Program Committee (PC) and its own Steering Committee
(SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations of programming languages, analysis tools, and formal approaches to software engineering. Organizing these conferences in a coherent, highly synchronized program enables researchers to participate in an exciting event, to meet many colleagues working in different areas of the field, and to easily attend talks across conferences. On the weekend before the main conference, numerous satellite workshops took place, attracting many researchers from all over the globe.
ETAPS 2022 received 362 submissions in total, 111 of which were accepted,
yielding an overall acceptance rate of 30.7%. I thank all the authors for their interest in
ETAPS, all the reviewers for their reviewing efforts, the PC members for their con-
tributions, and in particular the PC (co-)chairs for their hard work in running this entire
intensive process. Last but not least, my congratulations to all authors of the accepted
papers!
ETAPS 2022 featured the unifying invited speakers Alexandra Silva (University
College London, UK, and Cornell University, USA) and Tomáš Vojnar (Brno
University of Technology, Czech Republic) and the conference-specific invited
speakers Nathalie Bertrand (Inria Rennes, France) for FoSSaCS and Lenore Zuck
(University of Illinois at Chicago, USA) for TACAS. Invited tutorials were provided by
Stacey Jeffery (CWI and QuSoft, The Netherlands) on quantum computing and
Nicholas Lane (University of Cambridge and Samsung AI Lab, UK) on federated
learning.
As this event was the 25th edition of ETAPS, part of the program was a special
celebration where we looked back on the achievements of ETAPS and its constituting
conferences in the past, but we also looked into the future, and discussed the challenges
ahead for research in software science. This edition also reinstated the ETAPS men-
toring workshop for PhD students.
ETAPS 2022 took place in Munich, Germany, and was organized jointly by the
Technical University of Munich (TUM) and LMU Munich. The former was founded in 1868, and the latter in 1472, making it the sixth-oldest German university still in operation. Together they have 100,000 enrolled students, regularly rank among the top 100 universities worldwide (with TUM's computer science department ranked first in the European Union), and their researchers and alumni include 60 Nobel laureates.
The local organization team consisted of Jan Křetínský (general chair), Dirk Beyer
(general, financial, and workshop chair), Julia Eisentraut (organization chair), and
Alexandros Evangelidis (local proceedings chair).
ETAPS 2022 was further supported by the following associations and societies:
ETAPS e.V., EATCS (European Association for Theoretical Computer Science),
EAPLS (European Association for Programming Languages and Systems), and EASST
(European Association of Software Science and Technology).
The ETAPS Steering Committee consists of an Executive Board, and representa-
tives of the individual ETAPS conferences, as well as representatives of EATCS,
EAPLS, and EASST. The Executive Board consists of Holger Hermanns
(Saarbrücken), Marieke Huisman (Twente, chair), Jan Kofroň (Prague), Barbara König
(Duisburg), Thomas Noll (Aachen), Caterina Urban (Paris), Tarmo Uustalu (Reykjavik
and Tallinn), and Lenore Zuck (Chicago).
Other members of the Steering Committee are Patricia Bouyer (Paris), Einar Broch
Johnsen (Oslo), Dana Fisman (Be’er Sheva), Reiko Heckel (Leicester), Joost-Pieter
Katoen (Aachen and Twente), Fabrice Kordon (Paris), Jan Křetínský (Munich), Orna
Kupferman (Jerusalem), Leen Lambers (Cottbus), Tiziana Margaria (Limerick),
Andrew M. Pitts (Cambridge), Elizabeth Polgreen (Edinburgh), Grigore Roşu (Illinois),
Peter Ryan (Luxembourg), Sriram Sankaranarayanan (Boulder), Don Sannella
(Edinburgh), Lutz Schröder (Erlangen), Ilya Sergey (Singapore), Natasha Sharygina
(Lugano), Pawel Sobocinski (Tallinn), Peter Thiemann (Freiburg), Sebastián Uchitel
(London and Buenos Aires), Jan Vitek (Prague), Andrzej Wasowski (Copenhagen),
Thomas Wies (New York), Anton Wijs (Eindhoven), and Manuel Wimmer (Linz).
I’d like to take this opportunity to thank all authors, attendees, organizers of the
satellite workshops, and Springer-Verlag GmbH for their support. I hope you all
enjoyed ETAPS 2022.
Finally, a big thanks to Jan, Julia, Dirk, and their local organization team for all their
enormous efforts to make ETAPS a fantastic event.
Preface

This volume contains the papers accepted at the 31st European Symposium on
Programming (ESOP 2022), held during April 5–7, 2022, in Munich, Germany
(COVID-19 permitting). ESOP is one of the European Joint Conferences on Theory
and Practice of Software (ETAPS); it is dedicated to fundamental issues in the spec-
ification, design, analysis, and implementation of programming languages and systems.
The 21 papers in this volume were selected by the Program Committee (PC) from 64 submissions. Each submission received three or four reviews. After receiving the initial reviews, the authors had a chance to respond to questions and clarify misunderstandings of the reviewers. After the author response period, the papers were discussed electronically using the HotCRP system by the 33 Program Committee members and 33 external reviewers. Two papers, for which the PC chair had a conflict of interest, were kindly managed by Zena Ariola. The reviewing for ESOP 2022 was double-anonymous, and author identities were revealed only for the eventually accepted papers.
Following the example set by other major conferences in programming languages,
for the first time in its history, ESOP featured optional artifact evaluation. Authors
of the accepted manuscripts were invited to submit artifacts, such as code, datasets, and
mechanized proofs, that supported the conclusions of their papers. Members of the
Artifact Evaluation Committee (AEC) read the papers and explored the artifacts,
assessing their quality and checking that they supported the authors’ claims. The
authors of eleven of the accepted papers submitted artifacts, which were evaluated by
20 AEC members, with each artifact receiving four reviews. Authors of papers with accepted artifacts were assigned official EAPLS artifact evaluation badges, indicating that they had taken the extra time, and undergone the extra scrutiny, to prepare a useful artifact. The ESOP 2022 AEC awarded Artifacts Functional and Artifacts (Functional and) Reusable badges. All submitted artifacts were deemed Functional, and all but one were found to be Reusable.
My sincere thanks go to all who contributed to the success of the conference and to
its exciting program. This includes the authors who submitted papers for consideration; the external reviewers who provided timely expert reviews, sometimes on very short notice; the AEC members and chairs who took great care of this new aspect of ESOP;
and, of course, the members of the ESOP 2022 Program Committee. I was extremely
impressed by the excellent quality of the reviews, the amount of constructive feedback
given to the authors, and the criticism delivered in a professional and friendly tone.
I am very grateful to Andreea Costea and KC Sivaramakrishnan who kindly agreed to
serve as co-chairs for the ESOP 2022 Artifact Evaluation Committee. I would like to
thank the ESOP 2021 chair Nobuko Yoshida for her advice, patience, and the many
insightful discussions on the process of running the conference. I thank all who con-
tributed to the organization of ESOP: the ESOP steering committee and its chair Peter
Thiemann, as well as the ETAPS steering committee and its chair Marieke Huisman.
Finally, I would like to thank Barbara König and Alexandros Evangelidis for their help
with assembling the proceedings.
Organization

Program Chair
Ilya Sergey, National University of Singapore, Singapore
Program Committee
Michael D. Adams Yale-NUS College, Singapore
Danel Ahman University of Ljubljana, Slovenia
Aws Albarghouthi University of Wisconsin-Madison, USA
Zena M. Ariola University of Oregon, USA
Ahmed Bouajjani Université de Paris, France
Giuseppe Castagna CNRS, Université de Paris, France
Cristina David University of Bristol, UK
Mariangiola Dezani Università di Torino, Italy
Rayna Dimitrova CISPA Helmholtz Center for Information Security,
Germany
Jana Dunfield Queen’s University, Canada
Aquinas Hobor University College London, UK
Guilhem Jaber Université de Nantes, France
Jeehoon Kang KAIST, South Korea
Ekaterina Komendantskaya Heriot-Watt University, UK
Ori Lahav Tel Aviv University, Israel
Ivan Lanese Università di Bologna, Italy, and Inria, France
Dan Licata Wesleyan University, USA
Sam Lindley University of Edinburgh, UK
Andreas Lochbihler Digital Asset, Switzerland
Cristina Lopes University of California, Irvine, USA
P. Madhusudan University of Illinois at Urbana-Champaign, USA
Stefan Marr University of Kent, UK
James Noble Victoria University of Wellington, New Zealand
Burcu Kulahcioglu Ozkan Delft University of Technology, The Netherlands
Andreas Pavlogiannis Aarhus University, Denmark
Vincent Rahli University of Birmingham, UK
Robert Rand University of Chicago, USA
Christine Rizkallah University of Melbourne, Australia
Alejandro Russo Chalmers University of Technology, Sweden
Gagandeep Singh University of Illinois at Urbana-Champaign, USA
Gordon Stewart BedRock Systems, USA
Joseph Tassarotti Boston College, USA
Bernardo Toninho Universidade NOVA de Lisboa, Portugal
Additional Reviewers
Andreas Abel Gothenburg University, Sweden
Guillaume Allais University of St Andrews, UK
Kalev Alpernas Tel Aviv University, Israel
Davide Ancona Università di Genova, Italy
Stephanie Balzer Carnegie Mellon University, USA
Giovanni Bernardi Université de Paris, France
Soham Chakraborty Delft University of Technology, The Netherlands
Arthur Chargueraud Inria, France
Ranald Clouston Australian National University, Australia
Fredrik Dahlqvist University College London, UK
Olivier Danvy Yale-NUS College, Singapore
Benjamin Delaware Purdue University, USA
Dominique Devriese KU Leuven, Belgium
Paul Downen University of Massachusetts, Lowell, USA
Yannick Forster Saarland University, Germany
Milad K. Ghale University of New South Wales, Australia
Kiran Gopinathan National University of Singapore, Singapore
Tristan Knoth University of California, San Diego, USA
Paul Levy University of Birmingham, UK
Umang Mathur National University of Singapore, Singapore
McKenna McCall Carnegie Mellon University, USA
Garrett Morris University of Iowa, USA
Fredrik Nordvall Forsberg University of Strathclyde, UK
José N. Oliveira University of Minho, Portugal
Alex Potanin Australian National University, Australia
Susmit Sarkar University of St Andrews, UK
Filip Sieczkowski Heriot-Watt University, UK
Kartik Singhal University of Chicago, USA
Sandro Stucki Chalmers University of Technology and University
of Gothenburg, Sweden
Amin Timany Aarhus University, Denmark
Klaus v. Gleissenthall Vrije Universiteit Amsterdam, The Netherlands
Thomas Wies New York University, USA
Vladimir Zamdzhiev Inria, Loria, Université de Lorraine, France
Categorical Foundations of Gradient-Based Learning

Cruttwell, Gavranović, Ghani, Wilson, and Zanasi

1 Introduction
The last decade has witnessed a surge of interest in machine learning, fuelled by
the numerous successes and applications that these methodologies have found in
many fields of science and technology. As machine learning techniques become
increasingly pervasive, algorithms and models become more sophisticated, posing a significant challenge to both the software developers and the users who need to interface with, execute, and maintain these systems. In spite of this rapidly evolving
picture, the formal analysis of many learning algorithms mostly takes place at a
heuristic level [41], or using definitions that fail to provide a general and scalable
framework for describing machine learning. Indeed, it is commonly acknowledged across academia, industry, policy makers, and funding agencies that there is a pressing need for a unifying perspective, which can make this growing body of work more systematic, rigorous, transparent, and accessible both for users and developers [2, 36].
Consider, for example, one of the most common machine learning scenarios: supervised learning with a neural network. This technique trains the model towards a certain task, e.g. the recognition of patterns in a data set (cf. Figure 1). There are several different ways of implementing this scenario. Typically, at their core, there is a gradient update algorithm (often called the "optimiser"), depending on a given loss function, which updates the parameters of the network in steps, based on some learning rate controlling the "scaling" of the update. Distilling this picture, we can identify three key aspects of the learning process:
(I) computation is parametric: in the simplest case we are given a function f : P × A → B, and learning consists of finding a parameter p : P such that f (p, −) is the best function according to some criteria. Specifically, the weights on the internal nodes of a neural network are a parameter which the learning is seeking to optimize. Parameters also arise elsewhere, e.g. in the loss function (see later).
(II) information flows bidirectionally: in the forward direction, the computation turns inputs via a sequence of layers into predicted outputs, and then into a loss value; in the reverse direction, backpropagation is used to propagate the changes backwards through the layers, and then turn them into parameter updates.
(III) the basis of parameter update via gradient descent is differentiation, e.g. in the simple case we differentiate the function mapping a parameter to its associated loss, in order to reduce that loss.
We model bidirectionality via lenses [6, 12, 29] and, based upon the above three insights, we propose the notion of a parametric lens as the fundamental semantic structure of learning. In a nutshell, a parametric lens is a process with
three kinds of interfaces: inputs, outputs, and parameters. On each interface,
information flows both ways, i.e. computations are bidirectional. These data
are best explained with our graphical representation of parametric lenses, with
inputs A, A′ , outputs B, B ′ , parameters P , P ′ , and arrows indicating information
flow (below left). The graphical notation also makes evident that parametric
lenses are open systems, which may be composed along their interfaces (below
center and right).
(1) [Diagrams: a parametric lens with inputs A, A′, outputs B, B′, and parameters P, P′ (left); sequential composition of two parametric lenses along B, B′ (center); reparameterisation along Q, Q′ (right).]
This pictorial formalism is not just an intuitive sketch: as we will show, it can
be understood as a completely formal (graphical) syntax using the formalism of
string diagrams [39], in a way similar to how other computational phenomena
have been recently analysed e.g. in quantum theory [14], control theory [5, 8],
and digital circuit theory [26].
It is intuitively clear how parametric lenses express aspects (I) and (II) above,
whereas (III) will be achieved by studying them in a space of ‘differentiable
objects’ (in a sense that will be made precise). The main technical contribution
of our paper is showing how the various ingredients involved in learning (the
model, the optimiser, the error map and the learning rate) can be uniformly
understood as being built from parametric lenses.
We will use category theory as the formal language to develop our notion of
parametric lenses, and make Figure 2 mathematically precise. The categorical
perspective brings several advantages, which are well-known, established princi-
ples in programming language semantics [3,40,49]. Three of them are particularly
Fig. 2: The parametric lens that captures the learning process informally sketched in Figure 1, assembled from Model, Loss, Learning rate, and Optimiser components. Note that each component is a lens itself, whose composition yields the interactions described in Figure 1. Defining this picture formally will be the subject of Sections 3–4.
2 Categorical Toolkit
Example 1. Take the category Smooth whose objects are natural numbers and whose morphisms f : n → m are smooth maps from R^n to R^m. As described above, the category Para(Smooth) can be thought of as a category of neural networks: a map in this category from n to m consists of a choice of p and a map f : R^p × R^n → R^m, with R^p representing the set of possible weights of the neural network.
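As a concrete illustration, here is a minimal sketch (plain Python with NumPy, independent of the library described in Section 5) of such a parametric map: a choice of parameter dimension p together with a smooth map f : R^p × R^2 → R.

import numpy as np

# A morphism in Para(Smooth) from 2 to 1: the parameter dimension p = 3
# (two weights and a bias) together with a smooth map f : R^3 x R^2 -> R.
p = 3

def f(theta, x):
    w, b = theta[:2], theta[2]
    return np.tanh(w @ x + b)

print(f(np.array([0.5, -1.0, 0.1]), np.array([1.0, 2.0])))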
As we will see in the next sections, the interplay of the various components at work in the learning process becomes much clearer once the morphisms of Para(C) are represented using the pictorial formalism of string diagrams, which we now recall. In fact, we will mildly massage the traditional notation for string diagrams (below left), by representing a morphism f : A → B in Para(C) as below right.
[Diagrams: the traditional string-diagram box for f with inputs P and A and output B (left), and the Para(C) notation in which the parameter P enters f through a separate vertical wire (right).]
This is to emphasise the special role played by P, reflecting the fact that in machine learning data and parameters have different semantics. The string-diagrammatic notation also allows us to neatly represent composition of maps (P, f) : A → B and (P′, f′) : B → C (below left), and "reparameterisation" of (P, f) : A → B by a map α : Q → P (below right), yielding a new map (Q, (α ⊗ 1_A); f) : A → B.
(2) [Diagrams: composition of (P, f) and (P′, f′) by joining the B wires, with the parameter wires P and P′ kept separate (left); reparameterisation of (P, f) by α : Q → P, drawn as α feeding the parameter wire of f (right).]
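As a back-of-the-envelope sketch (plain Python, not the library of Section 5; a parametric map is modelled here as a pair of a parameter dimension and a function), composition and reparameterisation in Para look as follows:

def para_compose(pf, pg):
    # Composite of (P, f) : A -> B and (P', g) : B -> C has parameter
    # space P x P' and underlying map ((theta, theta'), a) |-> g(theta', f(theta, a)).
    (p, f), (q, g) = pf, pg
    return (p + q, lambda theta, a: g(theta[p:], f(theta[:p], a)))

def reparam(pf, q, alpha):
    # Reparameterisation of (P, f) by alpha : Q -> P yields (Q, (alpha x 1_A); f).
    p, f = pf
    return (q, lambda theta, a: f(alpha(theta), a))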
4 One can also define Para(C) in the case when C is non-strict; however, the result would not be a category but a bicategory.
2.2 Lenses
A lens from (A, A′) to (B, B′) is a pair (f, f*) of a forward map f : A → B and a backward map f* : A × B′ → A′. Lenses compose: given lenses (f, f*) : (A, A′) → (B, B′) and (g, g*) : (B, B′) → (C, C′), their composite (3) has forward map f ; g and backward map (a, c′) ↦ f*(a, g*(f(a), c′)).
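A minimal Python sketch in the spirit of the Lens class used in Section 5 (the actual library may differ in details): the backward map is called on a pair (a, b′), and sequential composition implements exactly the rule of (3).

class Lens:
    def __init__(self, fwd, rev):
        self.fwd = fwd   # f  : A -> B
        self.rev = rev   # f* : A x B' -> A', called on a pair (a, db)
    def __rshift__(self, other):
        # (f, f*) ; (g, g*): forward is g(f(a)); backward sends (a, dc)
        # to f*(a, g*(f(a), dc)), as in (3).
        return Lens(
            lambda a: other.fwd(self.fwd(a)),
            lambda a_dc: self.rev((a_dc[0],
                                   other.rev((self.fwd(a_dc[0]), a_dc[1])))))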
The fundamental category where supervised learning takes place is the composite Para(Lens(C)) of the two constructions in the previous sections.
For f : A → B, the pair (f, R[f]) forms a lens from (A, A) to (B, B). We will pursue the idea that R[f] acts as a backwards map, thus giving a means to "learn" f.
5 In [23], these are called learners. However, in this paper we study them in a much broader light; see Section 6.
Note that assigning type A×B → A to R[f ] hides some relevant information:
B-values in the domain and A-values in the codomain of R[f ] do not play the
same role as values of the same types in f : A → B: in R[f ], they really take in a
tangent vector at B and output a tangent vector at A (cf. the definition of R[f ]
in Smooth, Example 2 below). To emphasise this, we will type R[f ] as a map
A × B ′ → A′ (even though in reality A = A′ and B = B ′ ), thus meaning that
(f, R[f ]) is actually a lens from (A, A′ ) to (B, B ′ ). This typing distinction will
be helpful later on, when we want to add additional components to our learning
algorithms.
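For a concrete picture of R[f] in Smooth (a NumPy sketch, under the assumption that the reverse derivative is given by transposed-Jacobian-vector products, cf. Example 2): for f(x) = (sin x1, x1 · x2), the pair (f, R[f]) is a lens whose backward map pulls a tangent vector at f(x) back to a tangent vector at x.

import numpy as np

def f(x):
    # f : R^2 -> R^2, f(x1, x2) = (sin x1, x1 * x2)
    return np.array([np.sin(x[0]), x[0] * x[1]])

def Rf(x, dy):
    # R[f](x, dy) = J_f(x)^T @ dy
    J = np.array([[np.cos(x[0]), 0.0],
                  [x[1],         x[0]]])
    return J.T @ dy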
The following two examples of CRDCs will serve as the basis for the learning
scenarios of the upcoming sections.
(4) [Diagram: the functor sending a parametric map (P, f) : A → B to the parametric lens (f, R[f]), drawn as a box with forward part f and backward part R[f], wires A, A′, B, B′, and parameter wires P, P′.]
parametric map from B to R with parameter space B.6 We also generalize the
codomain to an arbitrary object L.
Note that we can precompose a loss map (B, loss) : B → L with a neural
network (P, f ) : A → B (below left), and apply the functor in (4) (with C =
Smooth) to obtain the parametric lens below right.
(5) [Diagram: the composite (P, f) ; (B, loss) : A → L in Para(Smooth) (left), and the parametric lens obtained by applying the functor in (4), with forward maps f and loss and backward maps R[f] and R[loss] (right).]
This is getting closer to the parametric lens we want: it can now receive
inputs of type B. However, this is at the cost of now needing an input to L′ ; we
consider how to handle this in the next section.
Example 9 (Dot product). In Deep Dreaming (Section 4.2) we often want to focus only on a particular element of the network output R^b. This is done by supplying a one-hot vector bt as the ground truth to the loss function e(bt, bp) = bt · bp, which computes the dot product of the two vectors. If the ground truth vector bt is a one-hot vector (active at the i-th element), then the dot product masks all inputs except the i-th one. Note that the reverse derivative R[e] : R^b × R^b × R → R^b × R^b of the dot product is defined as R[e](bt, bp, α) = (α · bp, α · bt).
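Transcribed directly into NumPy (a small sketch, not the library's API), the loss map of Example 9 and its reverse derivative read:

import numpy as np

def e(bt, bp):
    # Dot-product "loss": a one-hot ground truth bt masks all but one output.
    return np.dot(bt, bp)

def Re(bt, bp, alpha):
    # R[e](bt, bp, alpha) = (alpha * bp, alpha * bt)
    return alpha * bp, alpha * bt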
(6) [Diagram: the parametric lens from (5) with a learning rate α attached to the loss interface: α consumes the loss value L and produces the change L′ that is fed back into R[loss].]
Example 10. In standard supervised learning in Smooth, one fixes some ϵ > 0 as a learning rate, and this is used to define α: α is constantly −ϵ, i.e., α(l) = −ϵ for any l ∈ L.
Example 11. In supervised learning in POLY_Z2, the standard learning rate is quite different: for a given L it is defined as the identity function, α(l) = l.
Other learning rate morphisms are possible as well: for example, one could
fix some ϵ > 0 and define a learning rate in Smooth by α(l) = −ϵ · l. Such a
choice would take into account how far away the network is from its desired goal
and adjust the learning rate accordingly.
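As plain Python closures (a sketch; the learning_rate helper appearing in Section 5 is assumed to play this role in the library), the learning rate of Example 10 and the loss-dependent variant above are:

def constant_rate(eps):
    # Example 10: alpha(l) = -eps, ignoring the current loss value.
    return lambda l: -eps

def scaled_rate(eps):
    # The variant above: alpha(l) = -eps * l, shrinking the step as the loss shrinks.
    return lambda l: -eps * l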
Intuitively, such a lens allows one to receive the requested change in parameter
and implement that change by adding that value to the current parameter. By its
type, we can now “plug” the gradient descent lens G : (P, P ) → (P, P ′ ) above the
model (f, R[f ]) in (4) — formally, this is accomplished as a reparameterisation
of the parametric morphism (f, R[f ]), cf. Section 2.1. This gives us Figure 3
(left).
Fig. 3: [Diagram: the gradient descent lens G reparameterising the parameter port of the model (f, R[f]) (left), and the stateful variant in which an Optimiser lens (S × P, S × P) → (P, P′) plays the same role (right).]
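Using the Lens sketch from Section 2.2 above, the two reparameterisations of Figure 3 can be written as follows (a sketch; momentum is shown as one possible stateful variant, with state S = P):

# Basic gradient descent G : (P, P) -> (P, P'):
# get is the identity, put adds the requested change to the parameter.
gradient_descent = Lens(lambda p: p, lambda p_dp: p_dp[0] + p_dp[1])

def momentum(gamma):
    # A stateful optimiser (S x P, S x P) -> (P, P'):
    # get projects out the parameter, put updates the velocity s and the parameter.
    def put(sp_dp):
        (s, p), dp = sp_dp
        s_new = gamma * s + dp
        return (s_new, p + s_new)
    return Lens(lambda sp: sp[1], put)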
Other variants of gradient descent also fit naturally into this framework by
allowing for additional input/output data with P . In particular, many of them
keep track of the history of previous updates and use that to inform the next one.
This is easy to model in our setup: instead of asking for a lens (P, P ) → (P, P ′ ),
we ask instead for a lens (S ×P, S ×P ) → (P, P ′ ) where S is some “state” object.
7 Note that, as in the discussion in Section 2.4, we are implicitly assuming that P = P′; we have merely notated them differently to emphasize the different "roles" they play (the first P can be thought of as "points", the second as "vectors").
(7) [Diagram: the composite of Model, Loss, and learning rate α as a parametric lens from (1, 1) to (1, 1); the wires A, A′, P, P′, B, B′ all appear as vertical (parameter) ports.]
This composite is now a map in Para(Lens(C)) from (1, 1) to (1, 1); all its inputs and outputs are now vertical wires, i.e., parameters. Unpacking it further, this is a lens of type (A × P × B, A′ × P′ × B′) → (1, 1) whose get map is the terminal map, and whose put map is of type A × P × B → A′ × P′ × B′. It can be unpacked as the composite put(a, p, bt) = (a′, p′, b′t), where
bp = f(p, a),   (b′t, b′p) = R[loss](bt, bp, α(loss(bt, bp))),   (p′, a′) = R[f](p, a, b′p).
In the next two sections we consider further additions to the image above which
correspond to different types of supervised learning.
[Diagram: the complete supervised learning system as a parametric lens: the Model is composed with the Loss and learning rate α as in (7), and its parameter port is reparameterised by a (possibly stateful) Optimiser lens (S × P, S × P) → (P, P′).]
p̄ = U(s, p),   bp = f(p̄, a),
(b′t, b′p) = R[loss](bt, bp, α(loss(bt, bp))),   (p′, a′) = R[f](p̄, a, b′p).
While this formulation might seem daunting, we note that it just explicitly specifies the computation performed by a supervised learning system. The variable p̄ represents the parameter supplied to the network by the stateful gradient update rule (in many cases this is equal to p); bp represents the prediction of the network (contrast this with bt, which represents the ground truth from the dataset). Variables with a tick ′ represent changes: b′p and b′t are the changes on predictions and true values respectively, while p′ and a′ are changes on the parameters and inputs. Furthermore, this arises automatically out of the rule for lens composition (3); all we needed to specify are the lenses themselves.
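Spelled out as ordinary Python (a sketch that simply transcribes the equations above; all names are illustrative, and every piece is supplied as a plain function), the composite put reads:

def supervised_put(f, Rf, loss, Rloss, alpha, U, opt_put):
    # f, Rf: model and its reverse derivative; loss, Rloss: loss map and its
    # reverse derivative; alpha: learning rate; U: the optimiser's get map;
    # opt_put: the optimiser's put map, taking (state, parameter, change).
    def put(a, s, p, bt):
        p_bar = U(s, p)                                   # parameter seen by the model
        bp = f(p_bar, a)                                  # forward pass
        dbt, dbp = Rloss(bt, bp, alpha(loss(bt, bp)))     # backward through the loss
        dp, da = Rf(p_bar, a, dbp)                        # backward through the model (da unused here)
        return opt_put(s, p, dp)                          # new state and parameter
    return put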
We justify and illustrate our approach on a series of case studies drawn from the literature. This presentation has the advantage of treating all these instances uniformly in terms of basic constructs, highlighting their similarities and differences. First, we fix some parametric map (R^p, f) : Para(Smooth)(R^a, R^b) in Smooth and the constant negative learning rate α : R (Example 10). We then vary the loss function and the gradient update, seeing how the put map above reduces to many of the known cases in the literature.
Example 18 (Quadratic error, basic gradient descent). Fix the quadratic error (Example 6) as the loss map and basic gradient update (Example 12). Then the aforementioned put map simplifies. Since there is no state, its type reduces to A × P × B → P, and we have put(a, p, bt) = p + p′, where (p′, a′) = R[f](p, a, α · (f(p, a) − bt)). Note that α here is simply a constant, and due to the linearity of the reverse derivative (Def. 4), we can slide the α from the costate into the basic gradient update lens. Rewriting this update and performing this sliding, we obtain a closed-form update step put(a, p, bt) = p + α · (R[f](p, a, f(p, a) − bt); π0).
Example 19 (Softmax cross entropy, basic gradient descent). Fix Softmax cross
entropy (Example 8) as the loss map and basic gradient update (Example 12).
Again the put map simplifies. The type reduces to A × P × B → P and we have
put(a, p, bt ) = p + p′ where (p′ , a′ ) = R[f ](p, a, α · (Softmax(f (p, a)) − bt )). The
same rewriting performed on the previous example can be done here.
Example 20 (Mean squared error, Nesterov momentum). Fix the quadratic error (Example 6) as the loss map and Nesterov momentum (Example 15) as the gradient update. This time the put map A × S × P × B → S × P does not have a simplified type. The implementation of put reduces to put(a, s, p, bt) = (s′, p + s′), where p̄ = p + γs, (p′, a′) = R[f](p̄, a, α · (f(p̄, a) − bt)), and s′ = −γs + p′.
This example with Nesterov momentum differs in two key points from all
the other ones: i) the optimiser is stateful, and ii) its get map is not trivial.
While many other optimisers are stateful, the non-triviality of the get map here
showcases the importance of lenses. They allow us to make precise the notion of
computing a “lookahead” value for Nesterov momentum, something that is in
practice usually handled in ad-hoc ways. Here, the algebra of lens composition
handles this case naturally by using the get map, a seemingly trivial, unused
piece of data for previous optimisers.
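Using the Lens sketch from Section 2.2, Nesterov momentum can be written with its characteristic non-trivial get map (a sketch; the velocity update below follows the textbook sign convention, and Examples 15 and 20 fix the exact form used in the paper):

def nesterov(gamma):
    # get computes the "lookahead" parameter p + gamma * s;
    # put updates the velocity and the parameter.
    def get(sp):
        s, p = sp
        return p + gamma * s
    def put(sp_dp):
        (s, p), dp = sp_dp
        s_new = gamma * s + dp
        return (s_new, p + s_new)
    return Lens(get, put)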
Our last example, using a different base category POLY_Z2, shows that our framework captures learning not just in continuous, but also in discrete settings. Again, we fix a parametric map (Z_2^p, f) : Para(POLY_Z2)(Z_2^a, Z_2^b), but this time we fix the identity learning rate (Example 11) instead of a constant one.
Example 21 (Basic learning in Boolean circuits). Fix XOR as the loss map (Example 7) and the basic gradient update (Example 13). The put map again simplifies. The type reduces to A × P × B → P and the implementation to put(a, p, bt) = p + p′, where (p′, a′) = R[f](p, a, f(p, a) + bt).
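A hand-rolled sketch over Z_2 (plain Python with integers 0 and 1; the circuit and its reverse derivative are written out by hand) makes the update rule of Example 21 concrete:

def f(p, a):
    # A one-parameter Boolean circuit f(p, a) = p XOR a.
    return p ^ a

def Rf(p, a, db):
    # Reverse derivative of this linear circuit: both formal partial
    # derivatives are 1, so the change db is passed back unchanged.
    return db, db

def put(a, p, bt):
    dp, _ = Rf(p, a, f(p, a) ^ bt)   # XOR loss: f(p, a) + bt over Z_2
    return p ^ dp                     # "+" on parameters is also XOR

print(put(1, 0, 0))   # f(0, 1) = 1 but the target is 0, so p flips to 1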
Consider again the put map put : A × P × B → P of the supervised learning system (here modelled without state S). This map takes an input-output pair (a0, b0) and the current parameter pi, and produces an updated parameter pi+1. At the next time step, it takes a potentially different input-output pair (a1, b1) and the updated parameter pi+1, and produces pi+2. This process is then repeated. We can model this iteration as a composition of the put map with itself, as a composite (A × put × B); put whose type is A × A × P × B × B → P. This map takes two input-output pairs A × B and a parameter, and produces a new parameter by processing these datapoints in sequence. One can see how this process can be iterated any number of times, and even represented as a string diagram.
But we note that with a slight reformulation of the put map, it is possible
to obtain a conceptually much simpler definition. The key insight lies in seeing
that the map put : A × P × B → P is essentially an endo-map P → P with some
extra inputs A × B; it’s a parametric map!
In other words, we can recast the put map as a parametric map (A × B, put) :
Para(C)(P, P ). Being an endo-map, it can be composed with itself. The resulting
composite is an endo-map taking two “parameters”: input-output pair at the
time step 0 and time step 1. This process can then be repeated, with Para
composition automatically taking care of the algebra of iteration.
[Diagram: n copies of put composed end to end as an endo-map on P, each copy receiving its own input-output pair through the parameter port.]
This reformulation captures the essence of parameter iteration: one can think
of it as a trajectory pi , pi+1 , pi+2 , ... through the parameter space; but it is a
trajectory parameterised by the dataset. With different datasets the algorithm
will take a different path through this space and learn different things.
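In code, this iteration is just a fold of put over the dataset (a sketch; put here has the type A × P × B → P discussed above):

from functools import reduce

def iterate(put, p0, dataset):
    # Each (a, b) pair feeds one more copy of put through its parameter port.
    return reduce(lambda p, ab: put(ab[0], p, ab[1]), dataset, p0)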
We have seen that reparameterising the parameter port with gradient descent
allows us to capture supervised parameter learning. In this section we describe
how reparameterising the input port provides us with a way to enhance an input
image to elicit a particular interpretation. This is the idea behind the technique
called Deep Dreaming, appearing in the literature in many forms [19, 34, 35, 44].
[Diagram: deep dreaming as a parametric lens: the Optimiser now reparameterises the input port A (with state S), while the parameter P and a chosen target B are supplied as fixed inputs to the Model, Loss, and learning rate.]
5 Implementation
We model a lens (f, f ∗ ) in our library with the Lens class, which consists of a
pair of maps fwd and rev corresponding to f and f ∗ , respectively. For example,
we write the identity lens (1A , π2 ) as follows:
identity = Lens(lambda x: x, lambda x_dy: x_dy[1])
Let us now see how to construct a single-layer neural network from the composition of such primitives. Diagrammatically, we wish to construct the following model, representing a single 'dense' layer of a neural network:
(9) [Diagram: the dense layer as the composite of three parametric lenses: linear : R^a → R^b (with parameter space R^{b×a}), bias : R^b → R^b (with parameter space R^b), and activation : R^b → R^b (with trivial parameter space).]
Here, the parameters of linear are the coefficients of a b × a matrix, and the underlying lens has as its forward map the function (M, x) → M · x, where M is the b × a matrix whose coefficients are the R^{b×a} parameters, and x ∈ R^a is the input vector. The bias map is even simpler: the forward map of the underlying lens is simply pointwise addition of inputs and parameters: (b, x) → b + x. Finally, the activation map simply applies a nonlinear function (e.g., sigmoid) to the input, and thus has the trivial (unit) parameter space. The representation of this composition in code is straightforward: we can simply compose the three primitive Para maps as in (9):
def dense(a, b, activation):
    return linear(a, b) >> bias(b) >> activation
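A two-layer model can then be assembled by composing dense layers, for instance (a usage sketch; the layer sizes are illustrative, and sigmoid is assumed to be one of the library's primitive activation maps):

model = dense(2, 16, sigmoid) >> dense(16, 1, sigmoid)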
5.2 Learning
Now that we have constructed a model, we also need to use it to learn from data. Concretely, we will construct a full parametric lens as in Figure 2 and then extract its put map to iterate over the dataset.
By way of example, let us see how to construct the following parametric lens,
representing basic gradient descent over a single layer neural network with a
fixed learning rate:
(10) [Diagram: the full parametric lens for this example: the dense model composed with the loss lens and the constant learning rate ϵ, with the parameter port exposed for basic gradient descent.]
Now, given the parametric lens of (10), one can construct a morphism step : B × P × A → P which is simply the put map of the lens. Training the model then consists of iterating the step function over dataset examples (x, y) ∈ A × B to optimise some initial choice of parameters θ0 ∈ P, by letting θi+1 = step(yi, θi, xi).
Note that our library also provides a utility function to construct step from its various pieces:
step = supervised_step(model, update, loss, learning_rate)
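Training then amounts to folding step over the dataset, for example (a sketch; theta0 and dataset are placeholders to be supplied by the user):

theta = theta0
for x, y in dataset:
    theta = step(y, theta, x)   # step : B x P x A -> P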
6 Related Work
The work [23] is closely related to ours, in that it provides an abstract categorical
model of backpropagation. However, it differs in a number of key aspects. We
give a complete lens-theoretic explanation of what is back-propagated via (i)
the use of CRDCs to model gradients; and (ii) the Para construction to model
parametric functions and parameter update. We can thus go well beyond [23] in terms of examples: their example of smooth functions and basic gradient descent is covered in our Section 4.1.
We also explain some of the constructions of [23] in a more structured way.
For example, rather than considering the category Learn of [23] as primitive,
here we construct it as a composite of two more basic constructions (the Para
and Lens constructions). The flexibility could be used, for example, to com-
positionally replace Para with a variant allowing parameters to come from a
different category, or lenses with the category of optics [38] enabling us to model
things such as control flow using prisms.
One more relevant aspect is functoriality. We use a functor to augment a
parametric map with its backward pass, just like [23]. However, they additionally
augmented this map with a loss map and gradient descent using a functor as
well. This added extra conditions on the loss function: its partial derivative in the second variable needed to be invertible. This constraint was not justified
in [23], nor is it a constraint that appears in machine learning practice. This led
us to reexamine their constructions, coming up with our reformulation that does
not require it. While loss maps and optimisers are mentioned in [23] as parts of
the aforementioned functor, here they are extracted out and play a key role: loss
maps are parametric lenses and optimisers are reparameterisations. Thus, in this
paper we instead use Para-composition to add the loss map to the model, and
Para 2-cells to add optimisers. The mentioned inverse of the partial derivative
of the loss map in the 2nd variable was also hypothesised to be relevant to deep
dreaming. We have investigated this possibility thoroughly in our paper, showing
References
20. Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning
and stochastic optimization. Journal of Machine Learning Research 12(Jul), 2121–
2159 (2011)
21. Elliott, C.: The simple essence of automatic differentiation (differentiable functional
programming made easy). arXiv:1804.00746 (2018)
22. Fong, B., Johnson, M.: Lenses and learners. In: Proceedings of the 8th International
Workshop on Bidirectional transformations (Bx@PLW) (2019)
23. Fong, B., Spivak, D.I., Tuyéras, R.: Backprop as functor: A compositional per-
spective on supervised learning. In: Proceedings of the Thirty fourth Annual IEEE
Symposium on Logic in Computer Science (LICS 2019). pp. 1–13. IEEE Computer
Society Press (June 2019)
24. Gavranovic, B.: Compositional deep learning. arXiv:1907.08292 (2019)
25. Ghani, N., Hedges, J., Winschel, V., Zahn, P.: Compositional game theory. In:
Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer
Science. p. 472–481. LICS ’18 (2018). https://doi.org/10.1145/3209108.3209165
26. Ghica, D.R., Jung, A., Lopez, A.: Diagrammatic Semantics for Digital Circuits.
arXiv:1703.10247 (2017)
27. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair,
S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Ghahramani, Z.,
Welling, M., Cortes, C., Lawrence, N.D., Weinberger, K.Q. (eds.) Advances in
Neural Information Processing Systems 27, pp. 2672–2680 (2014), http://papers.
nips.cc/paper/5423-generative-adversarial-nets.pdf
28. Griewank, A., Walther, A.: Evaluating derivatives: principles and techniques of
algorithmic differentiation. Society for Industrial and Applied Mathematics (2008)
29. Hedges, J.: Limits of bimorphic lenses. arXiv:1808.05545 (2018)
30. Hermida, C., Tennent, R.D.: Monoidal indeterminates and cate-
gories of possible worlds. Theor. Comput. Sci. 430, 3–22 (Apr 2012).
https://doi.org/10.1016/j.tcs.2012.01.001
31. Johnson, M., Rosebrugh, R., Wood, R.: Lenses, fibrations and universal transla-
tions. Mathematical structures in computer science 22, 25–42 (2012)
32. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: Bengio,
Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations,
ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings
(2015), http://arxiv.org/abs/1412.6980
33. Lecun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied
to document recognition. In: Proceedings of the IEEE. pp. 2278–2324 (1998).
https://doi.org/10.1109/5.726791
34. Mahendran, A., Vedaldi, A.: Understanding deep image representations by invert-
ing them. arXiv:1412.0035 (2014)
35. Nguyen, A.M., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: High
confidence predictions for unrecognizable images. arXiv:1412.1897 (2014)
36. Olah, C.: Neural networks, types, and functional programming (2015), http://
colah.github.io/posts/2015-09-NN-Types-FP/
37. Polyak, B.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics 4(5), 1–17 (1964). https://doi.org/10.1016/0041-5553(64)90137-5, http://www.sciencedirect.com/science/article/pii/0041555364901375
38. Riley, M.: Categories of optics. arXiv:1809.00738 (2018)
39. Selinger, P.: A survey of graphical languages for monoidal categories. Lecture Notes
in Physics p. 289–355 (2010)
40. Selinger, P.: Control categories and duality: on the categorical semantics of the lambda-mu calculus. Mathematical Structures in Computer Science 11(02), 207–260 (2001), http://journals.cambridge.org/article_S096012950000311X
41. Seshia, S.A., Sadigh, D.: Towards verified artificial intelligence. CoRR
abs/1606.08514 (2016), http://arxiv.org/abs/1606.08514
42. Shiebler, D.: Categorical Stochastic Processes and Likelihood. Compositionality
3(1) (2021)
43. Shiebler, D., Gavranović, B., Wilson, P.: Category Theory in Machine Learning.
arXiv:2106.07032 (2021)
44. Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks:
Visualising image classification models and saliency maps. arXiv:1312.6034 (2014)
45. Spivak, D.I.: Functorial data migration. arXiv:1009.1166 (2010)
46. Sprunger, D., Katsumata, S.y.: Differentiable causal computations via delayed
trace. In: Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in
Computer Science. LICS ’19, IEEE Press (2019)
47. Steckermeier, A.: Lenses in functional programming. Preprint, available at
https://sinusoid.es/misc/lager/lenses.pdf (2015)
48. Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initial-
ization and momentum in deep learning. In: Dasgupta, S., McAllester, D. (eds.)
Proceedings of the 30th International Conference on Machine Learning. vol. 28,
pp. 1139–1147 (2013), http://proceedings.mlr.press/v28/sutskever13.html
49. Turi, D., Plotkin, G.: Towards a mathematical operational semantics. In: Pro-
ceedings of Twelfth Annual IEEE Symposium on Logic in Computer Science. pp.
280–291 (1997). https://doi.org/10.1109/LICS.1997.614955
50. Wilson, P., Zanasi, F.: Reverse derivative ascent: A categorical approach to learn-
ing boolean circuits. In: Proceedings of Applied Category Theory (ACT) (2020),
https://cgi.cse.unsw.edu.au/~eptcs/paper.cgi?ACT2020:31
51. Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired Image-to-Image Translation
using Cycle-Consistent Adversarial Networks. arXiv:1703.10593 (2017)
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/
4.0/), which permits use, sharing, adaptation, distribution and reproduction in any
medium or format, as long as you give appropriate credit to the original author(s) and
the source, provide a link to the Creative Commons license and indicate if changes
were made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will need
to obtain permission directly from the copyright holder.
Compiling Universal Probabilistic Programming Languages with Efficient Parallel Sequential Monte Carlo Inference
1 EECS and Digital Futures, KTH Royal Institute of Technology, Stockholm, Sweden, {dlunde,dbro}@kth.se
2 AI Sweden, Stockholm, Sweden, joey.ohman@ai.se
3 Department of Data Science and Analytics, BI Norwegian Business School, Oslo, Norway, jan.kudlicka@bi.no
4 Department of Bioinformatics and Genetics, Swedish Museum of Natural History, Stockholm, Sweden, {viktor.senderov,fredrik.ronquist}@nrm.se
5 Department of Zoology, Stockholm University
1 Introduction
[Figure: the RootPPL SMC inference engine.]