ARCoSS · LNCS 13240

Ilya Sergey (Ed.)

Programming Languages and Systems

31st European Symposium on Programming, ESOP 2022
Held as Part of the European Joint Conferences
on Theory and Practice of Software, ETAPS 2022
Munich, Germany, April 2–7, 2022
Proceedings
Lecture Notes in Computer Science 13240
Founding Editors
Gerhard Goos, Germany
Juris Hartmanis, USA

Editorial Board Members


Elisa Bertino, USA
Wen Gao, China
Bernhard Steffen, Germany
Gerhard Woeginger, Germany
Moti Yung, USA

Advanced Research in Computing and Software Science


Subline of Lecture Notes in Computer Science

Subline Series Editors


Giorgio Ausiello, University of Rome ‘La Sapienza’, Italy
Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board


Susanne Albers, TU Munich, Germany
Benjamin C. Pierce, University of Pennsylvania, USA
Bernhard Steffen , University of Dortmund, Germany
Deng Xiaotie, Peking University, Beijing, China
Jeannette M. Wing, Microsoft Research, Redmond, WA, USA
More information about this series at https://link.springer.com/bookseries/558
Editor
Ilya Sergey
National University of Singapore
Singapore, Singapore

ISSN 0302-9743 (print)    ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-030-99335-1 (print)    ISBN 978-3-030-99336-8 (eBook)
https://doi.org/10.1007/978-3-030-99336-8
© The Editor(s) (if applicable) and The Author(s) 2022. This book is an open access publication.
Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International
License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution
and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and
the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this book are included in the book’s Creative Commons license,
unless indicated otherwise in a credit line to the material. If material is not included in the book’s Creative
Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use,
you will need to obtain permission directly from the copyright holder.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, expressed or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
ETAPS Foreword

Welcome to the 25th ETAPS! ETAPS 2022 took place in Munich, the beautiful capital
of Bavaria, in Germany.
ETAPS 2022 is the 25th instance of the European Joint Conferences on Theory and
Practice of Software. ETAPS is an annual federated conference established in 1998,
and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each
conference has its own Program Committee (PC) and its own Steering Committee
(SC). The conferences cover various aspects of software systems, ranging from theo-
retical computer science to foundations of programming languages, analysis tools, and
formal approaches to software engineering. Organizing these conferences in a coherent,
highly synchronized conference program enables researchers to participate in an
exciting event, having the possibility to meet many colleagues working in different
directions in the field, and to easily attend talks of different conferences. On the
weekend before the main conference, numerous satellite workshops took place that
attracted many researchers from all over the globe.
ETAPS 2022 received 362 submissions in total, 111 of which were accepted,
yielding an overall acceptance rate of 30.7%. I thank all the authors for their interest in
ETAPS, all the reviewers for their reviewing efforts, the PC members for their con-
tributions, and in particular the PC (co-)chairs for their hard work in running this entire
intensive process. Last but not least, my congratulations to all authors of the accepted
papers!
ETAPS 2022 featured the unifying invited speakers Alexandra Silva (University
College London, UK, and Cornell University, USA) and Tomáš Vojnar (Brno
University of Technology, Czech Republic) and the conference-specific invited
speakers Nathalie Bertrand (Inria Rennes, France) for FoSSaCS and Lenore Zuck
(University of Illinois at Chicago, USA) for TACAS. Invited tutorials were provided by
Stacey Jeffery (CWI and QuSoft, The Netherlands) on quantum computing and
Nicholas Lane (University of Cambridge and Samsung AI Lab, UK) on federated
learning.
As this event was the 25th edition of ETAPS, part of the program was a special
celebration where we looked back on the achievements of ETAPS and its constituting
conferences in the past, but we also looked into the future, and discussed the challenges
ahead for research in software science. This edition also reinstated the ETAPS men-
toring workshop for PhD students.
ETAPS 2022 took place in Munich, Germany, and was organized jointly by the
Technical University of Munich (TUM) and the LMU Munich. The former was
founded in 1868, and the latter in 1472 as the 6th oldest German university still running
today. Together, they have 100,000 enrolled students, regularly rank among the top
100 universities worldwide (with TUM’s computer-science department ranked #1 in
the European Union), and their researchers and alumni include 60 Nobel laureates.
The local organization team consisted of Jan Křetínský (general chair), Dirk Beyer
(general, financial, and workshop chair), Julia Eisentraut (organization chair), and
Alexandros Evangelidis (local proceedings chair).
ETAPS 2022 was further supported by the following associations and societies:
ETAPS e.V., EATCS (European Association for Theoretical Computer Science),
EAPLS (European Association for Programming Languages and Systems), and EASST
(European Association of Software Science and Technology).
The ETAPS Steering Committee consists of an Executive Board, and representa-
tives of the individual ETAPS conferences, as well as representatives of EATCS,
EAPLS, and EASST. The Executive Board consists of Holger Hermanns
(Saarbrücken), Marieke Huisman (Twente, chair), Jan Kofroň (Prague), Barbara König
(Duisburg), Thomas Noll (Aachen), Caterina Urban (Paris), Tarmo Uustalu (Reykjavik
and Tallinn), and Lenore Zuck (Chicago).
Other members of the Steering Committee are Patricia Bouyer (Paris), Einar Broch
Johnsen (Oslo), Dana Fisman (Be’er Sheva), Reiko Heckel (Leicester), Joost-Pieter
Katoen (Aachen and Twente), Fabrice Kordon (Paris), Jan Křetínský (Munich), Orna
Kupferman (Jerusalem), Leen Lambers (Cottbus), Tiziana Margaria (Limerick),
Andrew M. Pitts (Cambridge), Elizabeth Polgreen (Edinburgh), Grigore Roşu (Illinois),
Peter Ryan (Luxembourg), Sriram Sankaranarayanan (Boulder), Don Sannella
(Edinburgh), Lutz Schröder (Erlangen), Ilya Sergey (Singapore), Natasha Sharygina
(Lugano), Pawel Sobocinski (Tallinn), Peter Thiemann (Freiburg), Sebastián Uchitel
(London and Buenos Aires), Jan Vitek (Prague), Andrzej Wasowski (Copenhagen),
Thomas Wies (New York), Anton Wijs (Eindhoven), and Manuel Wimmer (Linz).
I’d like to take this opportunity to thank all authors, attendees, organizers of the
satellite workshops, and Springer-Verlag GmbH for their support. I hope you all
enjoyed ETAPS 2022.
Finally, a big thanks to Jan, Julia, Dirk, and their local organization team for all their
enormous efforts to make ETAPS a fantastic event.

February 2022 Marieke Huisman


ETAPS SC Chair
ETAPS e.V. President
Preface

This volume contains the papers accepted at the 31st European Symposium on
Programming (ESOP 2022), held during April 5–7, 2022, in Munich, Germany
(COVID-19 permitting). ESOP is one of the European Joint Conferences on Theory
and Practice of Software (ETAPS); it is dedicated to fundamental issues in the spec-
ification, design, analysis, and implementation of programming languages and systems.
The 21 papers in this volume were selected by the Program Committee (PC) from
64 submissions. Each submission received between three and four reviews. After
receiving the initial reviews, the authors had a chance to respond to questions and
clarify misunderstandings of the reviewers. After the author response period, the papers
were discussed electronically using the HotCRP system by the 33 Program Committee
members and 33 external reviewers. Two papers, for which the PC chair had a conflict
of interest, were kindly managed by Zena Ariola. The reviewing for ESOP 2022 was
double-anonymous, and only authors of the eventually accepted papers have been
revealed.
Following the example set by other major conferences in programming languages,
for the first time in its history, ESOP featured optional artifact evaluation. Authors
of the accepted manuscripts were invited to submit artifacts, such as code, datasets, and
mechanized proofs, that supported the conclusions of their papers. Members of the
Artifact Evaluation Committee (AEC) read the papers and explored the artifacts,
assessing their quality and checking that they supported the authors’ claims. The
authors of eleven of the accepted papers submitted artifacts, which were evaluated by
20 AEC members, with each artifact receiving four reviews. Authors of papers with
accepted artifacts were assigned official EAPLS artifact evaluation badges, indicating
that they have taken the extra time and have undergone the extra scrutiny to prepare a
useful artifact. The ESOP 2022 AEC awarded Artifacts Functional and Artifacts
(Functional and) Reusable badges. All submitted artifacts were deemed Functional, and
all but one were found to be Reusable.
My sincere thanks go to all who contributed to the success of the conference and to
its exciting program. This includes the authors who submitted papers for consideration;
the external reviewers who provided timely expert reviews sometimes on very short
notice; the AEC members and chairs who took great care of this new aspect of ESOP;
and, of course, the members of the ESOP 2022 Program Committee. I was extremely
impressed by the excellent quality of the reviews, the amount of constructive feedback
given to the authors, and the criticism delivered in a professional and friendly tone.
I am very grateful to Andreea Costea and KC Sivaramakrishnan who kindly agreed to
serve as co-chairs for the ESOP 2022 Artifact Evaluation Committee. I would like to
thank the ESOP 2021 chair Nobuko Yoshida for her advice, patience, and the many
insightful discussions on the process of running the conference. I thank all who con-
tributed to the organization of ESOP: the ESOP steering committee and its chair Peter
Thiemann, as well as the ETAPS steering committee and its chair Marieke Huisman.
Finally, I would like to thank Barbara König and Alexandros Evangelidis for their help
with assembling the proceedings.

February 2022 Ilya Sergey


Organization

Program Chair
Ilya Sergey National University of Singapore, Singapore

Program Committee
Michael D. Adams Yale-NUS College, Singapore
Danel Ahman University of Ljubljana, Slovenia
Aws Albarghouthi University of Wisconsin-Madison, USA
Zena M. Ariola University of Oregon, USA
Ahmed Bouajjani Université de Paris, France
Giuseppe Castagna CNRS, Université de Paris, France
Cristina David University of Bristol, UK
Mariangiola Dezani Università di Torino, Italy
Rayna Dimitrova CISPA Helmholtz Center for Information Security,
Germany
Jana Dunfield Queen’s University, Canada
Aquinas Hobor University College London, UK
Guilhem Jaber Université de Nantes, France
Jeehoon Kang KAIST, South Korea
Ekaterina Komendantskaya Heriot-Watt University, UK
Ori Lahav Tel Aviv University, Israel
Ivan Lanese Università di Bologna, Italy, and Inria, France
Dan Licata Wesleyan University, USA
Sam Lindley University of Edinburgh, UK
Andreas Lochbihler Digital Asset, Switzerland
Cristina Lopes University of California, Irvine, USA
P. Madhusudan University of Illinois at Urbana-Champaign, USA
Stefan Marr University of Kent, UK
James Noble Victoria University of Wellington, New Zealand
Burcu Kulahcioglu Ozkan Delft University of Technology, The Netherlands
Andreas Pavlogiannis Aarhus University, Denmark
Vincent Rahli University of Birmingham, UK
Robert Rand University of Chicago, USA
Christine Rizkallah University of Melbourne, Australia
Alejandro Russo Chalmers University of Technology, Sweden
Gagandeep Singh University of Illinois at Urbana-Champaign, USA
Gordon Stewart BedRock Systems, USA
Joseph Tassarotti Boston College, USA
Bernardo Toninho Universidade NOVA de Lisboa, Portugal
Additional Reviewers
Andreas Abel Gothenburg University, Sweden
Guillaume Allais University of St Andrews, UK
Kalev Alpernas Tel Aviv University, Israel
Davide Ancona Università di Genova, Italy
Stephanie Balzer Carnegie Mellon University, USA
Giovanni Bernardi Université de Paris, France
Soham Chakraborty Delft University of Technology, The Netherlands
Arthur Chargueraud Inria, France
Ranald Clouston Australian National University, Australia
Fredrik Dahlqvist University College London, UK
Olivier Danvy Yale-NUS College, Singapore
Benjamin Delaware Purdue University, USA
Dominique Devriese KU Leuven, Belgium
Paul Downen University of Massachusetts, Lowell, USA
Yannick Forster Saarland University, Germany
Milad K. Ghale University of New South Wales, Australia
Kiran Gopinathan National University of Singapore, Singapore
Tristan Knoth University of California, San Diego, USA
Paul Levy University of Birmingham, UK
Umang Mathur National University of Singapore, Singapore
McKenna McCall Carnegie Mellon University, USA
Garrett Morris University of Iowa, USA
Fredrik Nordvall Forsberg University of Strathclyde, UK
José N. Oliveira University of Minho, Portugal
Alex Potanin Australian National University, Australia
Susmit Sarkar University of St Andrews, UK
Filip Sieczkowski Heriot-Watt University, UK
Kartik Singhal University of Chicago, USA
Sandro Stucki Chalmers University of Technology and University
of Gothenburg, Sweden
Amin Timany Aarhus University, Denmark
Klaus v. Gleissenthall Vrije Universiteit Amsterdam, The Netherlands
Thomas Wies New York University, USA
Vladimir Zamdzhiev Inria, Loria, Université de Lorraine, France

Artifact Evaluation Committee Chairs


Andreea Costea National University of Singapore, Singapore
K. C. Sivaramakrishnan IIT Madras, India

Artifact Evaluation Committee


Utpal Bora IIT Hyderabad, India
Darion Cassel Carnegie Mellon University, USA
Pritam Choudhury University of Pennsylvania, USA


Jan de Muijnck-Hughes University of Glasgow, UK
Darius Foo National University of Singapore, Singapore
Léo Gourdin Université Grenoble-Alpes, France
Daniel Hillerström University of Edinburgh, UK
Jules Jacobs Radboud University, The Netherlands
Chaitanya Koparkar Indiana University, USA
Yinling Liu Toulouse Computer Science Research Center, France
Yiyun Liu University of Pennsylvania, USA
Kristóf Marussy Budapest University of Technology and Economics,
Hungary
Orestis Melkonian University of Edinburgh, UK
Shouvick Mondal Concordia University, Canada
Krishna Narasimhan TU Darmstadt, Germany
Mário Pereira Universidade NOVA de Lisboa, Portugal
Goran Piskachev Fraunhofer IEM, Germany
Somesh Singh Inria, France
Yahui Song National University of Singapore, Singapore
Vimala Soundarapandian IIT Madras, India
Contents

Categorical Foundations of Gradient-Based Learning . . . . . . . . . . . . . . . . . . 1


Geoffrey S. H. Cruttwell, Bruno Gavranović, Neil Ghani, Paul Wilson,
and Fabio Zanasi

Compiling Universal Probabilistic Programming Languages with Efficient


Parallel Sequential Monte Carlo Inference . . . . . . . . . . . . . . . . . . . . . . . . . 29
Daniel Lundén, Joey Öhman, Jan Kudlicka, Viktor Senderov,
Fredrik Ronquist, and David Broman

Foundations for Entailment Checking in Quantitative Separation Logic . . . . . 57


Kevin Batz, Ira Fesefeldt, Marvin Jansen, Joost-Pieter Katoen,
Florian Keßler, Christoph Matheja, and Thomas Noll

Extracting total Amb programs from proofs . . . . . . . . . . . . . . . . . . . . . . . . 85


Ulrich Berger and Hideki Tsuiki

Why3-do: The Way of Harmonious Distributed System Proofs . . . . . . . . . . . 114


Cláudio Belo Lourenço and Jorge Sousa Pinto

Relaxed virtual memory in Armv8-A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143


Ben Simner, Alasdair Armstrong, Jean Pichon-Pharabod,
Christopher Pulte, Richard Grisenthwaite, and Peter Sewell

Verified Security for the Morello Capability-enhanced Prototype


Arm Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
Thomas Bauereiss, Brian Campbell, Thomas Sewell,
Alasdair Armstrong, Lawrence Esswood, Ian Stark, Graeme Barnes,
Robert N. M. Watson, and Peter Sewell

The Trusted Computing Base of the CompCert Verified Compiler . . . . . . . . 204


David Monniaux and Sylvain Boulmé

View-Based Owicki–Gries Reasoning for Persistent x86-TSO. . . . . . . . . . . . 234


Eleni Vafeiadi Bila, Brijesh Dongol, Ori Lahav, Azalea Raad,
and John Wickerson

Abstraction for Crash-Resilient Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . 262


Artem Khyzha and Ori Lahav

Static Race Detection for Periodic Programs . . . . . . . . . . . . . . . . . . . . . . . . 290


Varsha P Suresh, Rekha Pai, Deepak D’Souza, Meenakshi D’Souza,
and Sujit Kumar Chakrabarti
Probabilistic Total Store Ordering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317


Parosh Aziz Abdulla, Mohamed Faouzi Atig, Raj Aryan Agarwal,
Adwait Godbole, and Krishna S.

Linearity and Uniqueness: An Entente Cordiale . . . . . . . . . . . . . . . . . . . . . 346


Daniel Marshall, Michael Vollmer, and Dominic Orchard

A Framework for Substructural Type Systems . . . . . . . . . . . . . . . . . . . . . . 376


James Wood and Robert Atkey

A Dependent Dependency Calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403


Pritam Choudhury, Harley Eades III, and Stephanie Weirich

Polarized Subtyping. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431


Zeeshan Lakhani, Ankush Das, Henry DeYoung, Andreia Mordido,
and Frank Pfenning

Structured Handling of Scoped Effects. . . . . . . . . . . . . . . . . . . . . . . . . . . . 462


Zhixuan Yang, Marco Paviotti, Nicolas Wu, Birthe van den Berg,
and Tom Schrijvers

Region-based Resource Management and Lexical Exception Handlers


in Continuation-Passing Style . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
Philipp Schuster, Jonathan Immanuel Brachthäuser,
and Klaus Ostermann

A Predicate Transformer for Choreographies: Computing Preconditions


in Choreographic Programming. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520
Sung-Shik Jongmans and Petra van den Bos

Comparing the Expressiveness of the π-calculus and CCS . . . . . . . . . . . . . . . 548


Rob van Glabbeek

Concurrent NetKAT: Modeling and analyzing stateful,


concurrent networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 575
Jana Wagemaker, Nate Foster, Tobias Kappé, Dexter Kozen,
Jurriaan Rot, and Alexandra Silva

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603


Categorical Foundations of Gradient-Based Learning

Geoffrey S. H. Cruttwell 1, Bruno Gavranović 2, Neil Ghani 2,
Paul Wilson 4, and Fabio Zanasi 4

1 Mount Allison University, Canada
2 University of Strathclyde, United Kingdom
3 University College London

Abstract. We propose a categorical semantics of gradient-based machine learning algorithms in terms of lenses, parametric maps, and reverse derivative categories. This foundation provides a powerful explanatory and unifying framework: it encompasses a variety of gradient descent algorithms such as ADAM, AdaGrad, and Nesterov momentum, as well as a variety of loss functions such as MSE and Softmax cross-entropy, shedding new light on their similarities and differences. Our approach to gradient-based learning has examples generalising beyond the familiar continuous domains (modelled in categories of smooth maps) and can be realized in the discrete setting of boolean circuits. Finally, we demonstrate the practical significance of our framework with an implementation in Python.

1 Introduction
The last decade has witnessed a surge of interest in machine learning, fuelled by
the numerous successes and applications that these methodologies have found in
many fields of science and technology. As machine learning techniques become
increasingly pervasive, algorithms and models become more sophisticated, posing
a significant challenge both to the software developers and the users that need to
interface, execute and maintain these systems. In spite of this rapidly evolving
picture, the formal analysis of many learning algorithms mostly takes place at a
heuristic level [41], or using definitions that fail to provide a general and scalable
framework for describing machine learning. Indeed, it is commonly acknowledged
through academia, industry, policy makers and funding agencies that there is a
pressing need for a unifying perspective, which can make this growing body of
work more systematic, rigorous, transparent and accessible both for users and
developers [2, 36].
Consider, for example, one of the most common machine learning scenar-
ios: supervised learning with a neural network. This technique trains the model
towards a certain task, e.g. the recognition of patterns in a data set (cf. Fig-
ure 1). There are several different ways of implementing this scenario. Typically,
at their core, there is a gradient update algorithm (often called the “optimiser”),
depending on a given loss function, which updates in steps the parameters of the
network, based on some learning rate controlling the “scaling” of the update. All
of these components can vary independently in a supervised learning algorithm and a number of choices is available for loss maps (quadratic error, Softmax cross entropy, dot product, etc.) and optimisers (Adagrad [20], Momentum [37], and Adam [32], etc.).

Fig. 1: An informal illustration of gradient-based learning. This neural network


is trained to distinguish different kinds of animals in the input image. Given an
input X, the network predicts an output Y , which is compared by a ‘loss map’
with what would be the correct answer (‘label’). The loss map returns a real
value expressing the error of the prediction; this information, together with the
learning rate (a weight controlling how much the model should be changed in
response to error) is used by an optimiser, which computes by gradient-descent
the update of the parameters of the network, with the aim of improving its
accuracy. The neural network, the loss map, the optimiser and the learning rate
are all components of a supervised learning system, and can vary independently
of one another.

This scenario highlights several questions: is there a uniform mathemati-


cal language capturing the different components of the learning process? Can
we develop a unifying picture of the various optimisation techniques, allowing
for their comparative analysis? Moreover, it should be noted that supervised
learning is not limited to neural networks. For example, supervised learning is
surprisingly applicable to the discrete setting of boolean circuits [50] where con-
tinuous functions are replaced by boolean-valued functions. Can we identify an
abstract perspective encompassing both the real-valued and the boolean case?
In a nutshell, this paper seeks to answer the question:
what are the fundamental mathematical structures underpinning gradient-
based learning?
Our approach to this question stems from the identification of three funda-
mental aspects of the gradient-descent learning process:
(I) computation is parametric, e.g. in the simplest case we are given a function
f : P × X → Y and learning consists of finding a parameter p : P such
that f (p, −) is the best function according to some criteria. Specifically, the
weights on the internal nodes of a neural network are a parameter which the
learning is seeking to optimize. Parameters also arise elsewhere, e.g. in the
loss function (see later).
(II) information flows bidirectionally: in the forward direction, the computa-
tion turns inputs via a sequence of layers into predicted outputs, and then
into a loss value; in the reverse direction, backpropagation is used to propagate
the changes backwards through the layers, and then turn them into
parameter updates.
(III) the basis of parameter update via gradient descent is differentiation e.g.
in the simple case we differentiate the function mapping a parameter to its
associated loss to reduce that loss.

We model bidirectionality via lenses [6, 12, 29] and based upon the above
three insights, we propose the notion of parametric lens as the fundamental
semantic structure of learning. In a nutshell, a parametric lens is a process with
three kinds of interfaces: inputs, outputs, and parameters. On each interface,
information flows both ways, i.e. computations are bidirectional. These data
are best explained with our graphical representation of parametric lenses, with
inputs A, A′ , outputs B, B ′ , parameters P , P ′ , and arrows indicating information
flow (below left). The graphical notation also makes evident that parametric
lenses are open systems, which may be composed along their interfaces (below
center and right).
Q Q′
′ ′ Q Q ′
P P P P

B
A B A C
(1)
A′ B′ A′ C′ P P′
B′
A B
A′ B′

This pictorial formalism is not just an intuitive sketch: as we will show, it can
be understood as a completely formal (graphical) syntax using the formalism of
string diagrams [39], in a way similar to how other computational phenomena
have been recently analysed e.g. in quantum theory [14], control theory [5, 8],
and digital circuit theory [26].
It is intuitively clear how parametric lenses express aspects (I) and (II) above,
whereas (III) will be achieved by studying them in a space of ‘differentiable
objects’ (in a sense that will be made precise). The main technical contribution
of our paper is showing how the various ingredients involved in learning (the
model, the optimiser, the error map and the learning rate) can be uniformly
understood as being built from parametric lenses.
We will use category theory as the formal language to develop our notion of
parametric lenses, and make Figure 2 mathematically precise. The categorical
perspective brings several advantages, which are well-known, established princi-
ples in programming language semantics [3,40,49]. Three of them are particularly
[Figure 2 diagram: the Model parametric lens with input wires A, A′ and output wires B, B′, composed with a Loss lens and a Learning rate lens on its output side, and with an Optimiser lens plugged into its parameter wires P, P′.]
Fig. 2: The parametric lens that captures the learning process informally sketched
in Figure 1. Note each component is a lens itself, whose composition yields the
interactions described in Figure 1. Defining this picture formally will be the
subject of Sections 3-4.

important to our contribution, as they constitute distinctive advantages of our


semantic foundations:
Abstraction Our approach studies which categorical structures are sufficient
to perform gradient-based learning. This analysis abstracts away from the
standard case of neural networks in several different ways: as we will see, it
encompasses other models (namely Boolean circuits), different kinds of op-
timisers (including Adagrad, Adam, Nesterov momentum), and error maps
(including quadratic and softmax cross entropy loss). These can be all un-
derstood as parametric lenses, and different forms of learning result from
their interaction.
Uniformity As seen in Figure 1, learning involves ingredients that are seem-
ingly quite different: a model, an optimiser, a loss map, etc. We will show
how all these notions may be seen as instances of the categorical defini-
tion of a parametric lens, thus yielding a remarkably uniform description of
the learning process, and supporting our claim of parametric lenses being a
fundamental semantic structure of learning.
Compositionality The use of categorical structures to describe computation
naturally enables compositional reasoning whereby complex systems are anal-
ysed in terms of smaller, and hence easier to understand, components. Com-
positionality is a fundamental tenet of programming language semantics; in
the last few years, it has found application in the study of diverse kinds of
computational models, across different fields— see e.g. [8,14,25,45]. As made
evident by Figure 2, our approach models a neural network as a parametric
lens, resulting from the composition of simpler parametric lenses, capturing
the different ingredients involved in the learning process. Moreover, as all
the simpler parametric lenses are themselves composable, one may engineer
a different learning process by simply plugging a new lens on the left or right
of existing ones. This means that one can glue together smaller and relatively
simple networks to create larger and more sophisticated neural networks.
We now give a synopsis of our contributions:


– In Section 2, we introduce the tools necessary to define our notion of para-
metric lens. First, in Section 2.1, we introduce a notion of parametric cat-
egories, which amounts to a functor Para(−) turning a category C into one
Para(C) of ‘parametric C-maps’. Second, we recall lenses (Section 2.2). In a
nutshell, a lens is a categorical morphism equipped with operations to view
and update values in a certain data structure. Lenses play a prominent role
in functional programming [47], as well as in the foundations of database
theory [31] and more recently game theory [25]. Considering lenses in C sim-
ply amounts to the application of a functorial construction Lens(−), yield-
ing Lens(C). Finally, we recall the notion of a cartesian reverse differential
category (CRDC): a categorical structure axiomatising the notion of differ-
entiation [13] (Section 2.4). We wrap up in Section 2.3, by combining these
ingredients into the notion of parametric lens, formally defined as a morphism
in Para(Lens(C)) for a CRDC C. In terms of our desiderata (I)-(III) above,
note that Para(−) accounts for (I), Lens(−) accounts for (II), and the CRDC
structure accounts for (III).
– As seen in Figure 1, in the learning process there are many components at
work: the model, the optimiser, the loss map, the learning rate, etc.. In Sec-
tion 3, we show how the notion of parametric lens provides a uniform char-
acterisation for such components. Moreover, for each of them, we show how
different variations appearing in the literature become instances of our ab-
stract characterisation. The plan is as follows:
◦ In Section 3.1, we show how the combinatorial model subject of the training
can be seen as a parametric lens. The conditions we provide are met by the
‘standard’ case of neural networks, but also enables the study of learning for
other classes of models. In particular, another instance are Boolean circuits:
learning of these structures is relevant to binarisation [16] and it has been
explored recently using a categorical approach [50], which turns out to be
a particular case of our framework.
◦ In Section 3.2, we show how the loss maps associated with training are also
parametric lenses. Our approach covers the cases of quadratic error, Boolean
error, Softmax cross entropy, but also the ‘dot product loss’ associated with
the phenomenon of deep dreaming [19, 34, 35, 44].
◦ In Section 3.3, we model the learning rate as a parametric lens. This
analysis also allows us to contrast how learning rate is handled in the ‘real-
valued’ case of neural networks with respect to the ‘Boolean-valued’ case of
Boolean circuits.
◦ In Section 3.4, we show how optimisers can be modelled as ‘reparame-
terisations’ of models as parametric lenses. As case studies, in addition to
basic gradient update, we consider the stateful variants: Momentum [37],
Nesterov Momentum [48], Adagrad [20], and Adam (Adaptive Moment Es-
timation) [32]. Also, on Boolean circuits, we show how the reverse derivative
ascent of [50] can be also regarded in such way.
– In Section 4, we study how the composition of the lenses defined in Section 3
yields a description of different kinds of learning processes.
◦ Section 4.1 is dedicated to modelling supervised learning of parameters,


in the way described in Figure 1. This amounts essentially to study of
the composite of lenses expressed in Figure 2, for different choices of the
various components. In particular we look at (i) quadratic loss with basic
gradient descent, (ii) softmax cross entropy loss with basic gradient descent,
(iii) quadratic loss with Nesterov momentum, and (iv) learning in Boolean
circuits with XOR loss and basic gradient ascent.
◦ In order to showcase the flexibility of our approach, in Section 4.2 we de-
part from our ‘core’ case study of parameter learning, and turn attention
to supervised learning of inputs, also called deep dreaming — the idea
behind this technique is that, instead of the network parameters, one up-
dates the inputs, in order to elicit a particular interpretation [19, 34, 35, 44].
Deep dreaming can be easily expressed within our approach, with a differ-
ent rearrangement of the parametric lenses involved in the learning process,
see (8) below. The abstract viewpoint of categorical semantics provides a
mathematically precise and visually captivating description of the differ-
ences between the usual parameter learning process and deep dreaming.
– In Section 5 we describe a proof-of-concept Python implementation, avail-
able at [17], based on the theory developed in this paper. This code is intended
to show more concretely the payoff of our approach. Model architectures, as
well as the various components participating in the learning process, are now
expressed in a uniform, principled mathematical language, in terms of lenses.
As a result, computing network gradients is greatly simplified, as it amounts
to lens composition. Moreover, the modularity of this approach allows one to
more easily tune the various parameters of training.
We show our library via a number of experiments, and prove correctness by
achieving accuracy on par with an equivalent model in Keras, a mainstream
deep learning framework [11]. In particular, we create a working non-trivial
neural network model for the MNIST image-classification problem [33].
– Finally, in Sections 6 and 7, we discuss related and future work.

2 Categorical Toolkit

In this section we describe the three categorical components of our framework,


each corresponding to an aspect of gradient-based learning: (I) the Para con-
struction (Section 2.1), which builds a category of parametric maps, (II) the
Lens construction, which builds a category of “bidirectional” maps (Section
2.2), and (III) the combination of these two constructions into the notion of
“parametric lenses” (Section 2.3). Finally (IV) we recall Cartesian reverse dif-
ferential categories — categories equipped with an abstract gradient operator.

Notation We shall use f ; g for sequential composition of morphisms f : A → B


and g : B → C in a category, 1A for the identity morphism on A, and I for the
unit object of a symmetric monoidal category.
2.1 Parametric Maps

In supervised learning one is typically interested in approximating a function


g : Rn → Rm for some n and m. To do this, one begins by building a neural
network, which is a smooth map f : Rp × Rn → Rm where Rp is the set of
possible weights of that neural network. Then one looks for a value of q ∈ Rp
such that the function f (q, −) : Rn → Rm closely approximates g. We formalise
these maps categorically via the Para construction [9, 23, 24, 30].

Definition 1 (Parametric category). Let (C, ⊗, I) be a strict4 symmetric


monoidal category. We define a category Para(C) with objects those of C, and
a map from A to B a pair (P, f ), with P an object of C and f : P ⊗ A →
B. The composite of maps (P, f ) : A → B and (P ′ , f ′ ) : B → C is the pair
(P ′ ⊗ P, (1P ′ ⊗ f ); f ′ ). The identity on A is the pair (I, 1A ).

Example 1. Take the category Smooth whose objects are natural numbers and
whose morphisms f : n → m are smooth maps from Rn to Rm . As described
above, the category Para(Smooth) can be thought of as a category of neural
networks: a map in this category from n to m consists of a choice of p and a
map f : Rp × Rn → Rm with Rp representing the set of possible weights of the
neural network.
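To make Definition 1 and Example 1 concrete, here is a minimal Python sketch of parametric maps in Para(Smooth); this is not the library of Section 5, and the names Para and para_compose are ours. The composite's parameter object is the product P′ × P, represented as a pair.

```python
import numpy as np

class Para:
    """A parametric map (P, f) : A -> B, stored as a description of the
    parameter object P together with a function f(p, a)."""
    def __init__(self, param_shape, f):
        self.param_shape = param_shape
        self.f = f

    def __call__(self, p, a):
        return self.f(p, a)

def para_compose(pf, pg):
    """Composite of (P, f) : A -> B and (P', g) : B -> C: its parameter
    object is P' x P, represented here as the pair (p_g, p_f)."""
    return Para((pg.param_shape, pf.param_shape),
                lambda params, a: pg(params[0], pf(params[1], a)))

# A two-layer neural network R^2 -> R^1 built from maps in Para(Smooth):
layer1 = Para((3, 2), lambda W, x: np.tanh(W @ x))   # parameters in R^{3x2}
layer2 = Para((1, 3), lambda W, h: W @ h)            # parameters in R^{1x3}
network = para_compose(layer1, layer2)

params = (np.ones((1, 3)), np.ones((3, 2)))          # (P', P) ordering
print(network(params, np.array([1.0, -2.0])))
```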

As we will see in the next sections, the interplay of the various components at work in the learning process becomes much clearer once the morphisms of Para(C) are represented using the pictorial formalism of string diagrams, which we now recall. In fact, we will mildly massage the traditional notation for string diagrams (below left), by representing a morphism f : A → B in Para(C) as below right.
[Diagram: the traditional string-diagram notation for a morphism f, with the wires P, A and B all drawn horizontally (left), and our notation, where the parameter wire P enters the box f vertically from above (right).]

This is to emphasise the special role played by P , reflecting the fact that in
machine learning data and parameters have different semantics. String diagram-
matic notations also allows to neatly represent composition of maps (P, f ) : A →
B and (P ′ , f ′ ) : B → C (below left), and “reparameterisation” of (P, f ) : A → B
by a map α : Q → P (below right), yielding a new map (Q, (α⊗1A ); f ) : A → B.

[Diagrams (2): left, the composite of (P, f) : A → B and (P′, f′) : B → C, obtained by placing the boxes for f and f′ side by side with their parameter wires P and P′ kept separate on top; right, the reparameterisation of (P, f) : A → B by α : Q → P, obtained by stacking the box for α on top of the parameter wire of f.]

4
One can also define Para(C) in the case when C is non-strict; however, the result
would be not a category but a bicategory.
Intuitively, reparameterisation changes the parameter space of (P, f ) : A → B to


some other object Q, via some map α : Q → P . We shall see later that gradient
descent and its many variants can naturally be viewed as reparameterisations.
Note coherence rules in combining the two operations in (2) just work as ex-
pected, as these diagrams can be ultimately ‘compiled’ down to string diagrams
for monoidal categories.

2.2 Lenses

In machine learning (or even learning in general) it is fundamental that infor-


mation flows both forwards and backwards: the ‘forward’ flow corresponds to a
model’s predictions, and the ‘backwards’ flow to corrections to the model. The
category of lenses is the ideal setting to capture this type of structure, as it is a
category consisting of maps with both a “forward” and a “backward” part.

Definition 2. For any Cartesian category C, the category of (bimorphic) lenses


in C, Lens(C), is the category with the following data. Objects are pairs (A, A′ )
of objects in C. A map from (A, A′ ) to (B, B ′ ) consists of a pair (f, f ∗ ) where
f : A → B (called the get or forward part of the lens) and f ∗ : A × B ′ →
A′ (called the put or backwards part of the lens). The composite of (f, f ∗ ) :
(A, A′ ) → (B, B ′ ) and (g, g ∗ ) : (B, B ′ ) → (C, C ′ ) is given by get f ; g and put
⟨π0 , ⟨π0 ; f, π1 ⟩; g ∗ ⟩; f ∗ . The identity on (A, A′ ) is the pair (1A , π1 ).
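A minimal Python sketch of Definition 2 (illustrative names, not the paper's implementation); the get/put of the composite follows the formula above, i.e. put(a, c′) = f∗(a, g∗(f(a), c′)).

```python
class Lens:
    """A lens (f, f*) : (A, A') -> (B, B'), with get f : A -> B
    and put f* : A x B' -> A'."""
    def __init__(self, get, put):
        self.get = get
        self.put = put

def lens_compose(l1, l2):
    """Composite of (f, f*) : (A, A') -> (B, B') and (g, g*) : (B, B') -> (C, C').
    get: a |-> g(f(a));  put: (a, c') |-> f*(a, g*(f(a), c'))."""
    return Lens(
        get=lambda a: l2.get(l1.get(a)),
        put=lambda a, c_change: l1.put(a, l2.put(l1.get(a), c_change)),
    )

identity = Lens(get=lambda a: a, put=lambda a, b_change: b_change)

# Toy example: each get is a smooth map and each put is its reverse derivative.
square = Lens(get=lambda a: a * a, put=lambda a, db: 2 * a * db)
double = Lens(get=lambda b: 2 * b, put=lambda b, dc: 2 * dc)
both = lens_compose(square, double)
print(both.get(3.0), both.put(3.0, 1.0))  # 18.0 and 12.0 (chain rule: d(2a^2)/da = 4a)
```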

The embedding of Lens(C) into the category of Tambara modules over C


(see [7, Thm. 23]) provides a rich string diagrammatic language, in which lenses
may be represented with forward/backward wires indicating the information
flow. In this language, a morphism (f, f ∗ ) : (A, A′ ) → (B, B ′ ) is written as
below left, which can be ‘expanded’ as below right.

f B
A
A ∗ B
(f, f )
A′ B′
A′ f∗
B′

It is clear in this language how to describe the composite of (f, f∗) : (A, A′) → (B, B′) and (g, g∗) : (B, B′) → (C, C′):

[Diagram (3): the boxes for the two lenses placed side by side, with the forward parts f and g chained from A to C and the backward parts g∗ and f∗ chained from C′ back to A′.]
2.3 Parametric Lenses

The fundamental category where supervised learning takes place is the composite
Para(Lens(C)) of the two constructions in the previous sections:

Definition 3. The category Para(Lens(C)) of parametric lenses on C has


as objects pairs (A, A′ ) of objects from C. A morphism from (A, A′ ) to (B, B ′ ),
called a parametric lens5 , is a choice of parameter pair (P, P ′ ) and a lens (f, f ∗ ) :
(P, P ′ ) × (A, A′ ) → (B, B ′ ) so that f : P × A → B and f ∗ : P × A × B ′ → P ′ × A′
String diagrams for parametric lenses are built by simply composing the graph-
ical languages of the previous two sections — see (1), where respectively a mor-
phism, a composition of morphisms, and a reparameterisation are depicted.
Given a generic morphism in Para(Lens(C)) as depicted in (1) on the left,
one can see how it is possible to “learn” new values from f : it takes as input an
input A, a parameter P , and a change B ′ , and outputs a change in A, a value
of B, and a change P ′ . This last element is the key component for supervised
learning: intuitively, it says how to change the parameter values to get the neural
network closer to the true value of the desired function.
The question, then, is how one is to define such a parametric lens given
nothing more than a neural network, i.e., a parametric map (P, f ) : A → B.
This is precisely what the gradient operation provides, and its generalization to
categories is explored in the next subsection.

2.4 Cartesian Reverse Differential Categories

Fundamental to all types of gradient-based learning is, of course, the gradient


operation. In most cases this gradient operation is performed in the category of
smooth maps between Euclidean spaces. However, recent work [50] has shown
that gradient-based learning can also work well in other categories; for example,
in a category of boolean circuits. Thus, to encompass these examples in a single
framework, we will work in a category with an abstract gradient operation.

Definition 4. A Cartesian left additive category [13, Defn. 1] consists of


a category C with chosen finite products (including a terminal object), and an
addition operation and zero morphism in each homset, satisfying various axioms.
A Cartesian reverse differential category (CRDC) [13, Defn. 13] consists
of a Cartesian left additive category C, together with an operation which provides,
for each map f : A → B in C, a map R[f ] : A × B → A satisfying various
axioms.

For f : A → B, the pair (f, R[f ]) forms a lens from (A, A) to (B, B). We
will pursue the idea that R[f ] acts as backwards map, thus giving a means to
“learn” f.
5
In [23], these are called learners. However, in this paper we study them in a much
broader light; see Section 6.
Note that assigning type A×B → A to R[f ] hides some relevant information:
B-values in the domain and A-values in the codomain of R[f ] do not play the
same role as values of the same types in f : A → B: in R[f ], they really take in a
tangent vector at B and output a tangent vector at A (cf. the definition of R[f ]
in Smooth, Example 2 below). To emphasise this, we will type R[f ] as a map
A × B ′ → A′ (even though in reality A = A′ and B = B ′ ), thus meaning that
(f, R[f ]) is actually a lens from (A, A′ ) to (B, B ′ ). This typing distinction will
be helpful later on, when we want to add additional components to our learning
algorithms.
The following two examples of CRDCs will serve as the basis for the learning
scenarios of the upcoming sections.

Example 2. The category Smooth (Example 1) is Cartesian with product given


by addition, and it is also a Cartesian reverse differential category: given a
smooth map f : Rn → Rm , the map R[f ] : Rn × Rm → Rn sends a pair (x, v)
to J[f ]T (x) · v: the transpose of the Jacobian of f at x in the direction v. For
example, if f : R^2 → R^3 is defined as f(x_1, x_2) := (x_1^3 + 2x_1x_2, x_2, sin(x_1)), then
R[f] : R^2 × R^3 → R^2 is given by
$$(x, v) \mapsto \begin{pmatrix} 3x_1^2 + 2x_2 & 0 & \cos(x_1) \\ 2x_1 & 1 & 0 \end{pmatrix} \cdot \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}.$$
Using
the reverse derivative (as opposed to the forward derivative) is well-known to be
much more computationally efficient for functions f : Rn → Rm when m ≪ n
(for example, see [28]), as is the case in most supervised learning situations
(where often m = 1).
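As a quick numerical check of Example 2, here is a sketch in plain numpy (illustrative code only) computing R[f](x, v) = J[f]^T(x) · v for the concrete f above.

```python
import numpy as np

def f(x):
    x1, x2 = x
    return np.array([x1**3 + 2 * x1 * x2, x2, np.sin(x1)])

def R_f(x, v):
    """Reverse derivative of f: the transposed-Jacobian-vector product."""
    x1, x2 = x
    J = np.array([[3 * x1**2 + 2 * x2, 2 * x1],
                  [0.0,                1.0],
                  [np.cos(x1),         0.0]])   # Jacobian of f at x, shape (3, 2)
    return J.T @ v

x = np.array([1.0, 2.0])
v = np.array([1.0, 0.0, 0.0])
print(R_f(x, v))   # [7. 2.]: v = e_1 picks out the gradient of the first component
```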

Example 3. Another CRDC is the symmetric monoidal category POLY Z2 [13,


Example 14] with objects the natural numbers and morphisms f : A → B the B-
tuples of polynomials Z2 [x1 . . . xA ]. When presented by generators and relations
these morphisms can be viewed as a syntax for boolean circuits, with parametric
lenses for such circuits (and their reverse derivative) described in [50].

3 Components of learning as Parametric Lenses


As seen in the introduction, in the learning process there are many components
at work: a model, an optimiser, a loss map, a learning rate, etc. In this section
we show how each such component can be understood as a parametric lens.
Moreover, for each component, we show how our framework encompasses several
variations of the gradient-descent algorithms, thus offering a unifying perspective
on many different approaches that appear in the literature.

3.1 Models as Parametric Lenses


We begin by characterising the models used for training as parametric lenses.
In essence, our approach identifies a set of abstract requirements necessary to
perform training by gradient descent, which covers the case studies that we will
consider in the next sections.
The leading intuition is that a suitable model is a parametric map, equipped


with a reverse derivative operator. Using the formal developments of Section 2,
this amounts to assuming that a model is a morphism in Para(C), for a CRDC
C. In order to visualise such morphism as a parametric lens, it then suffices to
apply under Para(−) the canonical morphism R : C → Lens(C) (which exists
for any CRDC C, see [13, Prop. 31]), mapping f to (f, R[f ]). This yields a functor
Para(R) : Para(C) → Para(Lens(C)), pictorially defined as

[Diagram (4): the parametric map (P, f) : A → B is sent to the parametric lens whose forward part is f : P × A → B and whose backward part is R[f], with parameter interface (P, P′), input interface (A, A′) and output interface (B, B′).]

Example 4 (Neural networks). As noted previously, to learn a function of type


Rn → Rm , one constructs a neural network, which can be seen as a function of
type Rp × Rn → Rm where Rp is the space of parameters of the neural network.
As seen in Example 1, this is a map in the category Para(Smooth) of type
Rn → Rm with parameter space Rp . Then one can apply the functor in (4)
to present a neural network together with its reverse derivative operator as a
parametric lens, i.e. a morphism in Para(Lens(Smooth)).
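In code, the functor in (4) simply packages a parametric map together with its reverse derivative as a forward/backward pair. A hedged sketch, with a hand-written reverse derivative for a linear layer (the names ParaLens, linear_fwd and linear_bwd are ours):

```python
import numpy as np

class ParaLens:
    """A parametric lens: forward f : P x A -> B and
    backward f* : P x A x B' -> P' x A'."""
    def __init__(self, forward, backward):
        self.forward = forward
        self.backward = backward

# The linear model f(W, x) = W @ x as a map in Para(Smooth) ...
def linear_fwd(W, x):
    return W @ x

# ... with reverse derivative R[f](W, x, db) = (outer(db, x), W^T @ db):
# the requested changes in the parameter and in the input.
def linear_bwd(W, x, db):
    return np.outer(db, x), W.T @ db

linear = ParaLens(linear_fwd, linear_bwd)

W, x, db = np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([1.0, 1.0]), np.array([1.0, 0.0])
print(linear.forward(W, x))        # [3. 7.]
print(linear.backward(W, x, db))   # (array([[1., 1.], [0., 0.]]), array([1., 2.]))
```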

Example 5 (Boolean circuits). For learning of Boolean circuits as described in


[50], the recipe is the same as in Example 4, except that the base category is
POLYZ2 (see Example 3). The important observation here is that POLY Z2 is a
CRDC, see [13, 50], and thus we can apply the functor in (4).

Note a model/parametric lens f can take as inputs an element of A, an


element of B ′ (a change in B) and a parameter P and outputs an element of
B, a change in A, and a change in P . This is not yet sufficient to do machine
learning! When we perform learning, we want to input a parameter P and a pair
A × B and receive a new parameter P . Instead, f expects a change in B (not an
element of B) and outputs a change in P (not an element of P ). Deep dreaming,
on the other hand, wants to return an element of A (not a change in A). Thus, to
do machine learning (or deep dreaming) we need to add additional components
to f ; we will consider these additional components in the next sections.

3.2 Loss Maps as Parametric Lenses


Another key component of any learning algorithm is the choice of loss map.
This gives a measurement of how far the current output of the model is from
the desired output. In standard learning in Smooth, this loss map is viewed as
a map of type B × B → R. However, in our setup, this is naturally viewed as a
parametric map from B to R with parameter space B.6 We also generalize the
codomain to an arbitrary object L.

Definition 5. A loss map on B consists of a parametric map (B, loss) :


Para(C)(B, L) for some object L.

Note that we can precompose a loss map (B, loss) : B → L with a neural
network (P, f ) : A → B (below left), and apply the functor in (4) (with C =
Smooth) to obtain the parametric lens below right.

[Diagram (5): left, the loss map (B, loss) : B → L precomposed with the model (P, f) : A → B in Para(C); right, its image under the functor in (4): a parametric lens with parameter pairs (P, P′) and (B, B′), whose backward part chains R[loss] and R[f].]

This is getting closer to the parametric lens we want: it can now receive
inputs of type B. However, this is at the cost of now needing an input to L′ ; we
consider how to handle this in the next section.

Example 6 (Quadratic error). In Smooth, the standard loss function on R^b is quadratic error: it uses L = R and has parametric map e : R^b × R^b → R given by
$$e(b_t, b_p) = \tfrac{1}{2} \sum_{i=1}^{b} \big((b_p)_i - (b_t)_i\big)^2,$$
where we think of b_t as the “true” value and b_p the predicted value. This has reverse derivative R[e] : R^b × R^b × R → R^b × R^b given by R[e](b_t, b_p, α) = α · (b_p − b_t, b_t − b_p) — note α suggests the idea of learning rate, which we will explore in Section 3.3.
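A sketch of the quadratic error of Example 6 as the get/put parts of a lens (illustrative code):

```python
import numpy as np

def quadratic_get(b_t, b_p):
    """e(b_t, b_p) = 1/2 * sum_i ((b_p)_i - (b_t)_i)^2."""
    return 0.5 * np.sum((b_p - b_t) ** 2)

def quadratic_put(b_t, b_p, alpha):
    """R[e](b_t, b_p, alpha) = alpha * (b_p - b_t, b_t - b_p)."""
    return alpha * (b_p - b_t), alpha * (b_t - b_p)

b_t = np.array([0.0, 1.0])       # "true" value
b_p = np.array([0.25, 0.75])     # predicted value
print(quadratic_get(b_t, b_p))               # 0.0625
print(quadratic_put(b_t, b_p, alpha=-0.1))   # changes scaled by the learning rate
```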

Example 7 (Boolean error). In POLYZ2 , the loss function on Zb which is im-


plicitly used in [50] is a bit different: it uses L = Zb and has parametric map
e : Zb × Zb → Zb given by
e(bt , bp ) = bt + bp .
(Note that this is + in Z2 ; equivalently this is given by XOR.) Its reverse deriva-
tive is of type R[e] : Zb × Zb × Zb → Zb × Zb given by R[e](bt , bp , α) = (α, α).
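The Boolean error of Example 7, sketched over bit vectors (illustrative code; + in Z_2 is XOR):

```python
import numpy as np

def xor_get(b_t, b_p):
    """e(b_t, b_p) = b_t + b_p in Z_2, i.e. elementwise XOR."""
    return (b_t + b_p) % 2

def xor_put(b_t, b_p, alpha):
    """R[e](b_t, b_p, alpha) = (alpha, alpha)."""
    return alpha, alpha

b_t = np.array([1, 0, 1])
b_p = np.array([1, 1, 0])
print(xor_get(b_t, b_p))   # [0 1 1]: the bits where prediction and ground truth differ
```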

Example 8 (Softmax cross entropy). The Softmax cross entropy loss is an R^b-parametric map R^b → R defined by
$$e(b_t, b_p) = \sum_{i=1}^{b} (b_t)_i \big((b_p)_i - \log(\mathrm{Softmax}(b_p)_i)\big),$$
where $\mathrm{Softmax}(b_p)_i = \frac{\exp((b_p)_i)}{\sum_{j=1}^{b} \exp((b_p)_j)}$ is defined componentwise for each class i.

We note that, although bt needs to be a probability distribution, at the


moment there is no need to ponder the question of interaction of probability
distributions with the reverse derivative framework: one can simply consider bt
as the image of some logits under the Softmax function.
6
Here the loss map has its parameter space equal to its input space. However, putting
loss maps on the same footing as models lends itself to further generalizations where
the parameter space is different, and where the loss map can itself be learned. See
Generative Adversarial Networks, [9, Figure 7.].
Example 9 (Dot product). In Deep Dreaming (Section 4.2) we often want to focus
only on a particular element of the network output Rb . This is done by supplying
a one-hot vector bt as the ground truth to the loss function e(bt , bp ) = bt ·bp which
computes the dot product of two vectors. If the ground truth vector y is a one-
hot vector (active at the i-th element), then the dot product performs masking of
all inputs except the i-th one. Note the reverse derivative R[e] : Rb × Rb × R →
Rb × Rb of the dot product is defined as R[e](bt , bp , α) = (α · bp , α · bt ).
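A sketch of the dot-product loss of Example 9, showing the masking behaviour for a one-hot ground truth (illustrative code):

```python
import numpy as np

def dot_get(b_t, b_p):
    """e(b_t, b_p) = b_t . b_p."""
    return float(b_t @ b_p)

def dot_put(b_t, b_p, alpha):
    """R[e](b_t, b_p, alpha) = (alpha * b_p, alpha * b_t)."""
    return alpha * b_p, alpha * b_t

b_t = np.array([0.0, 1.0, 0.0])           # one-hot vector, active at index 1
b_p = np.array([0.3, 2.5, -1.0])
print(dot_get(b_t, b_p))                  # 2.5: only the selected output contributes
print(dot_put(b_t, b_p, alpha=1.0)[1])    # [0. 1. 0.]: the change is masked to index 1
```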

3.3 Learning Rates as Parametric Lenses


After models and loss maps, another ingredient of the learning process are learn-
ing rates, which we formalise as follows.
Definition 6. A learning rate α on L consists of a lens from (L, L′ ) to (1, 1)
where 1 is a terminal object in C.
Note that the get component of the learning rate lens must be the unique map
to 1, while the put component is a map L × 1 → L′ ; that is, simply a map
α∗ : L → L′ . Thus we can view α as a parametric lens from (L, L′ ) → (1, 1)
(with trivial parameter space) and compose it in Para(Lens(C)) with a model
and a loss map (cf. (5)) to get
[Diagram (6): the model lens, the loss lens and the learning rate lens α composed in Para(Lens(C)); α caps off the (L, L′) wire, so the backward pass feeds α∗ of the loss value into R[loss] and then into R[f].]

Example 10. In standard supervised learning in Smooth, one fixes some ϵ > 0 as a learning rate, and this is used to define α: α is simply constantly −ϵ, i.e., α(l) = −ϵ for any l ∈ L.
Example 11. In supervised learning in POLY_{Z_2}, the standard learning rate is quite different: for a given L it is defined as the identity function, α(l) = l.
Other learning rate morphisms are possible as well: for example, one could
fix some ϵ > 0 and define a learning rate in Smooth by α(l) = −ϵ · l. Such a
choice would take into account how far away the network is from its desired goal
and adjust the learning rate accordingly.
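The three choices of learning rate just described amount to the following maps α∗ : L → L′, sketched here in Python for concreteness (our illustration, not the paper's code):

def constant_rate(epsilon):
    # Example 10: alpha(l) = -epsilon, ignoring the current loss value
    return lambda l: -epsilon

def identity_rate(l):
    # Example 11: alpha(l) = l, the standard choice over Z_2
    return l

def scaled_rate(epsilon):
    # alpha(l) = -epsilon * l: the step size shrinks as the loss shrinks
    return lambda l: -epsilon * l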

3.4 Optimisers as Reparameterisations


In this section we consider how to implement gradient descent (and its variants)
into our framework. To this aim, note that the parametric lens (f, R[f ]) rep-
resenting our model (see (4)) outputs a P ′ , which represents a change in the
parameter space. Now, we would like to receive not just the requested change
in the parameter, but the new parameter itself. This is precisely what gradient
descent accomplishes, when formalised as a lens.

Definition 7. In any CRDC C we can define gradient update as a map G in Lens(C) from (P, P) to (P, P′) consisting of (G, G∗) : (P, P) → (P, P′), where G(p) = p and G∗(p, p′) = p + p′ (see footnote 7).

Intuitively, such a lens allows one to receive the requested change in parameter
and implement that change by adding that value to the current parameter. By its
type, we can now “plug” the gradient descent lens G : (P, P ) → (P, P ′ ) above the
model (f, R[f ]) in (4) — formally, this is accomplished as a reparameterisation
of the parametric morphism (f, R[f ]), cf. Section 2.1. This gives us Figure 3
(left).

Fig. 3: Model reparameterised by basic gradient descent (left) and a generic stateful optimiser (right).

Example 12 (Gradient update in Smooth). In Smooth, the gradient descent repa-


rameterisation will take the output from P ′ and add it to the current value of
P to get a new value of P .

Example 13 (Gradient update in Boolean circuits). In the CRDC POLY_{Z_2}, the gradient descent reparameterisation will again take the output from P′ and add it to the current value of P to get a new value of P; however, since + in Z_2 is the same as XOR, this can also be seen as taking the XOR of the current parameter and the requested change; this is exactly how this algorithm is implemented in [50].
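Both versions of the gradient update lens can be sketched as a get/put pair of plain functions (an illustrative sketch of ours; over Z_2 the sum becomes elementwise XOR on 0/1 arrays):

import numpy as np

def gradient_update_get(p):
    # G(p) = p: the current parameter is passed to the model unchanged
    return p

def gradient_update_put(p, p_change):
    # G*(p, p') = p + p': apply the requested change to the current parameter
    return p + p_change

def gradient_update_put_z2(p, p_change):
    # the same update over Z_2 (Example 13): addition is XOR
    return np.bitwise_xor(p, p_change)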

Other variants of gradient descent also fit naturally into this framework by
allowing for additional input/output data with P . In particular, many of them
keep track of the history of previous updates and use that to inform the next one.
This is easy to model in our setup: instead of asking for a lens (P, P ) → (P, P ′ ),
we ask instead for a lens (S ×P, S ×P ) → (P, P ′ ) where S is some “state” object.
Footnote 7: As in the discussion in Section 2.4, we are implicitly assuming that P = P′; we have merely notated them differently to emphasize the different "roles" they play (the first P can be thought of as "points", the second as "vectors").

Definition 8. A stateful parameter update consists of a choice of object S (the state object) and a lens U : (S × P, S × P) → (P, P′).
Again, we view this optimiser as a reparameterisation which may be “plugged
in” a model as in Figure 3 (right). Let us now consider how several well-known
optimisers can be implemented in this way.
Example 14 (Momentum). In the momentum variant of gradient descent, one keeps track of the previous change and uses this to inform how the current parameter should be changed. Thus, in this case, we set S = P, fix some γ > 0, and define the momentum lens (U, U∗) : (P × P, P × P) → (P, P′) by U(s, p) = p and U∗(s, p, p′) = (s′, p + s′), where s′ = −γs + p′. Note that momentum recovers gradient descent when γ = 0.
In both standard gradient descent and momentum, our lens representation has a trivial get part. However, as soon as we move to more complicated variants, this is no longer the case, as for instance in Nesterov momentum below.
Example 15 (Nesterov momentum). In Nesterov momentum, one uses the mo-
mentum from previous updates to tweak the input parameter supplied to the
network. We can precisely capture this by using a small variation of the lens in
the previous example. Again, we set S = P , fix some γ > 0, and define the Nes-
terov momentum lens (U, U ∗ ) : (P × P, P × P ) → (P, P ′ ) by U (s, p) = p + γs
and U ∗ as in the previous example.
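Examples 14 and 15 can likewise be sketched as stateful get/put pairs; the default γ = 0.9 below is an arbitrary illustrative choice, and the code is ours rather than the paper's.

def momentum_get(s, p):
    # Example 14: the get part is trivial
    return p

def momentum_put(s, p, p_change, gamma=0.9):
    # s' = -gamma * s + p'; the new parameter is p + s'
    s_new = -gamma * s + p_change
    return s_new, p + s_new

def nesterov_get(s, p, gamma=0.9):
    # Example 15: the model sees the "lookahead" parameter p + gamma * s
    return p + gamma * s

# the put part of Nesterov momentum is the same as for momentum
nesterov_put = momentum_put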
Example 16 (Adagrad). Given any fixed ϵ > 0 and δ ∼ 10^{-7}, Adagrad [20] is given by S = P, with the lens whose get part is (g, p) ↦ p. The put is (g, p, p′) ↦ (g′, p + (ϵ/(δ + √g′)) ⊙ p′), where g′ = g + p′ ⊙ p′ and ⊙ is the elementwise (Hadamard) product. Unlike other optimisation algorithms, where the learning rate is the same for all parameters, Adagrad divides the learning rate of each individual parameter by the square root of the past accumulated gradients.
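A NumPy sketch of the Adagrad lens, with ⊙, the square root, and the division all taken elementwise; the δ default follows the example above, while the ϵ value is an arbitrary illustrative choice.

import numpy as np

def adagrad_get(g, p):
    # the get part is trivial
    return p

def adagrad_put(g, p, p_change, eps=0.01, delta=1e-7):
    # g' accumulates squared gradients; each coordinate gets its own step size
    g_new = g + p_change * p_change
    return g_new, p + (eps / (delta + np.sqrt(g_new))) * p_change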
Example 17 (Adam). Adaptive Moment Estimation (Adam) [32] is another method that computes adaptive learning rates for each parameter, by storing an exponentially decaying average of past gradients (m) and of past squared gradients (v). For fixed β1, β2 ∈ [0, 1), ϵ > 0, and δ ∼ 10^{-8}, Adam is given by S = P × P, with the lens whose get part is (m, v, p) ↦ p and whose put part is put(m, v, p, p′) = (m̂′, v̂′, p + (ϵ/(δ + √v̂′)) ⊙ m̂′), where m′ = β1 m + (1 − β1)p′, v′ = β2 v + (1 − β2)p′^2, and m̂′ = m′/(1 − β1^t), v̂′ = v′/(1 − β2^t).
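A corresponding sketch of the Adam lens as stated in the example; the time step t used in the bias correction is assumed to be supplied by the caller, and the β1, β2, ϵ defaults below are conventional choices of ours rather than values fixed by the paper.

import numpy as np

def adam_get(m, v, p):
    # the get part is trivial
    return p

def adam_put(m, v, p, p_change, t, beta1=0.9, beta2=0.999, eps=0.001, delta=1e-8):
    # exponentially decaying averages of past gradients and squared gradients
    m_new = beta1 * m + (1 - beta1) * p_change
    v_new = beta2 * v + (1 - beta2) * p_change ** 2
    # bias-corrected estimates, as in the example above
    m_hat = m_new / (1 - beta1 ** t)
    v_hat = v_new / (1 - beta2 ** t)
    return m_hat, v_hat, p + (eps / (delta + np.sqrt(v_hat))) * m_hat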

Note that, so far, optimisers/reparameterisations have been added to the P/P′ wires in order to change the model's parameters (Fig. 3). In Section 4.2 we will study them on the A/A′ wires instead, giving deep dreaming.

4 Learning with Parametric Lenses


In the previous section we have seen how all the components of learning can be
modeled as parametric lenses. We now study how all these components can be

put together to form supervised learning systems. In addition to studying the most common examples of supervised learning, i.e. systems that learn parameters, we also study a different kind of system: those that learn their inputs. This is a technique commonly known as deep dreaming, and we present it as a natural counterpart of supervised learning of parameters.
Before we describe these systems, it will be convenient to represent all the
inputs and outputs of our parametric lenses as parameters. In (6), we see the
P/P ′ and B/B ′ inputs and outputs as parameters; however, the A/A′ wires are
not. To view the A/A′ inputs as parameters, we compose that system with the
parametric lens η we now define. The parametric lens η has the type (1, 1) → (A, A′) with parameter space (A, A′), defined by (get_η = 1_A, put_η = π1); it can be depicted graphically as a string diagram (omitted here). Composing η with the rest of the learning system in (6) gives us the closed parametric lens

(Diagram (7): the closed parametric lens obtained by composing η with the model, the loss map, and the learning rate α; all of A/A′, P/P′ and B/B′ now appear as parameter ports.)

This composite is now a map in Para(Lens(C)) from (1, 1) to (1, 1); all its inputs and outputs are now vertical wires, i.e., parameters. Unpacking it further, this is a lens of type (A × P × B, A′ × P′ × B′) → (1, 1) whose get map is the terminal map, and whose put map is of the type A × P × B → A′ × P′ × B′. It can be unpacked as the composite put(a, p, bt) = (a′, p′, b′t), where

    bp = f(p, a),    (b′t, b′p) = R[loss](bt, bp, α(loss(bt, bp))),    (p′, a′) = R[f](p, a, b′p).

In the next two sections we consider further additions to the image above which
correspond to different types of supervised learning.

4.1 Supervised Learning of Parameters

The most common type of learning performed on (7) is supervised learning of


parameters. This is done by reparameterising (cf. Section 2.1) the image in the
following manner. The parameter ports are reparameterised by one of the (pos-
sibly stateful) optimisers described in the previous section, while the backward
wires A′ of inputs and B ′ of outputs are discarded. This finally yields the com-
plete picture of a system which learns the parameters in a supervised manner:

(Diagram: the complete supervised parameter learning system; the P/P′ ports of the model are reparameterised by a stateful optimiser with state object S, while the backward wires A′ and B′ are discarded.)

Fixing a particular optimiser (U, U∗) : (S × P, S × P) → (P, P′), we again unpack the entire construction. This is a map in Para(Lens(C)) from (1, 1) to (1, 1) whose parameter space is (A × S × P × B, S × P). In other words, this is a lens of type (A × S × P × B, S × P) → (1, 1) whose get component is the terminal map. Its put map has the type A × S × P × B → S × P and unpacks to put(a, s, p, bt) = U∗(s, p, p′), where

    p̄ = U(s, p),    bp = f(p̄, a),
    (b′t, b′p) = R[loss](bt, bp, α(loss(bt, bp))),    (p′, a′) = R[f](p̄, a, b′p).

While this formulation might seem daunting, we note that it just explicitly specifies the computation performed by a supervised learning system. The variable p̄ represents the parameter supplied to the network by the stateful gradient update rule (in many cases this is equal to p); bp represents the prediction of the network (contrast this with bt, which represents the ground truth from the dataset). Variables with a tick ′ represent changes: b′p and b′t are the changes on predictions and true values respectively, while p′ and a′ are changes on the parameters and inputs. Furthermore, this arises automatically out of the rule for lens composition (3); what we needed to specify is just the lenses themselves.
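The following sketch spells this computation out as one function, parametrised by the model (f, f_rev), the loss (loss, loss_rev), the learning rate alpha, and an optimiser get/put pair as in Section 3.4; all names are placeholders of ours, not the paper's API. Plugging in, say, quadratic_loss/quadratic_loss_rev, constant_rate(0.01), and the momentum pair sketched earlier yields gradient descent with momentum for whatever model is supplied.

def supervised_put(a, s, p, b_true,
                   f, f_rev, loss, loss_rev, alpha, opt_get, opt_put):
    p_bar = opt_get(s, p)                 # parameter actually seen by the model
    b_pred = f(p_bar, a)                  # forward pass
    l = loss(b_true, b_pred)              # current loss value
    _b_true_chg, b_pred_chg = loss_rev(b_true, b_pred, alpha(l))
    p_change, _a_change = f_rev(p_bar, a, b_pred_chg)   # backward pass
    return opt_put(s, p, p_change)        # new state and new parameter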
We justify and illustrate our approach on a series of case studies drawn from
the literature. This presentation has the advantage of treating all these instances
uniformly in terms of basic constructs, highlighting their similarities and differ-
ences. First, we fix some parametric map (Rp , f ) : Para(Smooth)(Ra , Rb ) in
Smooth and the constant negative learning rate α : R (Example 10). We then
vary the loss function and the gradient update, seeing how the put map above
reduces to many of the known cases in the literature.

Example 18 (Quadratic error, basic gradient descent). Fix the quadratic error
(Example 6) as the loss map and basic gradient update (Example 12). Then the
aforementioned put map simplifies. Since there is no state, its type reduces to
A × P × B → P , and we have put(a, p, bt ) = p + p′ , where (p′ , a′ ) = R[f ](p, a, α ·
(f (p, a) − bt )). Note that α here is simply a constant, and due to the linearity
of the reverse derivative (Def 4), we can slide the α from the costate into the
basic gradient update lens. Rewriting this update, and performing this sliding we
obtain a closed form update step put(a, p, bt ) = p+α·(R[f ](p, a, f (p, a)−bt ); π0 ),

where the negative descent component of gradient descent is here contained in


the choice of the negative constant α.

This example gives us a variety of regression algorithms solved iteratively


by gradient descent: it embeds some parametric map (Rp , f ) : Ra → Rb into the
system which performs regression on input data - where a denotes the input to
the model and bt denotes the ground truth. If the corresponding f is linear and
b = 1, we recover simple linear regression with gradient descent. If the codomain
is multi-dimensional, i.e. we are predicting multiple scalars, then we recover
multivariate linear regression. Likewise, we can model a multi-layer perceptron or
even more complex neural network architectures performing supervised learning
of parameters simply by changing the underlying parametric map.
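For instance, here is a sketch of linear regression by gradient descent, using the closed-form update of Example 18 with f(W, x) = Wx; the model, its reverse derivative, and the negative constant α = −0.01 are all illustrative choices of ours.

import numpy as np

def linear_f(W, x):
    # the model: a linear map R^a -> R^b whose parameter is the b x a matrix W
    return W @ x

def linear_f_rev(W, x, y_change):
    # reverse derivative of (W, x) |-> W @ x, applied to the change y_change
    return np.outer(y_change, x), W.T @ y_change

def regression_step(x, W, y_true, alpha=-0.01):
    # put(a, p, bt) = p + alpha * (R[f](p, a, f(p, a) - bt); pi_0)
    W_change, _ = linear_f_rev(W, x, linear_f(W, x) - y_true)
    return W + alpha * W_change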

Example 19 (Softmax cross entropy, basic gradient descent). Fix Softmax cross
entropy (Example 8) as the loss map and basic gradient update (Example 12).
Again the put map simplifies. The type reduces to A × P × B → P and we have
put(a, p, bt ) = p + p′ where (p′ , a′ ) = R[f ](p, a, α · (Softmax(f (p, a)) − bt )). The
same rewriting performed on the previous example can be done here.

This example recovers logistic regression, i.e. classification.

Example 20 (Mean squared error, Nesterov momentum). Fix the quadratic error (Example 6) as the loss map and Nesterov momentum (Example 15) as the gradient update. This time the put map A × S × P × B → S × P does not have a simplified type. The implementation of put reduces to put(a, s, p, bt) = (s′, p + s′), where p̄ = p + γs, (p′, a′) = R[f](p̄, a, α · (f(p̄, a) − bt)), and s′ = −γs + p′.

This example with Nesterov momentum differs in two key points from all
the other ones: i) the optimiser is stateful, and ii) its get map is not trivial.
While many other optimisers are stateful, the non-triviality of the get map here
showcases the importance of lenses. They allow us to make precise the notion of
computing a “lookahead” value for Nesterov momentum, something that is in
practice usually handled in ad-hoc ways. Here, the algebra of lens composition
handles this case naturally by using the get map, a seemingly trivial, unused
piece of data for previous optimisers.
Our last example, using a different base category POLY_{Z_2}, shows that our framework captures learning not just in continuous, but also in discrete settings. Again, we fix a parametric map (Z_2^p, f) : POLY_{Z_2}(Z_2^a, Z_2^b), but this time we fix the identity learning rate (Example 11) instead of a constant one.

Example 21 (Basic learning in Boolean circuits). Fix XOR as the loss map (Ex-
ample 7) and the basic gradient update (Example 13). The put map again
simplifies. The type reduces to A × P × B → P and the implementation to
put(a, p, bt ) = p + p′ where (p′ , a′ ) = R[f ](p, a, f (p, a) + bt ).

A sketch of learning iteration. Having described a number of examples in


supervised learning, we outline how to model learning iteration in our framework.
Recall the aforementioned put map whose type is A × P × B → P (for simplicity

here modelled without state S). This map takes an input-output pair (a_0, b_0) and the current parameter p_i, and produces an updated parameter p_{i+1}. At the next time step, it takes a potentially different input-output pair (a_1, b_1) and the updated parameter p_{i+1}, and produces p_{i+2}. This process is then repeated. We can model this iteration as a composition of the put map with itself, as a composite (A × put × B); put whose type is A × A × P × B × B → P. This map takes two input-output pairs A × B and a parameter, and produces a new parameter by processing these datapoints in sequence. One can see how this process can be iterated any number of times, and even represented as a string diagram.
But we note that with a slight reformulation of the put map, it is possible
to obtain a conceptually much simpler definition. The key insight lies in seeing
that the map put : A × P × B → P is essentially an endo-map P → P with some
extra inputs A × B; it’s a parametric map!
In other words, we can recast the put map as a parametric map (A × B, put) :
Para(C)(P, P ). Being an endo-map, it can be composed with itself. The resulting
composite is an endo-map taking two “parameters”: input-output pair at the
time step 0 and time step 1. This process can then be repeated, with Para
composition automatically taking care of the algebra of iteration.

(Diagram: the n-fold Para-composite of put with itself; each copy of put takes its own A × B parameter, and the parameter P is threaded through from left to right.)

This reformulation captures the essence of parameter iteration: one can think of it as a trajectory p_i, p_{i+1}, p_{i+2}, ... through the parameter space; but it is a trajectory parameterised by the dataset. With different datasets the algorithm will take a different path through this space and learn different things.
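In code, this iteration is simply a fold of the put map over the dataset; the sketch below works for any step of type A × P × B → P, for instance the regression_step sketched earlier.

def train(step, p0, dataset):
    # dataset: an iterable of input/ground-truth pairs (a_i, b_i)
    p = p0
    for a, b_true in dataset:
        # the parameter traces a trajectory p0, p1, p2, ... through P
        p = step(a, p, b_true)
    return p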

4.2 Deep Dreaming: Supervised Learning of Inputs

We have seen that reparameterising the parameter port with gradient descent
allows us to capture supervised parameter learning. In this section we describe
how reparameterising the input port provides us with a way to enhance an input
image to elicit a particular interpretation. This is the idea behind the technique
called Deep Dreaming, appearing in the literature in many forms [19, 34, 35, 44].

(Diagram (8): the deep dreaming system; the A/A′ ports are reparameterised by a stateful optimiser with state object S, while the parameter P and the label B are supplied as inputs and the backward wires P′ and B′ are discarded.)

Deep dreaming is a technique which uses the parameters p of some trained classifier network to iteratively dream up, or amplify, some features of a class b on a chosen input a. For example, if we start with an image of a landscape a_0, a label b of a "cat", and a parameter p of a sufficiently well-trained classifier, we can start performing "learning" as usual: computing the predicted class for the landscape a_0 for the network with parameters p, and then computing the distance between the prediction and our label of a cat b. When performing backpropagation, the respective changes computed for each layer tell us how the activations of that layer should have been changed to be more "cat"-like. This includes the first (input) layer of the landscape a_0. Usually, we discard these changes and apply gradient update to the parameters. In deep dreaming we instead discard the parameter changes and apply gradient update to the input (see (8)). Gradient update here takes these changes and computes a new image a_1 which is the same image of the landscape, but changed slightly so as to look more like whatever the network thinks a cat looks like. This is the essence of deep dreaming, where iteration of this process allows networks to dream up features and shapes on a particular chosen image [1].
Just like in the previous subsection, we can write this deep dreaming system as a map in Para(Lens(C)) from (1, 1) to (1, 1) whose parameter space is (S × A × P × B, S × A). In other words, this is a lens of type (S × A × P × B, S × A) → (1, 1) whose get map is trivial. Its put map has the type S × A × P × B → S × A and unpacks to put(s, a, p, bt) = U∗(s, a, a′), where ā = U(s, a), bp = f(p, ā), (b′t, b′p) = R[loss](bt, bp, α(loss(bt, bp))), and (p′, a′) = R[f](p, ā, b′p).
We note that deep dreaming is usually presented without any loss function as
a maximisation of a particular activation in the last layer of the network output
[44, Section 2.]. This maximisation is done with gradient ascent, as opposed to
gradient descent. However, this is just a special case of our framework where
the loss function is the dot product (Example 9). The choice of the particular
activation is encoded as a one-hot vector, and the loss function in that case
essentially masks the network output, leaving active only the particular chosen
activation. The final component is the gradient ascent: this is simply recovered
by choosing a positive, instead of a negative learning rate [44]. We explicitly
unpack this in the following example.
Example 22 (Deep dreaming, dot product loss, basic gradient update). Fix Smooth
as base category, a parametric map (Rp , f ) : Para(Smooth)(Ra , Rb ), the dot
product loss (Example 9), basic gradient update (Example 12), and a positive
learning rate α : R. Then the above put map simplifies. Since there is no state, its
type reduces to A × P × B → A and its implementation to put(a, p, bt ) = a + a′ ,
where (p′ , a′ ) = R[f ](p, a, α · bt ). Like in Example 18, this update can be rewrit-
ten as put(a, p, bt ) = a + α · (R[f ](p, a, bt ); π1 ), making a few things apparent.
This update does not depend on the prediction f (p, a): no matter what the net-
work has predicted, the goal is always to maximize particular activations. Which
activations? The ones chosen by bt . When bt is a one-hot vector, this picks out
the activation of just one class to maximize, which is often done in practice.
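For the linear model sketched earlier, (R[f](W, x, bt); π1) is just W^T bt, so one dreaming step can be sketched as follows (illustrative; the positive α = 0.01 is an arbitrary choice of ours):

def dream_step(x, W, b_onehot, alpha=0.01):
    # put(a, p, bt) = a + alpha * (R[f](p, a, bt); pi_1), with a positive alpha;
    # note the step does not depend on the prediction f(W, x) at all
    x_change = W.T @ b_onehot
    return x + alpha * x_change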
While we present only the most basic image, there is plenty of room left
for exploration. The work of [44, Section 2.] adds an extra regularization term

to the image. In general, the neural network f is sometimes changed to copy


a number of internal activations which are then exposed on the output layer.
Maximizing all these activations often produces more visually appealing results.
In the literature we did not find an example which uses the Softmax-cross entropy
(Example 8) as a loss function in deep dreaming, which seems like the more
natural choice in this setting. Furthermore, while deep dreaming commonly uses
basic gradient descent, there is nothing preventing the use of any of the optimiser
lenses discussed in the previous section, or even doing deep dreaming in the
context of Boolean circuits. Lastly, the learning iteration described at the end of the previous subsection can be modelled here in an analogous way.

5 Implementation

We provide a proof-of-concept implementation as a Python library — full usage


examples, source code, and experiments can be found at [17]. We demonstrate
the correctness of our library empirically using a number of experiments im-
plemented both in our library and in Keras [11], a popular framework for deep
learning. For example, one experiment is a model for the MNIST image clas-
sification problem [33]: we implement the same model in both frameworks and
achieve comparable accuracy. Note that despite similarities between the user in-
terfaces of our library and of Keras, a model in our framework is constructed
as a composition of parametric lenses. This is fundamentally different to the
approach taken by Keras and other existing libraries, and highlights how our
proposed algebraic structures naturally guide programming practice.
In summary, our implementation demonstrates the advantages of our ap-
proach. Firstly, computing the gradients of the network is greatly simplified
through the use of lens composition. Secondly, model architectures can be ex-
pressed in a principled, mathematical language; as morphisms of a monoidal
category. Finally, the modularity of our approach makes it easy to see how var-
ious aspects of training can be modified: for example, one can define a new
optimization algorithm simply by defining an appropriate lens. We now give a
brief sketch of our implementation.

5.1 Constructing a Model with Lens and Para

We model a lens (f, f ∗ ) in our library with the Lens class, which consists of a
pair of maps fwd and rev corresponding to f and f ∗ , respectively. For example,
we write the identity lens (1A , π2 ) as follows:
identity = Lens(lambda x: x, lambda x_dy: x_dy[1])

The composition (in diagrammatic order) of Lens values f and g is written


f >> g, and monoidal composition as f @ g. Similarly, the type of Para maps
is modeled by the Para class, with composition and monoidal product written
the same way. Our library provides several primitive Lens and Para values.
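To give a flavour of how such a class can look, here is a minimal illustrative Lens with sequential composition via >>; it follows the standard lens composition rule but is a sketch of ours, and the library's actual implementation may differ in its details.

class Lens:
    def __init__(self, fwd, rev):
        self.fwd = fwd   # get : A -> B
        self.rev = rev   # put : (A, B') -> A', taking a single pair argument

    def __rshift__(self, other):
        # sequential composition: get is other.fwd after self.fwd; put first
        # asks other for the change at B', then feeds it back through self
        def fwd(a):
            return other.fwd(self.fwd(a))
        def rev(a_db):
            a, db = a_db
            return self.rev((a, other.rev((self.fwd(a), db))))
        return Lens(fwd, rev)

identity = Lens(lambda x: x, lambda x_dy: x_dy[1])

For example, (identity >> identity).rev((3, 5)) evaluates to 5, as expected of the identity lens.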

Let us now see how to construct a single layer neural network from the com-
position of such primitives. Diagrammatically, we wish to construct the following
model, representing a single ‘dense’ layer of a neural network:

(Diagram (9): the dense layer as the composite linear >> bias >> activation of three parametric maps R^a → R^b, R^b → R^b, and R^b → R^b, with parameter spaces R^{b×a}, R^b, and the trivial space respectively.)
Here, the parameters of linear are the coefficients of a b × a matrix, and the
underlying lens has as its forward map the function (M, x) → M · x, where M is
the b × a matrix whose coefficients are the Rb×a parameters, and x ∈ Ra is the
input vector. The bias map is even simpler: the forward map of the underlying
lens is simply pointwise addition of inputs and parameters: (b, x) → b+x. Finally,
the activation map simply applies a nonlinear function (e.g., sigmoid) to the
input, and thus has the trivial (unit) parameter space. The representation of
this composition in code is straightforward: we can simply compose the three
primitive Para maps as in (9):
def dense(a, b, activation):
    return linear(a, b) >> bias(b) >> activation

Note that by constructing model architectures in this way, the computation


of reverse derivatives is greatly simplified: we obtain the reverse derivative ‘for
free’ as the put map of the model. Furthermore, adding new primitives is also
simplified: the user need simply provide a function and its reverse derivative in
the form of a Para map. Finally, notice also that our approach is truly composi-
tional: we can define a hidden layer neural network with n hidden units simply
by composing two dense layers, as follows:
dense(a, n, activation) >> dense(n, b, activation)

5.2 Learning
Now that we have constructed a model, we also need to use it to learn from
data. Concretely, we will construct a full parametric lens as in Figure 2 then
extract its put map to iterate over the dataset.
By way of example, let us see how to construct the following parametric lens,
representing basic gradient descent over a single layer neural network with a
fixed learning rate:
(Diagram (10): basic gradient descent over a single dense layer; the dense model's parameter ports are reparameterised by the gradient update, and the result is composed with the loss map and a fixed learning rate ϵ.)

This morphism is constructed essentially as below, where apply_update(α, f) represents the 'vertical stacking' of α atop f:

    apply_update(basic_update, dense) >> loss >> learning_rate(ϵ)

Now, given the parametric lens of (10), one can construct a morphism step : B × P × A → P which is simply the put map of the lens. Training the model then consists of iterating the step function over dataset examples (x, y) ∈ A × B to optimise some initial choice of parameters θ_0 ∈ P, by letting θ_{i+1} = step(y_i, θ_i, x_i). Note that our library also provides a utility function to construct step from its various pieces:

    step = supervised_step(model, update, loss, learning_rate)

For an end-to-end example of model training and iteration, we refer the


interested reader to the experiments accompanying the code [17].

6 Related Work
The work [23] is closely related to ours, in that it provides an abstract categorical
model of backpropagation. However, it differs in a number of key aspects. We
give a complete lens-theoretic explanation of what is back-propagated via (i)
the use of CRDCs to model gradients; and (ii) the Para construction to model
parametric functions and parameter update. We thus can go well beyond [23]
in terms of examples - their example of smooth functions and basic gradient
descent is covered in our subsection 4.1.
We also explain some of the constructions of [23] in a more structured way.
For example, rather than considering the category Learn of [23] as primitive,
here we construct it as a composite of two more basic constructions (the Para
and Lens constructions). The flexibility could be used, for example, to com-
positionally replace Para with a variant allowing parameters to come from a
different category, or lenses with the category of optics [38] enabling us to model
things such as control flow using prisms.
One more relevant aspect is functoriality. We use a functor to augment a
parametric map with its backward pass, just like [23]. However, they additionally
augmented this map with a loss map and gradient descent using a functor as
well. This added extra conditions on the partial derivatives of the loss function:
it needed to be invertible in the 2nd variable. This constraint was not justified
in [23], nor is it a constraint that appears in machine learning practice. This led
us to reexamine their constructions, coming up with our reformulation that does
not require it. While loss maps and optimisers are mentioned in [23] as parts of
the aforementioned functor, here they are extracted out and play a key role: loss
maps are parametric lenses and optimisers are reparameterisations. Thus, in this
paper we instead use Para-composition to add the loss map to the model, and
Para 2-cells to add optimisers. The mentioned inverse of the partial derivative
of the loss map in the 2nd variable was also hypothesised to be relevant to deep
dreaming. We have investigated this possibility thoroughly in our paper, showing

it is gradient update which is used to dream up pictures. We also correct a small


issue in Theorem III.2 of [23]. There, the morphisms of Learn were defined up to
an equivalence (pg. 4 of [23]) but, unfortunately, the functor defined in Theorem
III.2 does not respect this equivalence relation. Our approach instead uses 2-cells
which comes from the universal property of Para — a 2-cell from (P, f ) : A → B
to (Q, g) : A → B is a lens, and hence has two components: a map α : Q → P
and α∗ : Q × P → Q. By comparison, we can see the equivalence relation of [23]
as being induced by map α : Q → P , and not a lens. Our approach highlights
the importance of the 2-categorical structure of learners. In addition, it does not
treat the functor Para(C) → Learn as a primitive. In our case, this functor
has the type Para(C) → Para(Lens(C)) and arises from applying Para to a
canonical functor C → Lens(C) existing for any reverse derivative category, not
just Smooth. Lastly, in our paper we took advantage of the graphical calculus
for Para, redrawing many diagrams appearing in [23] in a structured way.
Other than [23], there are a few more relevant papers. The work of [18] con-
tains a sketch of some of the ideas this paper evolved from. They are based
on the interplay of optics with parameterisation, albeit framed in the setting of
diffeological spaces, and requiring cartesian and local cartesian closed structure
on the base category. Lenses and Learners are studied in the eponymous work
of [22] which observes that learners are parametric lenses. They do not explore
any of the relevant Para or CRDC structure, but make the distinction between
symmetric and asymmetric lenses, studying how they are related to learners de-
fined in [23]. A lens-like implementation of automatic differentiation is the focus
of [21], but learning algorithms aren’t studied. A relationship between category-
theoretic perspective on probabilistic modeling and gradient-based optimisation
is studied in [42] which also studies a variant of the Para construction. Usage of
Cartesian differential categories to study learning is found in [46]. They extend
the differential operator to work on stateful maps, but do not study lenses, pa-
rameterisation nor update maps. The work of [24] studies deep learning in the
context of Cycle-consistent Generative Adversarial Networks [51] and formalises
it via free and quotient categories, making parallels to the categorical formula-
tions of database theory [45]. They do use the Para construction, but do not
relate it to lenses nor reverse derivative categories. A general survey of category
theoretic approaches to machine learning, covering many of the above papers,
can be found in [43]. Lastly, the concept of parametric lenses has started appear-
ing in recent formulations of categorical game theory and cybernetics [9,10]. The
work of [9] generalises the study of parametric lenses into parametric optics and
connects it to game-theoretic concepts such as Nash equilibria.

7 Conclusions and Future Directions

We have given a categorical foundation of gradient-based learning algorithms


which achieves a number of important goals. The foundation is principled and
mathematically clean, based on the fundamental idea of a parametric lens. The
foundation covers a wide variety of examples: different optimisers and loss maps

in gradient-based learning, different settings where gradient-based learning hap-


pens (smooth functions vs. boolean circuits), and both learning of parameters
and learning of inputs (deep dreaming). Finally, the foundation is more than
a mere abstraction: we have also shown how it can be used to give a practical
implementation of learning, as discussed in Section 5.
There are a number of important directions which are possible to explore
because of this work. One of the most exciting ones is the extension to more
complex neural network architectures. Our formulation of the loss map as a
parametric lens should pave the way for Generative Adversarial Networks [27],
an exciting new architecture whose loss map can be said to be learned in tandem
with the base network. In all our settings we have fixed an optimiser beforehand.
The work of [4] describes a meta-learning approach which sees the optimiser as a
neural network whose parameters and gradient update rule can be learned. This
is an exciting prospect since one can model optimisers as parametric lenses;
and our framework covers learning with parametric lenses. Recurrent neural
networks are another example of a more complex architecture, which has already
been studied in the context of differential categories in [46]. When it comes to
architectures, future work includes modelling some classical systems as well, such
as the Support Vector Machines [15], which should be possible with the usage
of loss maps such as Hinge loss.
Future work also includes using the full power of CRDC axioms. In particular,
axioms RD.6 or RD.7, which deal with the behaviour of higher-order derivatives,
were not exploited in our work, but they should play a role in modelling some
supervised learning algorithms using higher-order derivatives (for example, the
Hessian) for additional optimisations. Taking this idea in a different direction,
one can see that much of our work can be applied to any functor of the form F : C → Lens(C); F does not necessarily have to be of the form f ↦ (f, R[f])
for a CRDC R. Moreover, by working with more generalised forms of the lens
category (such as dependent lenses), we may be able to capture ideas related
to supervised learning on manifolds. And, of course, we can vary the parameter
space to endow it with different structure from the functions we wish to learn. In
this vein, we wish to use fibrations/dependent types to model the use of tangent
bundles: this would foster the extension of the correct by construction paradigm
to machine learning, and thereby addressing the widely acknowledged problem
of trusted machine learning. The possibilities are made much easier by the com-
positional nature of our framework. Another key topic for future work is to link
gradient-based learning with game theory. At a high level, the former takes small incremental steps to achieve an equilibrium, while the latter aims to do so in one fell swoop. Formalising this intuition is possible with our lens-based framework and the lens-based framework for game theory [25]. Finally, because our
framework is quite general, in future work we plan to consider further modifica-
tions and additions to encompass non-supervised, probabilistic and non-gradient
based learning. This includes genetic algorithms and reinforcement learning.

Acknowledgements. Fabio Zanasi acknowledges support from EPSRC EP/V002376/1. Geoff Cruttwell acknowledges support from NSERC.

References

1. Inceptionism: Going deeper into neural networks (2015), https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html
2. Explainable AI: the basics - policy briefing (2019), royalsociety.org/ai-interpretability
3. Abramsky, S., Coecke, B.: A categorical semantics of quantum protocols. In: Pro-
ceedings of the 19th Annual IEEE Symposium on Logic in Computer Science, 2004.
pp. 415–425 (2004). https://doi.org/10.1109/LICS.2004.1319636
4. Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M.W., Pfau, D., Schaul, T.,
Shillingford, B., de Freitas, N.: Learning to learn by gradient descent by gradient
descent. In: 30th Conference on Neural Information Processings Systems (NIPS)
(2016)
5. Baez, J.C., Erbele, J.: Categories in Control. Theory and Applications of Categories
30(24), 836–881 (2015)
6. Bohannon, A., Foster, J.N., Pierce, B.C., Pilkiewicz, A., Schmitt, A.: Boomerang:
Resourceful lenses for string data. SIGPLAN Not. 43(1), 407–419 (Jan 2008).
https://doi.org/10.1145/1328897.1328487
7. Boisseau, G.: String Diagrams for Optics. arXiv:2002.11480 (2020)
8. Bonchi, F., Sobocinski, P., Zanasi, F.: The calculus of signal flow diagrams I: linear relations on streams. Inf. Comput. 252, 2–29 (2017). https://doi.org/10.1016/j.ic.2016.03.002
9. Capucci, M., Gavranović, B., Hedges, J., Rischel, E.F.: Towards foundations of categorical cybernetics. arXiv:2105.06332 (2021)
10. Capucci, M., Ghani, N., Ledent, J., Nordvall Forsberg, F.: Translating Extensive
Form Games to Open Games with Agency. arXiv:2105.06763 (2021)
11. Chollet, F., et al.: Keras (2015), https://github.com/fchollet/keras
12. Clarke, B., Elkins, D., Gibbons, J., Loregian, F., Milewski, B., Pillmore, E., Román,
M.: Profunctor optics, a categorical update. arXiv:2001.07488 (2020)
13. Cockett, J.R.B., Cruttwell, G.S.H., Gallagher, J., Lemay, J.S.P., MacAdam, B.,
Plotkin, G.D., Pronk, D.: Reverse derivative categories. In: Proceedings of the
28th Computer Science Logic (CSL) conference (2020)
14. Coecke, B., Kissinger, A.: Picturing Quantum Processes: A First Course in Quan-
tum Theory and Diagrammatic Reasoning. Cambridge University Press (2017).
https://doi.org/10.1017/9781316219317
15. Cortes, C., Vapnik, V.: Support-vector networks. Machine learning 20(3), 273–297
(1995)
16. Courbariaux, M., Bengio, Y., David, J.P.: BinaryConnect: Training Deep Neural
Networks with binary weights during propagations. arXiv:1511.00363
17. CRCoauthors, A.: Numeric Optics: A python library for constructing and training neural networks based on lenses and reverse derivatives. https://github.com/anonymous-c0de/esop-2022
18. Dalrymple, D.: Dioptics: a common generalization of open games and gradient-based learners. SYCO7 (2019), https://research.protocol.ai/publications/dioptics-a-common-generalization-of-open-games-and-gradient-based-learners/dalrymple2019.pdf
19. Dosovitskiy, A., Brox, T.: Inverting convolutional networks with convolutional net-
works. arXiv:1506.02753 (2015)
been set apart unto God in Christ Jesus their Saviour, and who, as
the result of that sacred call, are now leading holy lives in His
presence.’
My object this morning will be simply to trace the connexion
between these two steps of God’s sacred work: the past separation
unto Him, and the present personal holiness of character. And all I
can say is, may the Lord help us in our own experience to
understand both of them, and then we shall have no difficulty in
perceiving how they are bound together in the work of Salvation!
We shall find them connected by a principle and a power.
I. A Principle. It is perfectly clear, as a matter of principle, that
there ought to be holiness in all that is consecrated to God. He
consecrated, or set apart unto Himself, the Sabbath day, and so He
says, ‘Remember the Sabbath day, to keep it holy.’ The Temple, like
our own churches, was consecrated to God, and therefore we read,
‘Holiness becometh Thine house for ever.’ The vessels of the Temple
were dedicated, or sanctified to His service, and therefore they
should not be touched by unhallowed hands, and the words of
sacred Scripture are, ‘Be ye clean, that bear the vessels of the Lord.’
On these principles we none like to see a neglected church, a
dishonoured Bible, or a careless attitude, in the house of God. On
the same principles, we should all be profoundly humbled when
unhallowed thoughts,—thoughts of the world, thoughts of vanity, of
jealousy, or of self,—in any shape intrude into holy things, and
corrupt those sacred hours which are set apart exclusively to God.
But if this applies to consecration generally, how pre-eminently does
it apply to such a consecration as that described in the text, in which
we are said to be ‘sanctified in Christ Jesus.’ That sanctification is
the introduction of the ruined sinner into a covenant union with the
Son of God. If you have been thus sanctified in Christ Jesus, you no
longer stand alone, to bear your own burden, or plead your own
cause. You have been separated from the ruined world, and
identified with the Lord Jesus; so that He represented you in bearing
your curse when He suffered, and He now represents you at the
right hand of the throne of God, while He pleads on your behalf.
Thus you are cleansed from all legal guilt. You are sanctified by
blood, and charged with no defilement. By the eternal covenant of
God He is become your Head, so that in His death you died; in His
life you live; in His acceptance you are justified; and in His glory you
are glorified.
But, if you are thus separated unto such a union with Him, is there
any room left for one moment’s doubt as to what ought to be your
character? If you are set apart by Him into this covenant union with
Himself, you are set apart into a oneness of mind, of will, and of
interest. He represents you in heaven, and you represent Him on
earth; as God sees you in Him, so the world sees Him in you. You
bear His name; you are sealed with the seal of the covenant; you
are made a peculiar people unto the Son of God: and I am sure we
must all see the justice of those words of the Apostle Peter, ‘As He
who hath called you is holy, so be ye holy in all manner of
conversation.’
II. Power. But here lies the difficulty. You really wish to be holy, but
you are not; you have endeavoured to overcome your temper, but it
is still there; you have striven against wandering thoughts in prayer,
but they still interfere most mournfully with your most sacred acts of
worship; you wish, and you mean to be, unselfish, but you find
selfishness continually cropping up, to your sorrow and vexation of
spirit. Now, the question is,—How is this to be overcome? Your
resolves will not do it, for you have made hundreds, and failed in
them all; and no man on earth can do it for you, for the evil lies far
too deep for the reach of man. What then is to be done? We may
turn back to our text, and there learn the secret, for in those words,
‘In Christ Jesus,’ we are taught the power.
We learned, in studying the first act of separation, that it was the
work of God the Holy Ghost: here it is said to be in Christ Jesus.
Some people dwell more on the distinction than I am myself
disposed to do: there is such a perfect oneness in the infinite God,
that I confess I have but little heart for these refined distinctions. As
the Father and the Son are one, so God the Son and God the Holy
Ghost are one; and when the Lord Jesus saves the sinner, it is the
Holy Ghost that applies that salvation to the soul. Without stopping,
therefore, to study any such distinction, let us rather hasten to the
practical lesson that sanctification, or consecration, here described,
is a sanctification in Jesus Christ. You may look thus to your
covenant union with Him, and trust Him by the in-breathing of His
own Spirit to make you holy. You may remember that He came to
save you from your sins, and not merely from their curse; and that
holiness is just as much a gift of the covenant as pardon.
You remember those words, 1 Cor. i. 30: ‘But of Him are ye in Christ
Jesus, who of God is made unto us wisdom, and righteousness, and
sanctification, and redemption.’ They teach us that the Lord Jesus is
the source of all practical wisdom and holiness, just as much as of
imputed righteousness and redemption. The passage is clearly not
speaking of an imputed wisdom, and we have no right to apply it to
an imputed sanctification. It refers to the practical wisdom and
personal holiness of the man who by God’s grace is wise and holy;
and teaches that both one and the other are found exclusively in
Christ Jesus.
You may, therefore, trust the Lord Jesus Christ for your sanctification
just as much as for your justification; for your personal holiness in
daily life, as well as for your safety in the great day of judgment.
Look carefully then unto your covenant union with Christ, and think
on Him as your covenant Head: then spread out all your difficulties
and temptations before Him as your Head. Acknowledge before Him
how you have dishonoured His headship by your evil thoughts, your
evil words, and your constant failures; and trust Him, as your Head,
to form in you His image, and by His own most Holy Spirit to give
you the victory. Do not stand at a distance, thinking it your duty to
doubt your union, for by so doing you will never overcome. Without
that union you will never know what victory means. I am assuming
now that in the secret of your own souls you have been verily
engrafted into Christ. I know you were sacramentally in baptism,
but I am looking deeper than that, for many who are baptized are
never saved, and I am speaking now of the real saving union of your
soul with Christ Himself. Now, if that is yours through His wonderful
grace, accept it, act upon it, trust Him as your living Head; and you
will find, as time goes on, that though you cannot overcome, He
can; and that He will finally present you, ‘holy and unblameable
before Him at His coming.’ But here we are brought to the old
difficulty,—that you have no real evidence in your soul of the
existence of such a union with Christ. You know there is your
baptismal union, but still you cannot feel safe, and you greatly doubt
whether you are amongst those who have been ‘Sanctified in Christ
Jesus: called to be saints.’ This is the reason why many of you
cannot come to the sacred feast of the Supper of the Lord, and why
many others, who do come, come with a heavy heart. Would to God
we might see those absent ones brought near, and those heavy
hearts gladdened by the Lord! But in order to that you must grapple
at once with the great question of your own personal salvation,—
your separation unto God. Till that is settled you will be powerless
against yourself. Till you are in Christ, and sanctified in Christ Jesus,
you will never be sanctified at all. If you really desire to be really
holy, for the sake of that holiness begin at the beginning, and never
rest till you are safe. Your safety must come before your holiness, or
you will wait for it for ever. Begin therefore with the prayer, ‘Lord,
save me: I perish.’ Throw yourself into His hand for pardon, for
acceptance, for life. Never rest till you can appropriate the language
of St. Paul: ‘Who hath saved me, and called me with an holy calling.’
And, when that is given, you may go on to those other words of the
same Apostle, and say, ‘According as He hath chosen us in Him
before the foundation of the world, that we should be holy and
without blame before Him in love.’

PROGRESS.
‘But we all, with open face beholding as in a glass the glory of
the Lord, are changed into the same image from glory to glory,
even as by the Spirit of the Lord.’—2 Cor. iii. 18.

The subject of personal holiness is one of overwhelming interest to
all those who really desire to walk with God; it is also one which
requires our most careful study, for in it lies the chief difficulty of the
daily Christian life of the greater number of true believers. They
know the truth, and love it; they prize their Saviour far above all that
the world can give; they are ready either to live or die in His service;
but yet they are so conscious of the power of indwelling sin that
sometimes they are led almost to doubt the reality of their
Christianity, and begin to question whether they really love their Lord
at all. For the help of such persons it seems clearly a duty to look
carefully into the subject, and I pray God that He may help us to do
so this morning.
This text will supply us with so much instruction that we will not
attempt to examine other passages, but will endeavour to gather
from it the standard, the progress, the means, and the power of
Christian holiness. May God so bless His word, as to make us holy
by His Spirit!

I. The Standard. How many a noble ship has been lost through
some inaccuracy in the compass! If the compass points too much to
the east or to the west, the most careful commander may wreck his
vessel. And if the compass of the soul is in a wrong direction you
will find it very hard to walk in a right path. Now, in this text the
one standard is the image of the Lord Jesus. We are said to behold
the glory of the Lord, and to be changed into His image. There
cannot be a doubt that this refers to the Lord Jesus Christ, and that
by His glory is meant His grace. If there were, it would be settled by
these words in John, i. 14: ‘The Word was made flesh, and dwelt
among us; and we beheld His glory, the glory as of the only-
begotten of the Father, full of grace and truth.’ The great
manifestation of the glory of God is in the grace and truth of the
incarnate Word. If, then, we would be holy, as God is holy, we must
be changed into the image of the Lord Jesus Christ. When we are
like Him we shall be holy, harmless, undefiled, and separate from
sinners; but not before. So you will find that, when persons speak
about their sinlessness, it may generally be traced to their adopting
a low standard of holiness. Sometimes people will set up their own
experience as a standard, and really seem to think that we are to
receive their accounts of their own experience as if it were another
Bible. Sometimes we read of a perfection, not absolute, but ‘up to
the measure of to-day’s consciousness.’ Accordingly I have read of
one described as an eminent Christian, who ‘said that a few days
more would make twenty-one years that his obedience had been
kept at the extreme verge of his light.’ I am not sure that I know
what the writer means, and I may possibly misunderstand his words;
but, if they mean what they seem to mean, I can scarcely imagine
anything more delusive. We know very well how the eyes may be
blinded, the heart deadened, and the conscience seared by sin; we
know that the deeper a man is sunk in sin the less he feels it, and
the lower his fall the more profound his want of feeling; and only
imagine what must be the result if a deadened, thickened, darkened
conscience were to be accepted as the measure of a sinless life.
The idea reminds one of these words of St. Paul, 2 Cor. x. 12: ‘They
measuring themselves by themselves, and comparing themselves
among themselves, are not wise.’ Nay: they go further, and point to
the tremendous danger pointed out by our Lord Himself: ‘If the light
that is in thee be darkness, how great is that darkness.’ (Matt. vi.
23.) No: we must have a standard rising high above either
consciousness, conscience, or our own light; a standard that never
varies; a standard that does not go up and down with our changes
of feeling or opinion; a standard as unchanging as the perfect
character of God Himself! This is the standard which we find in the
Lord Jesus Christ,—in God manifest in the flesh: and what is more,
thanks be to God, this is the standard which we shall one day
reach! For, though there are many things still hidden, there is one
thing we most assuredly know, and that is, that ‘when He shall
appear we shall be like Him, for we shall see Him as He is.’
II. The Progress. As I have just said, when we see the Lord as He is
we shall be like Him; and when that comes to pass we shall see the
perfection of the promise, He ‘shall save His people from their sins.’
He will so completely save them that whereas He finds them corrupt,
ruined, and enemies to God by wicked works, He will finally present
them holy and unblameable, without spot and without blemish
before the throne. It is impossible to imagine anything more
blessed, more wonderful, more divine, than such a change. Now it
becomes a question of the deepest interest whether this mighty
change is accomplished by one instantaneous act, or gradually. Is it,
like justification, a completed thing? or is it a progressive work,
commenced at the new birth, but not complete till we see Him as He
is? There cannot be a more important practical inquiry. And now
you may see the importance of the distinction drawn between the
different senses of the word ‘sanctification;’ or, as it might be better
expressed, the different parts of that blessed work. If you speak of
sanctification as the original act of God in separating us unto
Himself, then it is a completed thing, for we are described as ‘having
been sanctified in Christ Jesus.’ If, again, you speak of it as a legal
cleansing from all past guilt, it is complete, for being washed in the
precious blood we are already clean. But if you regard it as the
personal holiness of daily life, the purifying the heart through faith
by the indwelling power of the Holy Ghost, then I am prepared to
maintain from the whole testimony of the whole Word of God from
one end to the other, that so long as we are in this world of conflict
the sacred work is not complete, but progressive. How people can
speak of sanctification in this sense as an immediate work, I am at a
loss to understand. Hundreds of passages might be quoted to prove
its progressive character, and to show the reason of its present
incompleteness: viz., the abiding power of indwelling sin. I have
only time to refer to two. In the first place, this verse describes us
as changed, or being changed, from glory to glory. We are
described as in the process of transformation, or metamorphosis; by
His grace passing from glory to glory, or from one degree of grace to
another. The work is in progress, thanks be to God! and we have
the bright hope of the completed likeness of the Lord. But that
bright hope is not yet realized, nor will it be till we see Him as He is.
I will take only one other passage, and select it because it
corresponds very closely to the text. It is a passage addressed to
the believers in Rome,—to persons who are described as being
‘beloved of God called to be saints.’ (Rom. i. 7.) There can be no
doubt then that the work of personal holiness was begun in them:
yet what does St. Paul say to them? (Rom. xii. 2.) ‘Be not
conformed to this world: but be ye transformed (or
metamorphosized) by the renewing of your mind, that ye may prove
what is that good, and acceptable, and perfect will of God.’ Is it not
clear then that those persons who were beloved of God, and called
to be saints, were still to be reaching forth after higher attainments?
There was so much evil in them that they still required to be warned
against conformity to the world, and so far were they from their high
standard, that they required nothing less than a transformation or
metamorphosis (it is the same word as in the text), in order to bring
them into a personal experience ‘of the good, and acceptable, and
perfect will of God.’ Be sure then there is no resting-place in
Christian holiness for the saints of God. The Lord may have done
great things for us, whereof we are glad. He may have given us
such an insight into His grace that we now love that which we once
cared nothing for, and hate that which we once loved: He may have
led us to say from the bottom of our hearts, ‘I delight to do Thy will,
O my God.’ But our motto must still be, ‘Forgetting those things that
are behind, and reaching forth unto those things that are before, I
press towards the mark for the prize of the high calling of God in
Christ Jesus.’ The more we love Christ, the more must we be deeply
humbled that we love Him so little; and the more we look at the
blessed prospect of a real and perfect sinlessness, the more must we
be ready to say, as St. Paul did, ‘Not as though I had already
attained, either were already perfect: but I follow after, if that I may
apprehend that for which also I am apprehended of Christ Jesus.’
III. But some of you will be ready to say that that is just where your
difficulty lies. You do really desire to be going forward, and to be
making progress, but it seems as if you could not. You are like a
person in a nightmare, who wishes to run, but cannot. Let us then
consider what is God’s great instrument, whereby He imparts
progress to the soul. On this subject this text is quite decisive, for it
shows that God’s great instrument is the view of the Lord Jesus
Christ through faith. In the passage to which I have already referred
in 1 John, iii. 2, we find that the perfect view of the Lord Jesus will
lead to perfect likeness, so in these words the partial view leads to
progressive likeness. When the view is perfect the likeness will be
perfect too; now that the view is imperfect, only as through a glass,
the likeness is imperfect likewise. But still it is growing more and
more; for ‘we all,’—not merely special Christians, who have attained
what they call ‘the higher life,’—‘beholding as in a glass the glory of
the Lord, are changed into the same image from glory to glory.’
Now I believe it is impossible to press this too strongly on all those
who desire holiness, for there is a perpetual tendency in every one
of us to turn the eye inward on ourselves, instead of keeping it fixed
on Him. Some are occupied with what they feel, or do not feel, or
wish to feel, or wish they did not feel; and some by what they do, or
mean to do, or think they ought to do,—till the whole mind becomes
bewildered, and the whole soul entangled. Remember that you may
be entangled by your religious efforts, as well as by your sins:
nothing indeed entangles people more than confused and mistaken
religion. So that if you really want to be like Him, you must sweep
away all your entanglements like so many cobwebs, and, just as you
are, look straight at Him. For example, you say you do not feel sin,
and you do not feel anything like the sorrow for it that you know you
ought to do. I have no doubt you are perfectly right, and it is very
sorrowful, very sinful, and very sad. But how is it to be overcome?
I know of only one way, and that is a very simple one, too simple for
many of you,—and that is a look: you must behold Christ.
Remember the case of the Jews. Nothing yet has melted their
hearts: their great national afflictions have utterly failed: but in God’s
time there will be a change. We shall see those people mourning,—
so mourning that they will be led with broken hearts to the Fountain
open for sin and for uncleanness. And what will be God’s instrument
for producing such a change? By what means will He effect it? By a
look: a simple look! You find it described in Zech. xii. 10: ‘They shall
look on Me whom they have pierced, and shall mourn for him.’ That
one look will accomplish more than 1800 years of bitter, and most
afflictive, discipline. And it is just the same with ourselves. One
look at our loving and living Saviour will do more towards softening
the heart than hours spent in the scrutiny of feeling, or whole books
of self-examination. If you want to grow in grace, in tenderness of
conscience, in holy abhorrence of sin, in purity of heart, in lowliness
of spirit, and in thankful love for your blessed Saviour,—then look on
Him, keep your eyes on Him. Think on His Cross, how He died for
you; on His life, how He lived for you; on His advocacy, how He
pleads for you; on His perfect character, His love, His holiness, His
purity, His power, His grace, His truth: for by such a look, and such a
look alone, can you ever hope to be changed into His image.
But remember one expression in the text: viz. those three words,
‘With open face.’ The look that transforms is a look with an open
face: there must be nothing between. There must not be a veil over
it, as there is over the Jews, as you read in verse 15. Every barrier
must be removed. The great barrier of the curse is gone, through
the blood of atonement; and we must not now set up fresh barriers
of our own creation. We must remember the hymn:—

‘Just as I am: Thy love alone
Has broken every barrier down;’

and just as we are,—humbled, unworthy, cold, and unfeeling, but
yet admitted into the very presence of the Lord,—we must look with
open face at the Son of God Himself; and I hope we may be able to
say, as David did, ‘They looked to Him, and were lightened, and their
faces were not ashamed.’
IV. But the look is not enough; for besides all means there must be
a power. We cannot will ourselves into the likeness of our Saviour,
any more than an animal can will itself into a man: the great
transformation must be by the power of a life-creating agent. So,
when I turn again to this text, I find that it is not the look alone
which effects the change, but that it is God the Holy Ghost making
use of the look, and by divine omnipotence transforming the soul
into the image of the Lord Jesus. In the last words of the text it is
all ascribed to Him, as in the words, ‘Even as by the Spirit of the
Lord,’ or, as the margin has it, ‘Even as by the Lord the Spirit.’ I
have no time to dwell on this subject as I should like to do; but I
cannot conclude without begging the careful attention of all those
who are almost tempted to think that there is something in their
constitution which blocks the way against their progress. They see
others going forward while they stand still; they hear of others filled
with peace in believing, while they are still hampered by doubt, so
that they begin to think they shall never rise, and are never intended
to rise, above their present condition. I quite admit that it may be
perfectly true, that there is something peculiar in your case;
something peculiar, it may be, in your habits, in your disposition, in
your temper, in your temptation. Let us admit it, for most people
have their peculiar difficulties. But, admitting it, I pray you to
consider whether there is not a Divine power in the Lord the Spirit,
which is quite sufficient to overcome that peculiarity. Is any thing
too hard for the Lord? Think then on God the Holy Ghost: think of
Him as working out the great salvation purchased for us through the
precious blood of the Son of God, and then reflect whether it is
possible that your difficulties are beyond His power. They are
beyond your own: you have learned that. But are they beyond His?
If you have failed, is that any reason why He should? Trust Him
then, and behold Christ Jesus. Remember you are dealing with One
mighty to save, and that, while you are looking to Jesus, He,
working within, may heal your soul. Trust Him then to do it. You
know He can; trust Him that He will: and never again admit the
thought, that, notwithstanding all you have discovered in your soul,
there is any thing there too hard for the Son of God,—anything
which may not be overcome by the indwelling power of God the Holy
Ghost.

INFECTION OF NATURE.
‘I thank God through Jesus Christ our Lord. So then with the
mind I myself serve the law of God; but with the flesh the law
of sin.’—Rom. vii. 25. [64]

There are few passages in the whole word of God that have excited a
deeper interest amongst truly Christian people than the latter part of
the seventh of Romans. It is so closely connected with the practical
experience of Christian life, and at the same time it is so much
opposed to the beautiful theories of some Christian people, that it
has always excited an earnest spirit of inquiry, and engaged the
deepest interests of the students of Scripture. I propose to make it
the subject of our study this morning: to endeavour to find out what
the Scripture really teaches. And in the outset of our study I should
wish to give one caution, which I believe to be of the utmost
importance for us all: viz., we must not bring the sacred Scriptures
to the test of our theories, but must be prepared, if need be, to give
up those theories to the authority of Scripture. If we want to live in
God’s truth, we must be subject to God’s Word, and must be content
to receive what He teaches as He teaches it. In other words, we
must not twist Scripture so as to make it fit our own opinions, but
must receive it as from God, and make all our opinions bend before
its high authority.
With this caution before us, there are three subjects to be
considered. First, of whom is the Apostle speaking: of himself, or
some other man? Secondly, of what period in his Christian life is he
speaking: does he refer to the past or to the present? And thirdly,
what does the passage teach respecting his spiritual condition at the
time he wrote the words?
And now, may God the Holy Ghost, who inspired the Word, lead us
all reverently to study, and rightly to understand, His teaching!
I. To whom does it refer? I feel persuaded that we shall all admit
that, if any person were to read the chapter without having some
previous opinion to support, he would believe that the Apostle was
speaking of himself. The word ‘I’ occurs no less than twenty-eight
times in the passage. Such expressions as ‘I do,’ ‘I consent,’ ‘I allow,’
‘I delight,’ are found continually; and certainly the natural conclusion
would be that when he said, ‘I,’ he meant himself. I know that it is
sometimes said that he personated some other person; a legalist, or
one in a lower Christian life. But there is not the least evidence of
any such personation in the passage, and he says not one word to
lead us to suppose that such was his intention. In iii. 5, he does
thus personate an objector; and says, ‘But if our unrighteousness
commend,’ &c. But then he distinctly states that he is doing so in
the words, ‘I speak as a man.’ But there is nothing of the kind here.
There is a plain, simple statement in his own name; passing from
the ‘I,’ which pervades the chapter, to ‘I, myself,’ in the last verse;
and I am utterly at a loss to understand on what principle these
plain words, ‘I myself,’ can be supposed to express the personation
of some other man.
II. In answer then to our first question, I am brought to the
conclusion that when he spoke of ‘I myself,’ he meant himself, and
not another: and we may pass to our second question. To what
period of his spiritual history did he refer: did he speak of the past,
or of the present? Was he describing some period of past anxiety
out of which he had been delivered, so as to enter on the joys of the
eighth chapter? Or was he speaking of his state of mind at the very
time, that he was writing the eighth chapter, and declaring, ‘Ye have
not received the spirit of bondage again to fear; but ye have
received the spirit of adoption, whereby we cry, Abba, Father?’ In
answer to this I have not the smallest hesitation in saying that,
according to every principle of sound exposition, the seventh chapter
refers to exactly the same period as the eighth; that it is a
description of his own experience at the time he wrote the words;
and that we should have just as much authority for saying that the
eighth chapter referred only to the future, as that the seventh
referred only to the past. For this I give three reasons:—
(1.) If we wish to understand the Word of God we must receive
plain words as we find them in sacred Scripture. We have no right
to assume that the present tense stands for the past; that ‘I am,’
means ‘I was;’ that ‘I do,’ means ‘I used to do;’ that ‘I hate,’ means
‘I used to hate;’ and ‘I delight,’ means ‘I used to delight.’ If we once
begin thus to handle Scripture there is an end to exposition; and if
people who thus twist Scripture would be consistent, they ought to
go on, and say that the beginning of this verse, ‘I thank God through
Jesus Christ our Lord,’ means, ‘I used to thank Him, but I do not
now.’
(2.) Again, the transition from the past to the present is clearly
marked in the passage. In the parenthesis which extends from the
seventh verse to the end of the chapter, we find the three tenses,—
past, present, and future. From verse seven to verse thirteen it is all
in the past, and is a description of a certain portion of his past life.
‘I was alive;’ ‘the commandment came;’ ‘sin revived;’ ‘I died;’ ‘I
found it to be unto death;’ and ‘sin deceived me, and by it slew me.’
