The Art of Reinforcement Learning: Fundamentals, Mathematics, and Implementations with Python — 1st Edition, Michael Hu
Michael Hu
© Michael Hu 2023
Apress Standard
The publisher, the authors, and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
Source Code
You can download the source code used in this book from github.com/apress/art-of-reinforcement-learning.
Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub (https://github.com/Apress). For more detailed information, please visit https://www.apress.com/gp/services/source-code.
Contents
Part I Foundation
1 Introduction
1.1 AI Breakthrough in Games
1.2 What Is Reinforcement Learning
1.3 Agent-Environment in Reinforcement Learning
1.4 Examples of Reinforcement Learning
1.5 Common Terms in Reinforcement Learning
1.6 Why Study Reinforcement Learning
1.7 The Challenges in Reinforcement Learning
1.8 Summary
References
2 Markov Decision Processes
2.1 Overview of MDP
2.2 Model Reinforcement Learning Problem Using MDP
2.3 Markov Process or Markov Chain
2.4 Markov Reward Process
2.5 Markov Decision Process
2.6 Alternative Bellman Equations for Value Functions
2.7 Optimal Policy and Optimal Value Functions
2.8 Summary
References
3 Dynamic Programming
3.1 Use DP to Solve MRP Problem
3.2 Policy Evaluation
3.3 Policy Improvement
3.4 Policy Iteration
3.5 General Policy Iteration
3.6 Value Iteration
3.7 Summary
References
4 Monte Carlo Methods
4.1 Monte Carlo Policy Evaluation
4.2 Incremental Update
4.3 Exploration vs. Exploitation
4.4 Monte Carlo Control (Policy Improvement)
4.5 Summary
References
5 Temporal Difference Learning
5.1 Temporal Difference Learning
5.2 Temporal Difference Policy Evaluation
5.3 Simplified 𝜖-Greedy Policy for Exploration
5.4 TD Control—SARSA
5.5 On-Policy vs. Off-Policy
5.6 Q-Learning
5.7 Double Q-Learning
5.8 N-Step Bootstrapping
5.9 Summary
References
Part II Value Function Approximation
6 Linear Value Function Approximation
6.1 The Challenge of Large-Scale MDPs
6.2 Value Function Approximation
6.3 Stochastic Gradient Descent
6.4 Linear Value Function Approximation
6.5 Summary
References
7 Nonlinear Value Function Approximation
7.1 Neural Networks
7.2 Training Neural Networks
7.3 Policy Evaluation with Neural Networks
7.4 Naive Deep Q-Learning
7.5 Deep Q-Learning with Experience Replay and Target
Network
7.6 DQN for Atari Games
7.7 Summary
References
8 Improvements to DQN
8.1 DQN with Double Q-Learning
8.2 Prioritized Experience Replay
8.3 Advantage Function and Dueling Network Architecture
8.4 Summary
References
Part III Policy Approximation
9 Policy Gradient Methods
9.1 Policy-Based Methods
9.2 Policy Gradient
9.3 REINFORCE
9.4 REINFORCE with Baseline
9.5 Actor-Critic
9.6 Using Entropy to Encourage Exploration
9.7 Summary
References
10 Problems with Continuous Action Space
10.1 The Challenges of Problems with Continuous Action Space
10.2 MuJoCo Environments
10.3 Policy Gradient for Problems with Continuous Action
Space
10.4 Summary
References
11 Advanced Policy Gradient Methods
11.1 Problems with the Standard Policy Gradient Methods
11.2 Policy Performance Bounds
11.3 Proximal Policy Optimization
11.4 Summary
References
Part IV Advanced Topics
12 Distributed Reinforcement Learning
12.1 Why Use Distributed Reinforcement Learning
12.2 General Distributed Reinforcement Learning Architecture
12.3 Data Parallelism for Distributed Reinforcement Learning
12.4 Summary
References
13 Curiosity-Driven Exploration
13.1 Hard-to-Explore Problems vs. Sparse Reward Problems
13.2 Curiosity-Driven Exploration
13.3 Random Network Distillation
13.4 Summary
References
14 Planning with a Model: AlphaZero
14.1 Why We Need to Plan in Reinforcement Learning
14.2 Monte Carlo Tree Search
14.3 AlphaZero
14.4 Training AlphaZero on a 9 × 9 Go Board
14.5 Training AlphaZero on a 13 × 13 Gomoku Board
14.6 Summary
References
Index
About the Author
Michael Hu
is an exceptional software engineer with a wealth of
expertise spanning over a decade, specializing in the
design and implementation of enterprise-level
applications. His current focus revolves around leveraging
the power of machine learning (ML) and artificial
intelligence (AI) to revolutionize operational systems
within enterprises. A true coding enthusiast, Michael finds
solace in the realms of mathematics and continuously
explores cutting-edge technologies, particularly machine learning and
deep learning. His unwavering passion lies in the realm of deep
reinforcement learning, where he constantly seeks to push the
boundaries of knowledge. Demonstrating his commitment to the field,
he has built numerous open source projects on GitHub that
closely emulate state-of-the-art reinforcement learning algorithms
pioneered by DeepMind, including notable examples like AlphaZero,
MuZero, and Agent57. Through these projects, Michael demonstrates
his commitment to advancing the field and sharing his knowledge with
fellow enthusiasts. He currently resides in the city of Shanghai, China.
About the Technical Reviewer
Shovon Sengupta
has over 14 years of expertise and a deepened
understanding of advanced predictive analytics, machine
learning, deep learning, and reinforcement learning. He
has established a place for himself by creating innovative
financial solutions that have won numerous awards. He is
currently working for one of the leading multinational
financial services corporations in the United States as the
Principal Data Scientist at the AI Center of Excellence. His job entails
leading innovative initiatives that rely on artificial intelligence to
address challenging business problems. He has a US patent (United
States Patent: Sengupta et al.: Automated Predictive Call Routing Using
Reinforcement Learning [US 10,356,244 B1]) to his credit. He is also a
Ph.D. scholar at BITS Pilani. He has reviewed quite a few popular titles
from leading publishers like Packt and Apress and has also authored a
few courses for Packt and CodeRed (EC-Council) in the realm of
machine learning. Apart from that, he has presented at various
international conferences on machine learning, time series forecasting,
and building trustworthy AI. His primary research is concentrated on
deep reinforcement learning, deep learning, natural language
processing (NLP), knowledge graph, causality analysis, and time series
analysis. For more details about Shovon’s work, please check out his
LinkedIn page: www.linkedin.com/in/shovon-sengupta-272aa917.
Part I
Foundation
© The Author(s), under exclusive license to APress Media, LLC, part of Springer
Nature 2023
M. Hu, The Art of Reinforcement Learning
https://doi.org/10.1007/978-1-4842-9606-6_1
1. Introduction
Michael Hu1
(1) Shanghai, Shanghai, China
Fig. 1.1 A DQN agent learning to play Atari’s Breakout. The goal of the game is to
use a paddle to bounce a ball up and break through a wall of bricks. The agent only
takes in the raw pixels from the screen, and it has to figure out what’s the right action
to take in order to maximize the score. Idea adapted from Mnih et al. [1]. Game
owned by Atari Interactive, Inc.
Go
Go is an ancient Chinese strategy board game played by two players, who take turns placing stones on a 19 × 19 board with the goal of surrounding more territory than the opponent. Each player has a set
of black or white stones, and the game begins with an empty board.
Players alternate placing stones on the board, with the black player
going first.
Fig. 1.2 Yoda Norimoto (black) vs. Kiyonari Tetsuya (white), Go game from the 66th
NHK Cup, 2018. White won by 0.5 points. Game record from CWI [4]
The stones are placed on the intersections of the lines on the board,
rather than in the squares. Once a stone is placed on the board, it
cannot be moved, but it can be captured by the opponent if it is
completely surrounded by their stones. Stones that are surrounded and
captured are removed from the board.
The game continues until both players pass, at which point the
territory on the board is counted. A player’s territory is the set of empty
intersections that are completely surrounded by their stones, plus any
captured stones. The player with the larger territory wins the game. In
the case of the final board position shown in Fig. 1.2, White won by 0.5 points.
Although the rules of the game are relatively simple, the game is
extremely complex. For instance, the number of legal board positions in
Go is enormously large compared to Chess. According to research by Tromp and Farnebäck [3], the number of legal board positions in Go is approximately 2 × 10^170, which is vastly greater than the number of atoms in the universe.
This complexity presents a significant challenge for artificial
intelligence (AI) agents that attempt to play Go. In March 2016, an AI
agent called AlphaGo developed by Silver et al. [5] from DeepMind
made history by beating the legendary Korean player Lee Sedol with a
score of 4–1 in Go. Lee Sedol is a winner of 18 world titles and is considered one of the greatest Go players of the past decade. AlphaGo's
victory was remarkable because it used a combination of deep neural
networks and tree search algorithms, as well as the technique of
reinforcement learning.
AlphaGo was trained using a combination of supervised learning
from human expert games and reinforcement learning from games of
self-play. This training enabled the agent to develop creative and
innovative moves that surprised both Lee Sedol and the Go community.
The success of AlphaGo has sparked renewed interest in the field of
reinforcement learning and has demonstrated the potential for AI to
solve complex problems that were once thought to be the exclusive
domain of human intelligence. One year later, Silver et al. [6] from
DeepMind introduced a new and more powerful agent, AlphaGo Zero.
AlphaGo Zero was trained using pure self-play, without any human
expert moves in its training, achieving a higher level of play than the
previous AlphaGo agent. They also made other improvements, such as simplifying the training process.
To evaluate the performance of the new agent, they set it to play games against the exact same AlphaGo agent that had beaten the world champion Lee Sedol in 2016, and this time AlphaGo Zero beat AlphaGo by a score of 100–0.
In the following year, Schrittwieser et al. [7] from DeepMind
generalized the AlphaGo Zero agent to play not only Go but also other
board games like Chess and Shogi (Japanese chess), and they called this
generalized agent AlphaZero. AlphaZero is a more general
reinforcement learning algorithm that can be applied to a variety of
board games, not just Go, Chess, and Shogi.
Reinforcement learning is a type of machine learning in which an
agent learns to make decisions based on the feedback it receives from
its environment. Both DQN and AlphaGo (and its successor) agents use
this technique, and their achievements are very impressive. Although
these agents are designed to play games, this does not mean that
reinforcement learning is only capable of playing games. In fact, there
are many more challenging problems in the real world, such as
navigating a robot, driving an autonomous car, and automating web
advertising. Games are relatively easy to simulate and implement
compared to these other real-world problems, but reinforcement
learning has the potential to be applied to a wide range of complex
challenges beyond game playing.
Environment
The environment is the world in which the agent operates. It can be a
physical system, such as a robot navigating a maze, or a virtual
environment, such as a game or a simulation. The environment
provides the agent with two pieces of information: the state of the
environment and a reward signal. The state describes the relevant
information about the environment that the agent needs to make a
decision, such as the position of the robot or the cards in a poker game.
The reward signal is a scalar value that indicates how well the agent is
doing in its task. The agent’s objective is to maximize its cumulative
reward over time.
The environment has its own set of rules, which determine how the
state and reward signal change based on the agent’s actions. These
rules are often called the dynamics of the environment. In many cases,
the agent does not have access to the underlying dynamics of the
environment and must learn them through trial and error. This is similar to how we humans interact with the physical world every day: we normally have a pretty good sense of what's going on around us, but it's difficult to fully understand the dynamics of the universe.
Game environments are a popular choice for reinforcement learning
because they provide a clear objective and well-defined rules. For
example, a reinforcement learning agent could learn to play the game of
Pong by observing the screen and receiving a reward signal based on
whether it wins or loses the game.
In a robotic environment, the agent is a robot that must learn to
navigate a physical space or perform a task. For example, a
reinforcement learning agent could learn to navigate a maze by using
sensors to detect its surroundings and receiving a reward signal based
on how quickly it reaches the end of the maze.
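This interaction can be summarized as a simple loop: the environment emits a state, the agent chooses an action, and the environment responds with the next state and a reward. The snippet below is a minimal, self-contained sketch of that loop using a toy corridor environment and a purely random agent; the class and method names are illustrative assumptions, not code from this book or any particular library.

```python
import random

class CorridorEnv:
    """Toy environment: the agent starts at cell 0 and must reach the goal cell."""

    def __init__(self, goal=10):
        self.goal = goal
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: -1 moves left, +1 moves right; the agent cannot go below cell 0.
        self.state = max(0, self.state + action)
        done = self.state >= self.goal
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.state, reward, done

# A random "agent": it ignores the state and picks an action uniformly.
env = CorridorEnv()
state = env.reset()
total_reward, done = 0.0, False
while not done:
    action = random.choice([-1, +1])
    state, reward, done = env.step(action)
    total_reward += reward
print("episode finished, cumulative reward =", total_reward)
```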
State
In reinforcement learning, an environment state or simply state is the
statistical data provided by the environment to represent the current
state of the environment. The state can be discrete or continuous. For
instance, when driving a stick shift car, the speed of the car is a
continuous variable, while the current gear is a discrete variable.
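As a small illustrative sketch of this stick-shift example (the field names are made up for illustration), a state mixing continuous and discrete variables might look like this:

```python
from dataclasses import dataclass

@dataclass
class CarState:
    speed_kmh: float  # continuous variable
    gear: int         # discrete variable (e.g., 1-6)

state = CarState(speed_kmh=42.5, gear=3)
```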
Ideally, the environment state should contain all relevant
information that’s necessary for the agent to make decisions. For
example, in a single-player video game like Breakout, the pixels of
frames of the game contain all the information necessary for the agent
to make a decision. Similarly, in an autonomous driving scenario, the
sensor data from the car’s cameras, lidar, and other sensors provide
relevant information about the surrounding environment.
However, in practice, the available information may depend on the
task and domain. In a two-player board game like Go, for instance,
although we have perfect information about the board position, we
don’t have perfect knowledge about the opponent player, such as what
they are thinking in their head or what their next move will be. This
makes the state representation more challenging in such scenarios.
Furthermore, the environment state might also include noisy data.
For example, a reinforcement learning agent driving an autonomous car
might use multiple cameras at different angles to capture images of the
surrounding area. Suppose the car is driving near a park on a windy
day. In that case, the onboard cameras could also capture images of
some trees in the park that are swaying in the wind. Since the
movement of these trees should not affect the agent’s ability to drive,
because the trees are inside the park and not on the road or near the
road, we can consider these movements of the trees as noise to the self-
driving agent. However, it can be challenging to ignore them from the
captured images. To tackle this problem, researchers might use various
techniques such as filtering and smoothing to eliminate the noisy data
and obtain a cleaner representation of the environment state.
Reward
In reinforcement learning, the reward signal is a numerical value that
the environment provides to the agent after the agent takes some
action. The reward can be any numerical value, positive, negative, or
zero. However, in practice, the reward function often varies from task to
task, and we need to carefully design a reward function that is specific
to our reinforcement learning problem.
Designing an appropriate reward function is crucial for the success
of the agent. The reward function should be designed to encourage the
agent to take actions that will ultimately lead to achieving our desired
goal. For example, in the game of Go, the reward is 0 at every step before the game is over, and +1 or −1 if the agent wins or loses the game, respectively. This design incentivizes the agent to win the game,
without explicitly telling it how to win.
Similarly, in the game of Breakout, the reward can be a positive number if the agent destroys some bricks, a negative number if the agent fails to catch the ball, and zero otherwise. This design
incentivizes the agent to destroy as many bricks as possible while
avoiding losing the ball, without explicitly telling it how to achieve a
high score.
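The two reward designs just described can be written down directly as functions. The sketch below is illustrative only; the argument names and the per-brick and ball-loss magnitudes for Breakout are assumptions, while the ±1 terminal reward for Go follows the description above.

```python
def go_reward(game_over: bool, agent_won: bool) -> float:
    """Go: zero reward on every step; +1 / -1 only when the game ends."""
    if not game_over:
        return 0.0
    return 1.0 if agent_won else -1.0

def breakout_reward(bricks_destroyed: int, ball_lost: bool) -> float:
    """Breakout: positive for destroying bricks, negative for losing the ball."""
    reward = 1.0 * bricks_destroyed  # assumed per-brick bonus
    if ball_lost:
        reward -= 1.0                # assumed penalty for missing the ball
    return reward
```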
The reward function plays a crucial role in the reinforcement
learning process. The goal of the agent is to maximize the accumulated
rewards over time. By optimizing the reward function, we can guide the
agent to learn a policy that will achieve our desired goal. Without the
reward signal, the agent would not know what the goal is and would
not be able to learn effectively.
In summary, the reward signal is a key component of reinforcement
learning that incentivizes the agent to take actions that ultimately lead
to achieving the desired goal. By carefully designing the reward
function, we can guide the agent to learn an optimal policy.
Agent
In reinforcement learning, an agent is an entity that interacts with an
environment by making decisions based on the received state and
reward signal from the environment. The agent’s goal is to maximize its
cumulative reward in the long run. The agent must learn to make the
best decisions by trial and error, which involves exploring different
actions and observing the resulting rewards.
In addition to its external interactions with the environment, the agent may also have an internal state that represents its knowledge about the world. This internal state can include things like memory of past
experiences and learned strategies.
It’s important to distinguish the agent’s internal state from the
environment state. The environment state represents the current state
of the world that the agent is trying to influence through its actions. The
agent, however, has no direct control over the environment state. It can
only affect the environment state by taking actions and observing the
resulting changes in the environment. For example, if the agent is
playing a game, the environment state might include the current
positions of game pieces, while the agent’s internal state might include
the memory of past moves and the strategies it has learned.
In this book, we will typically use the term “state” to refer to the
environment state. However, it’s important to keep in mind the
distinction between the agent’s internal state and the environment
state. By understanding the role of the agent and its interactions with
the environment, we can better understand the principles behind
reinforcement learning algorithms. It is worth noting that the terms
“agent” and “algorithm” are frequently used interchangeably in this
book, particularly in later chapters.
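To make the distinction between the environment state and the agent's internal state concrete, here is a minimal sketch of an agent that keeps a memory of past experiences. The class layout is our own illustration under these assumptions, not an implementation from the book.

```python
import random

class SimpleAgent:
    def __init__(self, actions):
        self.actions = actions  # the set of actions the agent may choose from
        self.memory = []        # internal state: remembered (state, action, reward) tuples

    def act(self, state):
        # Choose uniformly at random for now; later chapters replace this
        # with a learned policy.
        return random.choice(self.actions)

    def observe(self, state, action, reward):
        # Update the agent's internal state with the latest experience.
        self.memory.append((state, action, reward))
```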
Action
In reinforcement learning, the agent interacts with an environment by
selecting actions that affect the state of the environment. Actions are
chosen from a predefined set of possibilities, which are specific to each
problem. For example, in the game of Breakout, the agent can choose to
move the paddle to the left or right or take no action. It cannot perform
actions like jumping or rolling over. In contrast, in the game of Pong, the
agent can choose to move the paddle up or down but not left or right.
The chosen action affects the future state of the environment. The
agent’s current action may have long-term consequences, meaning that
it will affect the environment’s states and rewards for many future time
steps, not just the next immediate stage of the process.
Actions can be either discrete or continuous. In problems with
discrete actions, the set of possible actions is finite and well defined.
Examples of such problems include Atari and Go board games. In
contrast, problems with continuous actions have an infinite set of
possible actions, often within a continuous range of values. An example
of a problem with continuous actions is robotic control, where the
degree of angle movement of a robot arm is often a continuous action.
Reinforcement learning problems with discrete actions are
generally easier to solve than those with continuous actions. Therefore,
this book will focus on solving reinforcement learning problems with
discrete actions. However, many of the concepts and techniques
discussed in this book can be applied to problems with continuous
actions as well.
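The difference between discrete and continuous action spaces can be shown in a few lines. This is a generic sketch and does not rely on any specific environment library; the action names and ranges are assumptions for illustration.

```python
import random

# Discrete action space: a finite, well-defined set of choices (e.g., Breakout).
DISCRETE_ACTIONS = ["NOOP", "LEFT", "RIGHT"]
discrete_action = random.choice(DISCRETE_ACTIONS)

# Continuous action space: infinitely many values within a range
# (e.g., a robot arm joint angle in radians).
LOW, HIGH = -3.14, 3.14
continuous_action = random.uniform(LOW, HIGH)
```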
Policy
A policy is a key concept in reinforcement learning that defines the
behavior of an agent. In particular, it maps each possible state in the
environment to the probabilities of choosing different actions. By
specifying how the agent should behave, a policy guides the agent to
interact with its environment and maximize its cumulative reward. We
will delve into the details of policies and how they interact with the
MDP framework in Chap. 2.
For example, suppose an agent is navigating a grid-world
environment. A simple policy might dictate that the agent should
always move to the right until it reaches the goal location. Alternatively,
a more sophisticated policy could specify that the agent should choose
its actions based on its current position and the probabilities of moving
to different neighboring states.
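In the simplest tabular case, such a policy can be represented as a mapping from each state to a probability distribution over actions. The sketch below illustrates this for a tiny grid world; the states, actions, and probabilities are made up purely for illustration.

```python
import random

# A tabular stochastic policy: state -> {action: probability}.
policy = {
    "start":     {"left": 0.1, "right": 0.9},
    "mid":       {"left": 0.2, "right": 0.8},
    "near_goal": {"left": 0.0, "right": 1.0},
}

def sample_action(state):
    """Draw an action according to the policy's probabilities for this state."""
    actions, probs = zip(*policy[state].items())
    return random.choices(actions, weights=probs, k=1)[0]

print(sample_action("start"))  # usually "right"
```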
Model
In reinforcement learning, a model refers to a mathematical description
of the dynamics function and reward function of the environment. The
dynamics function describes how the environment evolves from one
state to another, while the reward function specifies the reward that the
agent receives for taking certain actions in certain states.
In many cases, the agent does not have access to a perfect model of
the environment. This makes learning a good policy challenging, since
the agent must learn from experience how to interact with the
environment to maximize its reward. However, there are some cases
where a perfect model is available. For example, if the agent is playing a
game with fixed rules and known outcomes, the agent can use this
knowledge to select its actions strategically. We will explore this
scenario in detail in Chap. 2.
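When a model is available, the dynamics and reward functions can, in the simplest case, be written down as a lookup table mapping a (state, action) pair to the next state and reward. The dictionary below is a hand-made toy example under that assumption, not a model of any environment discussed in the book.

```python
# A tabular model: (state, action) -> (next_state, reward).
model = {
    ("s0", "right"): ("s1", 0.0),
    ("s1", "right"): ("goal", 1.0),
    ("s1", "left"):  ("s0", 0.0),
}

def predict(state, action):
    """Use the model to predict the next state and reward without acting in the world."""
    return model[(state, action)]

next_state, reward = predict("s0", "right")
```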
In reinforcement learning, the agent-environment boundary can be
ambiguous. Although a house-cleaning robot may appear to be a single agent, the boundary of the agent is typically defined by what it directly controls, while the remaining components comprise the environment. In this case, the robot's wheels and other hardware are considered to be part of the environment since they aren't directly controlled by the agent. We can
think of the robot as a complex system composed of several parts, such
as hardware, software, and the reinforcement learning agent, which can
control the robot’s movement by signaling the software interface, which
then communicates with microchips to manage the wheel movement.
Autonomous Driving
Reinforcement learning can be used to train autonomous vehicles to
navigate complex and unpredictable environments. The goal for the
agent is to safely and efficiently drive the vehicle to a desired location
while adhering to traffic rules and regulations. The reward signal could
be a positive number for successful arrival at the destination within a
specified time frame and a negative number for any accidents or
violations of traffic rules. The environment state could contain
information about the vehicle’s location, velocity, and orientation, as
well as sensory data such as camera feeds and radar readings.
Additionally, the state could include the current traffic conditions and
weather, which would help the agent to make better decisions while
driving.
Generalization Problem
In reinforcement learning (RL), the generalization problem refers to the
ability of an agent to apply what it has learned to new and previously
unseen situations. To understand this concept, consider the example of
a self-driving car. Suppose the agent is trying to learn to navigate a
particular intersection, with a traffic light and crosswalk. The agent
receives rewards for reaching its destination quickly and safely, but it
must also follow traffic laws and avoid collisions with other vehicles
and pedestrians.
During training, the agent is exposed to a variety of situations at the
intersection, such as different traffic patterns and weather conditions.
It learns to associate certain actions with higher rewards, such as
slowing down at the yellow light and stopping at the red light. Over
time, the agent becomes more adept at navigating the intersection and
earns higher cumulative rewards.
However, when the agent is faced with a new intersection, with
different traffic patterns and weather conditions, it may struggle to
apply what it has learned. This is where generalization comes in. If the
agent has successfully generalized its knowledge, it will be able to
navigate the new intersection based on its past experiences, even
though it has not seen this exact intersection before. For example, it
may slow down at a yellow light, even if the timing is slightly different
than what it has seen before, or it may recognize a pedestrian crossing
and come to a stop, even if the appearance of the crosswalk is slightly
different.
If the agent has not successfully generalized its knowledge, it may
struggle to navigate the new intersection and may make mistakes that
lead to lower cumulative rewards. For example, it may miss a red light
or fail to recognize a pedestrian crossing, because it has only learned to
recognize these situations in a particular context.
Therefore, generalization is a crucial aspect of RL, as it allows the
agent to apply its past experiences to new and previously unseen
situations, which can improve its overall performance and make it more
robust to changes in the environment.
1.8 Summary
In the first chapter of the book, readers were introduced to the concept
of reinforcement learning (RL) and its applications. The chapter began
by discussing the breakthroughs in AI in games, showcasing the success
of RL in complex games such as Go. The chapter then provided an
overview of the agent-environment interaction that forms the basis of
RL, including key concepts such as environment, agent, reward, state,
action, and policy. Several examples of RL were presented, including
Atari video game playing, board game Go, and robot control tasks.
Additionally, the chapter introduced common terms used in RL,
including episodic vs. continuing tasks, deterministic vs. stochastic
tasks, and model-free vs. model-based reinforcement learning. The
importance of studying RL was then discussed, including its potential to
solve complex problems and its relevance to real-world applications.
The challenges faced in RL, such as the exploration-exploitation
dilemma, the credit assignment problem, and the generalization
problem, were also explored.
The next chapter of the book will focus on Markov decision
processes (MDPs), which provide a formal framework for modeling RL problems.
References
[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, Feb 2015.
[2] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, Jun 2013.
[3] John Tromp and Gunnar Farnebäck. Combinatorics of Go. In H. Jaap van den Herik, Paolo Ciancarini, and H. H. L. M. (Jeroen) Donkers, editors, Computers and Games, pages 84–99, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg.
[4] CWI. 66th NHK Cup. https://homepages.cwi.nl/~aeb/go/games/games/NHK/66/index.html, 2018.
[5] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, Jan 2016.
[6] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, Oct 2017.
[7] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. Mastering chess and shogi by self-play with a general reinforcement learning algorithm, 2017.
[8] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.
[9] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015.
Footnotes
1 Dog Thinks Through A Problem: www.youtube.com/watch?v=m_CrIu01SnM.