The Art of Reinforcement Learning: Fundamentals, Mathematics, and Implementations with Python — 1st Edition, Michael Hu
Michael Hu
© Michael Hu 2023
Apress Standard
The publisher, the authors, and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, expressed or implied, with respect to the
material contained herein or for any errors or omissions that may have
been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
Source Code
You can download the source code used in this book from github.com/apress/art-of-reinforcement-learning.
Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub (https://github.com/Apress). For more detailed information, please visit https://www.apress.com/gp/services/source-code.
Contents
Part I Foundation
1 Introduction
1.1 AI Breakthrough in Games
1.2 What Is Reinforcement Learning
1.3 Agent-Environment in Reinforcement Learning
1.4 Examples of Reinforcement Learning
1.5 Common Terms in Reinforcement Learning
1.6 Why Study Reinforcement Learning
1.7 The Challenges in Reinforcement Learning
1.8 Summary
References
2 Markov Decision Processes
2.1 Overview of MDP
2.2 Model Reinforcement Learning Problem Using MDP
2.3 Markov Process or Markov Chain
2.4 Markov Reward Process
2.5 Markov Decision Process
2.6 Alternative Bellman Equations for Value Functions
2.7 Optimal Policy and Optimal Value Functions
2.8 Summary
References
3 Dynamic Programming
3.1 Use DP to Solve MRP Problem
3.2 Policy Evaluation
3.3 Policy Improvement
3.4 Policy Iteration
3.5 General Policy Iteration
3.6 Value Iteration
3.7 Summary
References
4 Monte Carlo Methods
4.1 Monte Carlo Policy Evaluation
4.2 Incremental Update
4.3 Exploration vs. Exploitation
4.4 Monte Carlo Control (Policy Improvement)
4.5 Summary
References
5 Temporal Difference Learning
5.1 Temporal Difference Learning
5.2 Temporal Difference Policy Evaluation
5.3 Simplified 𝜖-Greedy Policy for Exploration
5.4 TD Control—SARSA
5.5 On-Policy vs. Off-Policy
5.6 Q-Learning
5.7 Double Q-Learning
5.8 N-Step Bootstrapping
5.9 Summary
References
Part II Value Function Approximation
6 Linear Value Function Approximation
6.1 The Challenge of Large-Scale MDPs
6.2 Value Function Approximation
6.3 Stochastic Gradient Descent
6.4 Linear Value Function Approximation
6.5 Summary
References
7 Nonlinear Value Function Approximation
7.1 Neural Networks
7.2 Training Neural Networks
7.3 Policy Evaluation with Neural Networks
7.4 Naive Deep Q-Learning
7.5 Deep Q-Learning with Experience Replay and Target
Network
7.6 DQN for Atari Games
7.7 Summary
References
8 Improvements to DQN
8.1 DQN with Double Q-Learning
8.2 Prioritized Experience Replay
8.3 Advantage Function and Dueling Network Architecture
8.4 Summary
References
Part III Policy Approximation
9 Policy Gradient Methods
9.1 Policy-Based Methods
9.2 Policy Gradient
9.3 REINFORCE
9.4 REINFORCE with Baseline
9.5 Actor-Critic
9.6 Using Entropy to Encourage Exploration
9.7 Summary
References
10 Problems with Continuous Action Space
10.1 The Challenges of Problems with Continuous Action Space
10.2 MuJoCo Environments
10.3 Policy Gradient for Problems with Continuous Action
Space
10.4 Summary
References
11 Advanced Policy Gradient Methods
11.1 Problems with the Standard Policy Gradient Methods
11.2 Policy Performance Bounds
11.3 Proximal Policy Optimization
11.4 Summary
References
Part IV Advanced Topics
12 Distributed Reinforcement Learning
12.1 Why Use Distributed Reinforcement Learning
12.2 General Distributed Reinforcement Learning Architecture
12.3 Data Parallelism for Distributed Reinforcement Learning
12.4 Summary
References
13 Curiosity-Driven Exploration
13.1 Hard-to-Explore Problems vs. Sparse Reward Problems
13.2 Curiosity-Driven Exploration
13.3 Random Network Distillation
13.4 Summary
References
14 Planning with a Model: AlphaZero
14.1 Why We Need to Plan in Reinforcement Learning
14.2 Monte Carlo Tree Search
14.3 AlphaZero
14.4 Training AlphaZero on a 9 × 9 Go Board
14.5 Training AlphaZero on a 13 × 13 Gomoku Board
14.6 Summary
References
Index
About the Author
Michael Hu
is an exceptional software engineer with a wealth of
expertise spanning over a decade, specializing in the
design and implementation of enterprise-level
applications. His current focus revolves around leveraging
the power of machine learning (ML) and artificial
intelligence (AI) to revolutionize operational systems
within enterprises. A true coding enthusiast, Michael finds
solace in the realms of mathematics and continuously
explores cutting-edge technologies, particularly machine learning and
deep learning. His unwavering passion lies in the realm of deep
reinforcement learning, where he constantly seeks to push the
boundaries of knowledge. Demonstrating his commitment to the field,
he has built numerous open source projects on GitHub that
closely emulate state-of-the-art reinforcement learning algorithms
pioneered by DeepMind, including notable examples like AlphaZero,
MuZero, and Agent57. Through these projects, Michael demonstrates
his commitment to advancing the field and sharing his knowledge with
fellow enthusiasts. He currently resides in the city of Shanghai, China.
About the Technical Reviewer
Shovon Sengupta
has over 14 years of expertise and a deepened
understanding of advanced predictive analytics, machine
learning, deep learning, and reinforcement learning. He
has established a place for himself by creating innovative
financial solutions that have won numerous awards. He is
currently working for one of the leading multinational
financial services corporations in the United States as the
Principal Data Scientist at the AI Center of Excellence. His job entails
leading innovative initiatives that rely on artificial intelligence to
address challenging business problems. He has a US patent (United
States Patent: Sengupta et al.: Automated Predictive Call Routing Using
Reinforcement Learning [US 10,356,244 B1]) to his credit. He is also a
Ph.D. scholar at BITS Pilani. He has reviewed quite a few popular titles
from leading publishers like Packt and Apress and has also authored a
few courses for Packt and CodeRed (EC-Council) in the realm of
machine learning. Apart from that, he has presented at various
international conferences on machine learning, time series forecasting,
and building trustworthy AI. His primary research is concentrated on
deep reinforcement learning, deep learning, natural language
processing (NLP), knowledge graph, causality analysis, and time series
analysis. For more details about Shovon’s work, please check out his
LinkedIn page: www.linkedin.com/in/shovon-sengupta-272aa917.
Part I
Foundation
© The Author(s), under exclusive license to APress Media, LLC, part of Springer
Nature 2023
M. Hu, The Art of Reinforcement Learning
https://doi.org/10.1007/978-1-4842-9606-6_1
1. Introduction
Michael Hu1
(1) Shanghai, Shanghai, China
Fig. 1.1 A DQN agent learning to play Atari’s Breakout. The goal of the game is to
use a paddle to bounce a ball up and break through a wall of bricks. The agent only
takes in the raw pixels from the screen, and it has to figure out what’s the right action
to take in order to maximize the score. Idea adapted from Mnih et al. [1]. Game
owned by Atari Interactive, Inc.
Go
Go is an ancient Chinese strategy board game played by two players, who take turns placing stones on a 19 × 19 board with the goal of surrounding more territory than the opponent. Each player has a set
of black or white stones, and the game begins with an empty board.
Players alternate placing stones on the board, with the black player
going first.
Fig. 1.2 Yoda Norimoto (black) vs. Kiyonari Tetsuya (white), Go game from the 66th
NHK Cup, 2018. White won by 0.5 points. Game record from CWI [4]
The stones are placed on the intersections of the lines on the board,
rather than in the squares. Once a stone is placed on the board, it
cannot be moved, but it can be captured by the opponent if it is
completely surrounded by their stones. Stones that are surrounded and
captured are removed from the board.
The game continues until both players pass, at which point the
territory on the board is counted. A player’s territory is the set of empty
intersections that are completely surrounded by their stones, plus any
captured stones. The player with the larger territory wins the game. In
the case of the final board position shown in Fig. 1.2, White won by 0.5 points.
Although the rules of the game are relatively simple, the game is
extremely complex. For instance, the number of legal board positions in
Go is enormously large compared to Chess. According to research by Tromp and Farnebäck [3], the number of legal board positions in Go is approximately 2 × 10^170, which is vastly greater than the number of atoms in the universe.
This complexity presents a significant challenge for artificial
intelligence (AI) agents that attempt to play Go. In March 2016, an AI
agent called AlphaGo developed by Silver et al. [5] from DeepMind
made history by beating the legendary Korean player Lee Sedol with a
score of 4–1 in Go. Lee Sedol is a winner of 18 world titles and is considered one of the greatest Go players of the past decade. AlphaGo's
victory was remarkable because it used a combination of deep neural
networks and tree search algorithms, as well as the technique of
reinforcement learning.
AlphaGo was trained using a combination of supervised learning
from human expert games and reinforcement learning from games of
self-play. This training enabled the agent to develop creative and
innovative moves that surprised both Lee Sedol and the Go community.
The success of AlphaGo has sparked renewed interest in the field of
reinforcement learning and has demonstrated the potential for AI to
solve complex problems that were once thought to be the exclusive
domain of human intelligence. One year later, Silver et al. [6] from
DeepMind introduced a new and more powerful agent, AlphaGo Zero.
AlphaGo Zero was trained using pure self-play, without any human
expert moves in its training, achieving a higher level of play than the
previous AlphaGo agent. They also made other improvements, such as simplifying the training process.
To evaluate the performance of the new agent, they set it to play games against the exact same AlphaGo agent that had beaten the world champion Lee Sedol in 2016, and this time AlphaGo Zero beat AlphaGo by a score of 100–0.
In the following year, Schrittwieser et al. [7] from DeepMind
generalized the AlphaGo Zero agent to play not only Go but also other
board games like Chess and Shogi (Japanese chess), and they called this
generalized agent AlphaZero. AlphaZero is a more general
reinforcement learning algorithm that can be applied to a variety of
board games, not just Go, Chess, and Shogi.
Reinforcement learning is a type of machine learning in which an
agent learns to make decisions based on the feedback it receives from
its environment. Both DQN and AlphaGo (and its successor) agents use
this technique, and their achievements are very impressive. Although
these agents are designed to play games, this does not mean that
reinforcement learning is only capable of playing games. In fact, there
are many more challenging problems in the real world, such as
navigating a robot, driving an autonomous car, and automating web
advertising. Games are relatively easy to simulate and implement
compared to these other real-world problems, but reinforcement
learning has the potential to be applied to a wide range of complex
challenges beyond game playing.
Environment
The environment is the world in which the agent operates. It can be a
physical system, such as a robot navigating a maze, or a virtual
environment, such as a game or a simulation. The environment
provides the agent with two pieces of information: the state of the
environment and a reward signal. The state describes the relevant
information about the environment that the agent needs to make a
decision, such as the position of the robot or the cards in a poker game.
The reward signal is a scalar value that indicates how well the agent is
doing in its task. The agent’s objective is to maximize its cumulative
reward over time.
The environment has its own set of rules, which determine how the
state and reward signal change based on the agent’s actions. These
rules are often called the dynamics of the environment. In many cases,
the agent does not have access to the underlying dynamics of the
environment and must learn them through trial and error. This is similar to how we humans interact with the physical world every day: we normally have a pretty good sense of what's going on around us, but it's difficult to fully understand the dynamics of the universe.
Game environments are a popular choice for reinforcement learning
because they provide a clear objective and well-defined rules. For
example, a reinforcement learning agent could learn to play the game of
Pong by observing the screen and receiving a reward signal based on
whether it wins or loses the game.
In a robotic environment, the agent is a robot that must learn to
navigate a physical space or perform a task. For example, a
reinforcement learning agent could learn to navigate a maze by using
sensors to detect its surroundings and receiving a reward signal based
on how quickly it reaches the end of the maze.
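This interaction can be summarized as a simple loop: the environment emits a state, the agent chooses an action, and the environment responds with the next state and a reward. The snippet below is a minimal, self-contained sketch of that loop using a toy corridor environment and a purely random agent; the class and method names are illustrative assumptions, not code from this book or any particular library.

```python
import random

class CorridorEnv:
    """Toy environment: the agent starts at cell 0 and must reach the goal cell."""

    def __init__(self, goal=10):
        self.goal = goal
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: -1 moves left, +1 moves right; the agent cannot go below cell 0.
        self.state = max(0, self.state + action)
        done = self.state >= self.goal
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.state, reward, done

# A random "agent": it ignores the state and picks an action uniformly.
env = CorridorEnv()
state = env.reset()
total_reward, done = 0.0, False
while not done:
    action = random.choice([-1, +1])
    state, reward, done = env.step(action)
    total_reward += reward
print("episode finished, cumulative reward =", total_reward)
```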
State
In reinforcement learning, an environment state or simply state is the
statistical data provided by the environment to represent the current
state of the environment. The state can be discrete or continuous. For
instance, when driving a stick shift car, the speed of the car is a
continuous variable, while the current gear is a discrete variable.
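As a small illustrative sketch of this stick-shift example (the field names are made up for illustration), a state mixing continuous and discrete variables might look like this:

```python
from dataclasses import dataclass

@dataclass
class CarState:
    speed_kmh: float  # continuous variable
    gear: int         # discrete variable (e.g., 1-6)

state = CarState(speed_kmh=42.5, gear=3)
```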
Ideally, the environment state should contain all relevant
information that’s necessary for the agent to make decisions. For
example, in a single-player video game like Breakout, the pixels of
frames of the game contain all the information necessary for the agent
to make a decision. Similarly, in an autonomous driving scenario, the
sensor data from the car’s cameras, lidar, and other sensors provide
relevant information about the surrounding environment.
However, in practice, the available information may depend on the
task and domain. In a two-player board game like Go, for instance,
although we have perfect information about the board position, we
don’t have perfect knowledge about the opponent player, such as what
they are thinking in their head or what their next move will be. This
makes the state representation more challenging in such scenarios.
Furthermore, the environment state might also include noisy data.
For example, a reinforcement learning agent driving an autonomous car
might use multiple cameras at different angles to capture images of the
surrounding area. Suppose the car is driving near a park on a windy
day. In that case, the onboard cameras could also capture images of
some trees in the park that are swaying in the wind. Since the
movement of these trees should not affect the agent’s ability to drive,
because the trees are inside the park and not on the road or near the
road, we can consider these movements of the trees as noise to the self-
driving agent. However, it can be challenging to ignore them from the
captured images. To tackle this problem, researchers might use various
techniques such as filtering and smoothing to eliminate the noisy data
and obtain a cleaner representation of the environment state.
Reward
In reinforcement learning, the reward signal is a numerical value that
the environment provides to the agent after the agent takes some
action. The reward can be any numerical value, positive, negative, or
zero. However, in practice, the reward function often varies from task to
task, and we need to carefully design a reward function that is specific
to our reinforcement learning problem.
Designing an appropriate reward function is crucial for the success
of the agent. The reward function should be designed to encourage the
agent to take actions that will ultimately lead to achieving our desired
goal. For example, in the game of Go, the reward is 0 at every step before the game is over, and +1 or −1 if the agent wins or loses the game, respectively. This design incentivizes the agent to win the game,
without explicitly telling it how to win.
Similarly, in the game of Breakout, the reward can be a positive number if the agent destroys some bricks, a negative number if the agent fails to catch the ball, and zero otherwise. This design
incentivizes the agent to destroy as many bricks as possible while
avoiding losing the ball, without explicitly telling it how to achieve a
high score.
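The two reward designs just described can be written down directly as functions. The sketch below is illustrative only; the argument names and the per-brick and ball-loss magnitudes for Breakout are assumptions, while the ±1 terminal reward for Go follows the description above.

```python
def go_reward(game_over: bool, agent_won: bool) -> float:
    """Go: zero reward on every step; +1 / -1 only when the game ends."""
    if not game_over:
        return 0.0
    return 1.0 if agent_won else -1.0

def breakout_reward(bricks_destroyed: int, ball_lost: bool) -> float:
    """Breakout: positive for destroying bricks, negative for losing the ball."""
    reward = 1.0 * bricks_destroyed  # assumed per-brick bonus
    if ball_lost:
        reward -= 1.0                # assumed penalty for missing the ball
    return reward
```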
The reward function plays a crucial role in the reinforcement
learning process. The goal of the agent is to maximize the accumulated
rewards over time. By optimizing the reward function, we can guide the
agent to learn a policy that will achieve our desired goal. Without the
reward signal, the agent would not know what the goal is and would
not be able to learn effectively.
In summary, the reward signal is a key component of reinforcement
learning that incentivizes the agent to take actions that ultimately lead
to achieving the desired goal. By carefully designing the reward
function, we can guide the agent to learn an optimal policy.
Agent
In reinforcement learning, an agent is an entity that interacts with an
environment by making decisions based on the received state and
reward signal from the environment. The agent’s goal is to maximize its
cumulative reward in the long run. The agent must learn to make the
best decisions by trial and error, which involves exploring different
actions and observing the resulting rewards.
In addition to its external interactions with the environment, the agent may also have an internal state that represents its knowledge about the world. This internal state can include things like memory of past
experiences and learned strategies.
It’s important to distinguish the agent’s internal state from the
environment state. The environment state represents the current state
of the world that the agent is trying to influence through its actions. The
agent, however, has no direct control over the environment state. It can
only affect the environment state by taking actions and observing the
resulting changes in the environment. For example, if the agent is
playing a game, the environment state might include the current
positions of game pieces, while the agent’s internal state might include
the memory of past moves and the strategies it has learned.
In this book, we will typically use the term “state” to refer to the
environment state. However, it’s important to keep in mind the
distinction between the agent’s internal state and the environment
state. By understanding the role of the agent and its interactions with
the environment, we can better understand the principles behind
reinforcement learning algorithms. It is worth noting that the terms
“agent” and “algorithm” are frequently used interchangeably in this
book, particularly in later chapters.
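To make the distinction between the environment state and the agent's internal state concrete, here is a minimal sketch of an agent that keeps a memory of past experiences. The class layout is our own illustration under these assumptions, not an implementation from the book.

```python
import random

class SimpleAgent:
    def __init__(self, actions):
        self.actions = actions  # the set of actions the agent may choose from
        self.memory = []        # internal state: remembered (state, action, reward) tuples

    def act(self, state):
        # Choose uniformly at random for now; later chapters replace this
        # with a learned policy.
        return random.choice(self.actions)

    def observe(self, state, action, reward):
        # Update the agent's internal state with the latest experience.
        self.memory.append((state, action, reward))
```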
Action
In reinforcement learning, the agent interacts with an environment by
selecting actions that affect the state of the environment. Actions are
chosen from a predefined set of possibilities, which are specific to each
problem. For example, in the game of Breakout, the agent can choose to
move the paddle to the left or right or take no action. It cannot perform
actions like jumping or rolling over. In contrast, in the game of Pong, the
agent can choose to move the paddle up or down but not left or right.
The chosen action affects the future state of the environment. The
agent’s current action may have long-term consequences, meaning that
it will affect the environment’s states and rewards for many future time
steps, not just the next immediate stage of the process.
Actions can be either discrete or continuous. In problems with
discrete actions, the set of possible actions is finite and well defined.
Examples of such problems include Atari and Go board games. In
contrast, problems with continuous actions have an infinite set of
possible actions, often within a continuous range of values. An example
of a problem with continuous actions is robotic control, where the
degree of angle movement of a robot arm is often a continuous action.
Reinforcement learning problems with discrete actions are
generally easier to solve than those with continuous actions. Therefore,
this book will focus on solving reinforcement learning problems with
discrete actions. However, many of the concepts and techniques
discussed in this book can be applied to problems with continuous
actions as well.
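The difference between discrete and continuous action spaces can be shown in a few lines. This is a generic sketch and does not rely on any specific environment library; the action names and ranges are assumptions for illustration.

```python
import random

# Discrete action space: a finite, well-defined set of choices (e.g., Breakout).
DISCRETE_ACTIONS = ["NOOP", "LEFT", "RIGHT"]
discrete_action = random.choice(DISCRETE_ACTIONS)

# Continuous action space: infinitely many values within a range
# (e.g., a robot arm joint angle in radians).
LOW, HIGH = -3.14, 3.14
continuous_action = random.uniform(LOW, HIGH)
```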
Policy
A policy is a key concept in reinforcement learning that defines the
behavior of an agent. In particular, it maps each possible state in the
environment to the probabilities of choosing different actions. By
specifying how the agent should behave, a policy guides the agent to
interact with its environment and maximize its cumulative reward. We
will delve into the details of policies and how they interact with the
MDP framework in Chap. 2.
For example, suppose an agent is navigating a grid-world
environment. A simple policy might dictate that the agent should
always move to the right until it reaches the goal location. Alternatively,
a more sophisticated policy could specify that the agent should choose
its actions based on its current position and the probabilities of moving
to different neighboring states.
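In the simplest tabular case, such a policy can be represented as a mapping from each state to a probability distribution over actions. The sketch below illustrates this for a tiny grid world; the states, actions, and probabilities are made up purely for illustration.

```python
import random

# A tabular stochastic policy: state -> {action: probability}.
policy = {
    "start":     {"left": 0.1, "right": 0.9},
    "mid":       {"left": 0.2, "right": 0.8},
    "near_goal": {"left": 0.0, "right": 1.0},
}

def sample_action(state):
    """Draw an action according to the policy's probabilities for this state."""
    actions, probs = zip(*policy[state].items())
    return random.choices(actions, weights=probs, k=1)[0]

print(sample_action("start"))  # usually "right"
```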
Model
In reinforcement learning, a model refers to a mathematical description
of the dynamics function and reward function of the environment. The
dynamics function describes how the environment evolves from one
state to another, while the reward function specifies the reward that the
agent receives for taking certain actions in certain states.
In many cases, the agent does not have access to a perfect model of
the environment. This makes learning a good policy challenging, since
the agent must learn from experience how to interact with the
environment to maximize its reward. However, there are some cases
where a perfect model is available. For example, if the agent is playing a
game with fixed rules and known outcomes, the agent can use this
knowledge to select its actions strategically. We will explore this
scenario in detail in Chap. 2.
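When a model is available, the dynamics and reward functions can, in the simplest case, be written down as a lookup table mapping a (state, action) pair to the next state and reward. The dictionary below is a hand-made toy example under that assumption, not a model of any environment discussed in the book.

```python
# A tabular model: (state, action) -> (next_state, reward).
model = {
    ("s0", "right"): ("s1", 0.0),
    ("s1", "right"): ("goal", 1.0),
    ("s1", "left"):  ("s0", 0.0),
}

def predict(state, action):
    """Use the model to predict the next state and reward without acting in the world."""
    return model[(state, action)]

next_state, reward = predict("s0", "right")
```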
In reinforcement learning, the agent-environment boundary can be
ambiguous. Although a house-cleaning robot may appear to be a single agent, the boundary of the agent is typically defined by what it directly controls, while the remaining components comprise the environment. In this case, the robot's wheels and other hardware are considered to be part of the environment since they aren't directly controlled by the agent. We can
think of the robot as a complex system composed of several parts, such
as hardware, software, and the reinforcement learning agent, which can
control the robot’s movement by signaling the software interface, which
then communicates with microchips to manage the wheel movement.
Autonomous Driving
Reinforcement learning can be used to train autonomous vehicles to
navigate complex and unpredictable environments. The goal for the
agent is to safely and efficiently drive the vehicle to a desired location
while adhering to traffic rules and regulations. The reward signal could
be a positive number for successful arrival at the destination within a
specified time frame and a negative number for any accidents or
violations of traffic rules. The environment state could contain
information about the vehicle’s location, velocity, and orientation, as
well as sensory data such as camera feeds and radar readings.
Additionally, the state could include the current traffic conditions and
weather, which would help the agent to make better decisions while
driving.
Generalization Problem
In reinforcement learning (RL), the generalization problem refers to the
ability of an agent to apply what it has learned to new and previously
unseen situations. To understand this concept, consider the example of
a self-driving car. Suppose the agent is trying to learn to navigate a
particular intersection, with a traffic light and crosswalk. The agent
receives rewards for reaching its destination quickly and safely, but it
must also follow traffic laws and avoid collisions with other vehicles
and pedestrians.
During training, the agent is exposed to a variety of situations at the
intersection, such as different traffic patterns and weather conditions.
It learns to associate certain actions with higher rewards, such as
slowing down at the yellow light and stopping at the red light. Over
time, the agent becomes more adept at navigating the intersection and
earns higher cumulative rewards.
However, when the agent is faced with a new intersection, with
different traffic patterns and weather conditions, it may struggle to
apply what it has learned. This is where generalization comes in. If the
agent has successfully generalized its knowledge, it will be able to
navigate the new intersection based on its past experiences, even
though it has not seen this exact intersection before. For example, it
may slow down at a yellow light, even if the timing is slightly different
than what it has seen before, or it may recognize a pedestrian crossing
and come to a stop, even if the appearance of the crosswalk is slightly
different.
If the agent has not successfully generalized its knowledge, it may
struggle to navigate the new intersection and may make mistakes that
lead to lower cumulative rewards. For example, it may miss a red light
or fail to recognize a pedestrian crossing, because it has only learned to
recognize these situations in a particular context.
Therefore, generalization is a crucial aspect of RL, as it allows the
agent to apply its past experiences to new and previously unseen
situations, which can improve its overall performance and make it more
robust to changes in the environment.
1.8 Summary
In the first chapter of the book, readers were introduced to the concept
of reinforcement learning (RL) and its applications. The chapter began
by discussing the breakthroughs in AI in games, showcasing the success
of RL in complex games such as Go. The chapter then provided an
overview of the agent-environment interaction that forms the basis of
RL, including key concepts such as environment, agent, reward, state,
action, and policy. Several examples of RL were presented, including
Atari video game playing, board game Go, and robot control tasks.
Additionally, the chapter introduced common terms used in RL,
including episodic vs. continuing tasks, deterministic vs. stochastic
tasks, and model-free vs. model-based reinforcement learning. The
importance of studying RL was then discussed, including its potential to
solve complex problems and its relevance to real-world applications.
The challenges faced in RL, such as the exploration-exploitation
dilemma, the credit assignment problem, and the generalization
problem, were also explored.
The next chapter of the book will focus on Markov decision
processes (MDPs), which provide a formal framework for modeling RL problems.
References
[1] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, Feb 2015.
[2] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, Jun 2013.
[3] John Tromp and Gunnar Farnebäck. Combinatorics of Go. In H. Jaap van den Herik, Paolo Ciancarini, and H. H. L. M. (Jeroen) Donkers, editors, Computers and Games, pages 84–99, Berlin, Heidelberg, 2007. Springer Berlin Heidelberg.
[4] CWI. 66th NHK Cup. https://homepages.cwi.nl/~aeb/go/games/games/NHK/66/index.html, 2018.
[5] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, Jan 2016.
[6] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, Oct 2017.
[7] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, and Demis Hassabis. Mastering chess and shogi by self-play with a general reinforcement learning algorithm, 2017.
[8] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.
[9] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition, 2015.
Footnotes
1 Dog Thinks Through A Problem: www.youtube.com/watch?v=m_CrIu01SnM.