Data Parallel C++
Programming Accelerated Systems Using C++ and SYCL
Second Edition

James Reinders
Ben Ashbaugh
James Brodman
Michael Kinsner
John Pennycook
Xinmin Tian

Foreword by Erik Lindahl, GROMACS and Stockholm University
Data Parallel C++: Programming Accelerated Systems Using C++ and SYCL, Second Edition

James Reinders, Beaverton, OR, USA
Ben Ashbaugh, Folsom, CA, USA
James Brodman, Marlborough, MA, USA
Michael Kinsner, Halifax, NS, Canada
John Pennycook, San Jose, CA, USA
Xinmin Tian, Fremont, CA, USA
Table of Contents

Preface ..... xxi
Foreword ..... xxv
Acknowledgments ..... xxix

Chapter 1: Introduction ..... 1
  Read the Book, Not the Spec ..... 2
  SYCL 2020 and DPC++ ..... 3
  Why Not CUDA? ..... 4
  Why Standard C++ with SYCL? ..... 5
  Getting a C++ Compiler with SYCL Support ..... 5
  Hello, World! and a SYCL Program Dissection ..... 6
  Queues and Actions ..... 7
  It Is All About Parallelism ..... 8
    Throughput ..... 8
    Latency ..... 9
    Think Parallel ..... 9
    Amdahl and Gustafson ..... 10
    Scaling ..... 11
    Heterogeneous Systems ..... 11
    Data-Parallel Programming ..... 13
  Data Management ..... 165
    Initialization ..... 165
    Data Movement ..... 166
  Queries ..... 174
  One More Thing ..... 177
  Summary ..... 178

Chapter 7: Buffers ..... 179
  Buffers ..... 180
    Buffer Creation ..... 181
    What Can We Do with a Buffer? ..... 188
  Accessors ..... 189
    Accessor Creation ..... 192
    What Can We Do with an Accessor? ..... 198
  Summary ..... 199
  GPU Hardware ..... 402
  Beware the Cost of Offloading! ..... 403
  GPU Kernel Best Practices ..... 405
    Accessing Global Memory ..... 405
    Accessing Work-Group Local Memory ..... 409
    Avoiding Local Memory Entirely with Sub-Groups ..... 412
    Optimizing Computation Using Small Data Types ..... 412
    Optimizing Math Functions ..... 413
    Specialized Functions and Extensions ..... 414
  Summary ..... 414
  For More Information ..... 415
  Summary ..... 556
  For More Information ..... 557
Index ..... 615
About the Authors
James Reinders is an Engineer at Intel Corporation with more than four
decades of experience in parallel computing and is an author/coauthor/
editor of more than ten technical books related to parallel programming.
James has a passion for system optimization and teaching. He has had the
great fortune to help make contributions to several of the world’s fastest
computers (#1 on the TOP500 list) as well as many other supercomputers
and software developer tools.
Preface
If you are new to parallel programming, that is okay. If you have never
heard of SYCL or the DPC++ compiler, that is also okay.
Compared with programming in CUDA, C++ with SYCL offers
portability beyond NVIDIA hardware and beyond GPUs, plus a tight
alignment with modern C++ as it continues to evolve. C++ with SYCL
offers these advantages without sacrificing performance.
C++ with SYCL allows us to accelerate our applications by harnessing
the combined capabilities of CPUs, GPUs, FPGAs, and processing devices
of the future without being tied to any one vendor.
SYCL is an industry-driven Khronos Group standard adding
advanced support for data parallelism with C++ to exploit accelerated
(heterogeneous) systems. SYCL provides mechanisms for C++ compilers
that are highly synergistic with C++ and C++ build systems. DPC++ is an
open source compiler project based on LLVM that adds SYCL support.
All examples in this book should work with any C++ compiler supporting
SYCL 2020 including the DPC++ compiler.
If you are a C programmer who is not well versed in C++, you are in
good company. Several of the authors of this book happily share that
they picked up much of C++ by reading books that utilized C++ like this
one. With a little patience, this book should also be approachable by C
programmers with a desire to write modern C++ programs.
Second Edition
With the benefit of feedback from a growing community of SYCL users, we
have been able to add content that helps readers learn SYCL better than ever.
This edition teaches C++ with SYCL 2020. The first edition preceded
the SYCL 2020 specification, which differed only slightly from what the
first edition taught (the most obvious changes for SYCL 2020 in this edition
are the header file location, the device selector syntax, and dropping an
explicit host device).
Foreword
SYCL 2020 is a milestone in parallel computing. For the first time we have
a modern, stable, feature-complete, and portable open standard that can
target all types of hardware, and the book you hold in your hand is the
premier resource to learn SYCL 2020.
Computer hardware development is driven by our needs to solve
larger and more complex problems, but those hardware advances are
largely useless unless programmers like you and me have languages that
allow us to implement our ideas and exploit the power available with
reasonable effort. There are numerous examples of amazing hardware,
and the first solutions to use them have often been proprietary since it
saves time not having to bother with committees agreeing on standards.
However, in the history of computing, they have eventually always ended
up as vendor lock-in—unable to compete with open standards that allow
developers to target any hardware and share code—because ultimately the
resources of the worldwide community and ecosystem are far greater than
any individual vendor, not to mention how open software standards drive
hardware competition.
Over the last few years, my team has had the tremendous privilege
of contributing to shaping the emerging SYCL ecosystem through our
development of GROMACS, one of the world’s most widely used scientific
HPC codes. We need our code to run on every supercomputer in the
world as well as our laptops. While we cannot afford to lose performance,
we also depend on being part of a larger community where other teams
invest effort in libraries we depend on, where there are open compilers
available, and where we can recruit talent. Since the first edition of this
book, SYCL has matured into such a community; in addition to several
1. Community-driven implementation from Heidelberg University: tinyurl.com/HeidelbergSYCL
2. DPC++ compiler project: github.com/intel/llvm
3. GROMACS: gitlab.com/gromacs/gromacs/
Erik Lindahl
Professor of Biophysics
Dept. Biophysics & Biochemistry
Science for Life Laboratory
Stockholm University
Acknowledgments
We have been blessed with an outpouring of community input for this
second edition of our book. Much inspiration came from interactions with
developers as they use SYCL in production, classes, tutorials, workshops,
conferences, and hackathons. SYCL deployments that include NVIDIA
hardware, in particular, have helped us enhance the inclusiveness and
practical tips in our teaching of SYCL in this second edition.
The SYCL community has grown a great deal—and consists of
engineers implementing compilers and tools, and a much larger group of
users that adopt SYCL to target hardware of many types and vendors. We
are grateful for their hard work and shared insights.
We thank the Khronos SYCL Working Group that has worked diligently
to produce a highly functional specification. In particular, Ronan Keryell
has been the SYCL specification editor and a longtime vocal advocate
for SYCL.
We are in debt to the numerous people who gave us feedback from
the SYCL community in all these ways. We are also deeply grateful for
those who helped with the first edition a few years ago, many of whom we
named in the acknowledgement of that edition.
The first edition received feedback via GitHub,1 which we did review
but we were not always prompt in acknowledging (imagine six coauthors
all thinking “you did that, right?”). We did benefit a great deal from that
feedback, and we believe we have addressed all the feedback in the
samples and text for this edition. Jay Norwood was the most prolific at
commenting and helping us—a big thank you to Jay from all the authors!
1. github.com/apress/data-parallel-CPP
CHAPTER 1
Introduction
We have undeniably entered the age of accelerated computing. In order to
satisfy the world’s insatiable appetite for more computation, accelerated
computing drives complex simulations, AI, and much more by providing
greater performance and improved power efficiency when compared with
earlier solutions.
Heralded as a “New Golden Age for Computer Architecture,”1 we are
faced with enormous opportunity through a rich diversity in compute
devices. We need portable software development capabilities that are
not tied to any single vendor or architecture in order to realize the full
potential for accelerated computing.
SYCL (pronounced sickle) is an industry-driven Khronos Group
standard adding advanced support for data parallelism with C++ to
support accelerated (heterogeneous) systems. SYCL provides mechanisms
for C++ compilers to exploit accelerated (heterogeneous) systems in a way
that is highly synergistic with modern C++ and C++ build systems. SYCL is
not an acronym; SYCL is simply a name.
1. A New Golden Age for Computer Architecture, by John L. Hennessy and
David A. Patterson; Communications of the ACM, February 2019, Vol. 62,
No. 2, pages 48-60.
Data parallelism in C++ with SYCL provides access to all the compute
devices in a modern accelerated (heterogeneous) system. A single C++
application can use any combination of devices—including GPUs, CPUs,
FPGAs, and application-specific integrated circuits (ASICs)—that are
suitable to the problems at hand. No proprietary, single-vendor, solution
can offer us the same level of flexibility.
This book teaches us how to harness accelerated computing using
data-parallel programming using C++ with SYCL and provides practical
advice for balancing application performance, portability across compute
devices, and our own productivity as programmers. This chapter lays
the foundation by covering core concepts, including terminology, which
are critical to have fresh in our minds as we learn how to accelerate C++
programs using data parallelism.
debuggers, and other tools, known as the oneAPI project. The oneAPI
tools, including the DPC++ compiler, are freely available (www.oneapi.io/
implementations).
 1. #include <iostream>
 2. #include <sycl/sycl.hpp>
 3. using namespace sycl;
 4.
 5. const std::string secret{
 6.     "Ifmmp-!xpsme\"\012J(n!tpssz-!Ebwf/!"
 7.     "J(n!bgsbje!J!dbo(u!ep!uibu/!.!IBM\01"};
 8.
 9. const auto sz = secret.size();
10.
11. int main() {
12.   queue q;
13.
14.   char* result = malloc_shared<char>(sz, q);
15.   std::memcpy(result, secret.data(), sz);
16.
17.   q.parallel_for(sz, [=](auto& i) {
18.     result[i] -= 1;
19.   }).wait();
20.
21.   std::cout << result << "\n";
22.   free(result, q);
23.   return 0;
24. }
Line 18 is the kernel code that we want to run on devices. That kernel
code decrements a single character. With the power of parallel_for(),
that kernel is run on each character in our secret string in order to decode
it into the result string. There is no ordering of the work required, and it is
run asynchronously relative to the main program once the parallel_for
queues the work. It is critical that there is a wait (line 19) before looking at
the result to be sure that the kernel has completed, since in this example
we are using a convenient feature (Unified Shared Memory, Chapter 6).
Without the wait, the output may occur before all the characters have been
decrypted. There is more to discuss, but that is the job of later chapters.
Throughput
Increasing throughput of a program comes when we get more work done
in a set amount of time. Techniques like pipelining may stretch out the
time necessary to get a single work-item done, to allow overlapping of
work on many items at once and thereby increase the number of items
completed over time.
Latency
What if we want to get one thing done faster—for instance, analyzing
a voice command and formulating a response? If we only cared about
throughput, the response time might grow to be unbearable. The concept
of latency reduction requires that we break up an item of work into
pieces that can be tackled in parallel. For throughput, image processing
might assign whole images to different processing units—in this case,
our goal may be optimizing for images per second. For latency, image
processing might assign each pixel within an image to different processing
cores—in this case, our goal may be maximizing pixels per second from a
single image.
Think Parallel
Successful parallel programmers use both techniques in their
programming. This is the beginning of our quest to Think Parallel.
We want to adjust our minds to think first about where parallelism
can be found in our algorithms and applications. We also think about how
different ways of expressing the parallelism affect the performance we
ultimately achieve. That is a lot to take in all at once. The quest to Think
Parallel becomes a lifelong journey for parallel programmers. We can learn
a few tips here.
Scaling
The word “scaling” appeared in our prior discussion. Scaling is a measure
of how much a program speeds up (simply referred to as “speed-up”)
when additional computing is available. Perfect speed-up happens if
one hundred packages are delivered in the same time as one package,
by simply having one hundred trucks with drivers instead of a single
truck and driver. Of course, it does not reliably work that way. At some
point, there is a bottleneck that limits speed-up. There may not be one
hundred places for trucks to dock at the distribution center. In a computer
program, bottlenecks often involve moving data around to where it will
be processed. Distributing to one hundred trucks is similar to having to
distribute data to one hundred processing cores. The act of distributing
is not instantaneous. Chapter 3 starts our journey of exploring how to
distribute data to where it is needed in a heterogeneous system. It is critical
that we know that data distribution has a cost, and that cost affects how
much scaling we can expect from our applications.
Heterogeneous Systems
For our purposes, a heterogeneous system is any system which contains
multiple types of computational devices. For instance, a system with both
a central processing unit (CPU) and a graphics processing unit (GPU) is a
heterogeneous system. The CPU is often just called a processor, although
that can be confusing when we speak of all the processing units in a
heterogeneous system as compute processors. To avoid this confusion,
SYCL refers to processing units as devices. An application always runs on
a host that in turn sends work to devices. Chapter 2 begins the discussion
of how our main application (host code) will steer work (computations) to
particular devices in a heterogeneous system.
A program using C++ with SYCL runs on a host and issues kernels of
work to devices. Although it might seem confusing, it is important to know
that the host will often be able to serve as a device. This is valuable for two
key reasons: (1) the host is most often a CPU that will run a kernel if no
accelerator is present—a key promise of SYCL for application portability
is that a kernel can always be run on any system even those without
accelerators—and (2) CPUs often have vector, matrix, tensor, and/or
AI processing capabilities that are accelerators that kernels map well to
run upon.
Host code invokes code on devices. The capabilities of the host are
very often available as a device also, to provide both a back-up
device and to offer any acceleration capabilities the host has for
processing kernels as well. Our host is most often a CPU, and as such
it may be available as a CPU device. There is no guarantee by SYCL of
a CPU device, only that there is at least one device available to be the
default device for our application.
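To make the host/device relationship concrete, here is a minimal sketch
(our own, not one of the book's figures) using standard SYCL 2020 calls to
list every device the runtime can see and to show which device a
default-constructed queue selects:

#include <iostream>
#include <sycl/sycl.hpp>

int main() {
  // Enumerate every device the SYCL runtime can see; on many
  // systems this includes a CPU device backed by the host processor.
  for (auto& d : sycl::device::get_devices()) {
    std::cout << "Found device: "
              << d.get_info<sycl::info::device::name>() << "\n";
  }

  // A default-constructed queue binds to the default device; SYCL
  // guarantees at least one device is available to fill this role.
  sycl::queue q;
  std::cout << "Default device: "
            << q.get_device().get_info<sycl::info::device::name>()
            << "\n";
  return 0;
}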
Data-Parallel Programming
The phrase “data-parallel programming” has been lingering unexplained
ever since the title of this book. Data-parallel programming focuses on
parallelism that can be envisioned as a bunch of data to operate on in
parallel. This shift in focus is like Gustafson vs. Amdahl. We need one
hundred packages to deliver (effectively lots of data) in order to divide
up the work among one hundred trucks with drivers. The key concept
comes down to what we should divide. Should we process whole images
in parallel, or should we divide each image and process its pixels in parallel?
Single-Source
Programs are single-source, meaning that the same translation unit2
contains both the code that defines the compute kernels to be executed
on devices and also the host code that orchestrates execution of those
compute kernels. Chapter 2 begins with a more detailed look at this
capability. We can still divide our program source into different files and
translation units for host and device code if we want to, but the key is that
we don’t have to!
2. We could just say "file," but that is not entirely correct here. A translation unit
is the actual input to the compiler, made from the source file after it has been
processed by the C preprocessor to inline header files and expand macros.
Host
Every program starts by running on a host, and most of the lines of code
in a program are usually for the host. Thus far, hosts have always been
CPUs. The standard does not require this, so we carefully describe it as
a host. This seems unlikely to be anything other than a CPU because the
host needs to fully support C++17 in order to support all C++ with SYCL
programs. As we will see shortly, devices (accelerators) do not need to
support all of C++17.
Devices
Using multiple devices in a program is what makes it heterogeneous
programming. That is why the word device has been recurring in this
chapter since the explanation of heterogeneous systems a few pages ago.
We already learned that the collection of devices in a heterogeneous
system can include GPUs, FPGAs, DSPs, ASICs, CPUs, and AI chips, but is
not limited to any fixed list.
Devices are the targets to gain acceleration. The idea of offloading
computations is to transfer work to a device that can accelerate completion
of the work. We have to worry about making up for time lost moving
data—a topic that needs to constantly be on our minds.
Sharing Devices
On a system with a device, such as a GPU, we can envision two or more
programs running and wanting to use a single device. They do not need to
be programs using SYCL. Programs can experience delays in processing by
the device if another program is currently using it. This is really the same
philosophy used in C++ programs in general for CPUs. Any system can be
overloaded if we run too many active programs on our CPU (mail, browser,
virus scanning, video editing, photo editing, etc.) all at once.
Kernel Code
Code for a device is specified as kernels. This is a concept that is not
unique to C++ with SYCL: it is a core concept in other offload acceleration
languages including OpenCL and CUDA. While it is distinct from loop-
oriented approaches (such as commonly used with OpenMP target
offloads), it may resemble the body of code within the innermost loop
without requiring the programmer to write the loop nest explicitly.
Kernel code has certain restrictions to allow broader device support
and massive parallelism. The list of features not supported in kernel code
includes dynamic polymorphism, dynamic memory allocations (therefore
no object management using new or delete operators), static variables,
function pointers, runtime type information (RTTI), and exception
handling. No virtual member functions, and no variadic functions, are
allowed to be called from kernel code. Recursion is not allowed within
kernel code.
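To illustrate, here is a sketch of our own (not a figure from the book; it
assumes data points to unified shared memory, such as from malloc_shared),
showing that a kernel respecting these restrictions boils down to plain
arithmetic on plain data:

#include <sycl/sycl.hpp>

// Scales n floats in place. The kernel body obeys the restrictions
// above: no dynamic allocation, no RTTI, no exceptions, no recursion,
// and no virtual or variadic calls -- only data-parallel arithmetic.
void scale(sycl::queue& q, float* data, size_t n, float factor) {
  q.parallel_for(sycl::range{n}, [=](sycl::id<1> i) {
    data[i] *= factor;
  }).wait();
}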
VIRTUAL FUNCTIONS?
While we will not discuss it further in this book, the DPC++ compiler project does
have an experimental extension (visible in the open source project, of course) to
implement some support for virtual functions within kernels. Given the nature
of efficient offloading to accelerators, virtual functions cannot be supported well
without some restrictions, but many users have expressed interest in seeing
SYCL offer such support even with those restrictions. The beauty of open source,
and the open SYCL specification, is the opportunity to participate in experiments
that can inform the future of C++ and SYCL specifications. Visit the DPC++
project (github.com/intel/llvm) for more information.
! Fortran loop
do i = 1, n
  z(i) = alpha * x(i) + y(i)
end do

// C/C++ loop
for (int i = 0; i < n; i++) {
  z[i] = alpha * x[i] + y[i];
}

// SYCL kernel
q.parallel_for(range{n}, [=](id<1> i) {
  z[i] = alpha * x[i] + y[i];
}).wait();
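All three snippets express the same computation, z = alpha * x + y (the
classic SAXPY pattern); the SYCL version simply replaces the explicit loop
control with a parallel_for over the same index range, leaving the
per-element body nearly identical.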
Asynchronous Execution
The asynchronous nature of programming using C++ with SYCL must not
be missed. Asynchronous programming is critical to understand for two
reasons: (1) proper use gives us better performance (better scaling), and
(2) mistakes lead to parallel programming errors (usually race conditions)
that make our applications unreliable.
The asynchronous nature comes about because work is transferred to
devices via a “queue” of requested actions. The host program submits a
requested action into a queue, and the program continues without waiting
for any results. Not having to wait is important so that we can try to keep
computational resources (devices and the host) busy all the time. If we had
to wait, that would tie up the host instead of allowing the host to do useful
work. It would also create serial bottlenecks when the device finished, until
we queued up new work. Amdahl’s Law, as discussed earlier, penalizes us
for time spent not doing work in parallel. We need to construct our programs
to be moving data to and from devices while the devices are busy and keep
all the computational power of the devices and host busy any time work is
available. Failure to do so will bring the full curse of Amdahl’s Law upon us.
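As a minimal sketch of this pattern (our own, with a hypothetical
do_other_host_work stand-in; data is assumed to be unified shared memory):

#include <sycl/sycl.hpp>

// Hypothetical stand-in for useful work the host can do while the
// device is busy.
void do_other_host_work() { /* ... */ }

void process(sycl::queue& q, float* data, size_t n) {
  // Submission returns immediately; the kernel runs asynchronously
  // relative to the host program.
  sycl::event e = q.parallel_for(sycl::range{n}, [=](sycl::id<1> i) {
    data[i] = data[i] * 2.0f + 1.0f;
  });

  do_other_host_work();  // overlap host work with device execution

  // Synchronize only when the result is actually needed; waiting any
  // earlier creates exactly the serial bottlenecks Amdahl's Law punishes.
  e.wait();
}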
An astute reader noticed that the code in Figure 1-3 did not fail on every
system they tried. Using a GPU with partition_max_sub_devices==0 did
not fail because it was a small GPU not capable of running the parallel_for
until the memcpy had completed. Regardless, the code is flawed because the
race condition exists even if it does not universally cause a failure at runtime.
We call it a race—sometimes we win, and sometimes we lose. Such coding
flaws can lie dormant until the right combination of compile-time and runtime
environments leads to an observable failure.
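Figure 1-3 itself is not reproduced here, but the flawed pattern described
above can be sketched as follows (our own hedged reconstruction, not the
book's exact figure): the host reads shared memory without waiting for the
submitted kernel to complete, so host and device race:

#include <cstring>
#include <iostream>
#include <sycl/sycl.hpp>

int main() {
  sycl::queue q;
  const char msg[] = "bcd";  // decodes to "abc" after the kernel runs
  const size_t sz = sizeof(msg);

  char* result = sycl::malloc_shared<char>(sz, q);
  std::memcpy(result, msg, sz);

  // Kernel submitted, but note: no wait() before the host reads result.
  q.parallel_for(sz - 1, [=](sycl::id<1> i) { result[i] -= 1; });

  // BUG (race condition): this may print before the device finishes,
  // or even starts. On some devices it happens to "work"; on others
  // it fails -- exactly the dormant flaw described above.
  std::cout << result << "\n";

  sycl::free(result, q);
  return 0;
}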
20
Random documents with unrelated
content Scribd suggests to you:
The Project Gutenberg eBook of Garside's
Career: A Comedy in Four Acts
This ebook is for the use of anyone anywhere in the United States
and most other parts of the world at no cost and with almost no
restrictions whatsoever. You may copy it, give it away or re-use it
under the terms of the Project Gutenberg License included with this
ebook or online at www.gutenberg.org. If you are not located in the
United States, you will have to check the laws of the country where
you are located before using this eBook.
Language: English
1914
TO
A. N. MONKHOUSE
CONTENTS
GARSIDE'S CAREER
ACT I
ACT II
ACT III.
ACT IV
GARSIDE'S CAREER
ACT I
Interior of an artisan cottage. Door centre, leading direct to street,
door right to house. Fireplace with kitchen range left. Table centre,
with print cloth. Two plain chairs under it, one left, one centre,
facing audience. Rocking-chair by fireplace. Two chairs against wall
right, above door. Dresser right, below door. Small hanging bookcase
on wall, left centre. Window right centre. On walls plainly framed
photographs of Socialist leaders—Blatchford, Hyndman, Hardie. The
time is 7.0 p.m. on a June evening.
Mar. (gently). You don't mind my being here to hear it with you?
Mar. Peter said the results come out too late for the evening
papers.
Mar. Yes. He knows we're waiting here, we two who care for
Peter more than anything on earth.
Mrs. G. (giving her a jealous glance). I wish he'd come.
Mrs. G. Yes. I know I'm a fidget. I want to hear it from his own
lips. He's worked so hard he can't fail. (Accusingly.) You don't believe
me, Margaret. You're not sure of him.
Mar. (with elbows on table and head on hands). I'm fearful of the
odds against him—the chances that the others have and he hasn't.
Peter's to work for his living. They're free to study all day long.
(Rising, enthusiastically.) Oh, if he does it, what a triumph for our
class. Peter Garside, the Board School boy, the working engineer,
keeping himself and you, and studying at night for his degree.
Mar. No. I've seen him work. I've worked with him till he
distanced me and left me far behind. He knows enough to pass, to
pass above them all——
Mrs. G. Not to Peter. He's fighting for his class, he's showing them
he's the better man. He can work with his hands and they can't, and
he can work with his brain as well as the best of them.
Mar. He'll do it. It may not be this time, but he'll do it in the end.
Mrs. G. (going to him; he puts his arm round her and pats her
back, while she hides her face against his chest). My boy, my boy!
Peter. I've done it, mother. (Looking proudly at Margaret.) I'm an
honours man of Midlandton University.
Peter (hanging his cap behind the door right, then coming back
to centre. Margaret is standing on the hearthrug). Ah, little mother,
what a help that faith has been to me. I couldn't disappoint a faith
like yours. I had to win. Mother, Margaret, I've done it. Done it. Oh, I
think I'm not quite sane to-night. This room seems small all of a
sudden. I want to leap, to dance, and I know I'd break my neck
against the ceiling if I did. Peter Garside, b.a. (Approaching
Margaret.) Margaret, tell me I deserve it. You know what it means to
me. The height of my ambition. The crown, the goal, my target
reached at last. Margaret, isn't it a great thing that I've done?
Peter (putting his arm over her shoulder). Oh, mother, mother!
But Margaret was right, if I hadn't had such luck in the papers I——
Mrs. G. (slipping from him and going to where her cape and
bonnet hang on the door right). It wasn't luck. Even Margaret said
you deserved it all.
Peter. Even Margaret! (Seeing her putting cape on.) You're not
going out, mother?
Peter. Oh, yes. It'll spread fast enough. They may know already.
Mrs. G. (turning with her hand on the centre door latch). How
could they?
Mrs. G. But you haven't told anyone else. Have you, Peter?
(Reproachfully.) You said you'd let me be the first to know.
Mrs. G. (opening door). And I'll tell the women. They're going to
know the kind of son I've borne. I'm a proud woman this night, and
all Belinda Street is going to know I've cause to be. (Sniffing.)
O'Callagan indeed!
Mar. (looking up and holding out her hand across table; she takes
his, bending). Oh, my dear, my dear.
Mar. Pleased!
Peter. Yes. It's a reality to day. I've done the task you set me.
I've proved my class as good as theirs. That's what you wanted,
wasn't it?
Peter. I've won because you wanted it, because after I won I
knew that you—— (Rising.) Has it been wearisome to wait,
Margaret? I had the work, lectures, study. You had the tedious clays
of teaching idiotic middle-class facts to idiotic middle-class children,
and evenings when you ought to have had me and didn't because I
couldn't lose a single precious moment's chance of study.
Peter. And the labourer is worthy of his hire? I ask for my reward.
Mar. (shaking her head). I can give you no reward that's big
enough.
Peter. You can give the greatest prize on earth. We ought to have
been married long ago. I've kept you waiting.
Mar. That had to be. They won't have married women teachers at
the Midlandton High School. I couldn't burden you until this fight
was fought.
Peter. I did it for you. But I mean to enjoy the fruits of all this
work. Public speaking's always been a joy to me. You don't know the
glorious sensation of holding a crowd in the hollow of your hand,
mastering it, doing what you like with it.
Peter. And that was why you urged this study on me?
Mar. Yes.
Mar. I've seen men ruined by this itch to speak. You know them.
Men we had great hopes of in the movement. Men we thought
would be real leaders of the people. And they spoke, and spoke, and
soon said all they had to say, became mere windbags trading on a
reputation till people tired and turned to some new orator. Don't be
one of these, Peter. You've solider grit than they. The itch to speak is
like the itch to drink, except that it's cheaper to talk yourself tipsy.
Mar. (sitting right) What shall I see of you if you're out speaking
every night? You pitied me just now because you had to close your
door against me while you studied. I could bear that for the time.
But this other thing, married and widowed at once, with you out at
your work all day and away night after night——
Peter. So do they. They'll not sack me. I might sack them some
day.
Peter (impatiently). Oh, not yet. I'm speaking of the future. Don't
you see? I'm not content to be a workman all my life. I ought to
make a living easily by writing and—and speaking if you'll let me.
Then I could be with you all day long.
Mar. (looking straight in front of her). Have I set fire to this train?
Mar. Oh, dear! This wasn't my idea at all. I wanted you to win
your degree for the honour of the thing, to show them what a
working engineer could do. Cease to be a workman and you confess
another, worse motive. It's as though you only passed to make a
profit for yourself.
Peter. I can't help being ambitious. I wasn't till you set me on.
Peter (pushing his chair hack and rising). I might have a career.
(Crossing to fireplace.)
ebookbell.com