PARALLEL SCIENTIFIC COMPUTATION
A Structured Approach Using BSP

second edition

ROB H. BISSELING
Utrecht University
Great Clarendon Street, Oxford, OX2 6DP, United Kingdom

Oxford University Press is a department of the University of Oxford. It furthers the University’s objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and in certain other countries.
© Rob H. Bisseling 2020
The moral rights of the author have been asserted
First Edition published in 2004
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by licence or under terms agreed with the appropriate reprographics rights organization. Enquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2020932793
ISBN 978–0–19–878834–8 (hbk.)
ISBN 978–0–19–878835–5 (pbk.)
DOI: 10.1093/oso/9780198788348.001.0001
Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY

Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.
PREFACE
Why this book on parallel scientific computation? The first edition of this book appeared
in 2004, and the time was ripe for learning how to design and analyse portable parallel
algorithms and to write portable parallel programs. Parallel computing had become a lot
easier, and there was no longer any need to struggle with highly esoteric hardware or software systems.
The second edition is motivated by the multicore revolution that took place around 2007.
Suddenly, CPU clock speeds stopped increasing and, after a while, we found our laptop
and desktop computers equipped with several processor cores. Great news for parallelism:
parallel computers are everywhere. My yearly poll of the students in my parallel algorithms
class reveals that the best-equipped student has 12 cores in his personal computer (class of
2018). This is an opportunity for teaching, since students can now run parallel programs
on their own computer, and I exploit this in the second edition. At the other end of the
spectrum, massively parallel supercomputers now reach up to millions of cores.
The continuing growth in computing power due to increased parallelism and the avail-
ability of huge amounts of data has led to the emergence of Big Data as a new application
area. Often, the data is irregular and possesses a network structure, which is mathematically
modelled by a graph. This motivated the inclusion of a whole new chapter on graph
matching in the second edition.
Today’s state of affairs in parallel programming is that one can theoretically develop a
parallel algorithm, analyse its performance on various architectures, implement the algo-
rithm, and run the resulting program with high and predictable efficiency on a shared-
memory personal computer or departmental compute server. Unfortunately, to run the
same program efficiently on a massively parallel computer, with both shared and distributed
memory, requires a lot more work these days. Programmers often employ two systems such
as MPI+X, where X stands for either OpenMP or POSIX threads, or even three systems
such as MPI+X+Y, where Y could be CUDA or OpenCL, in case graphics processing units
(GPUs) are involved as well. Often, these systems themselves are large and unwieldy. This
complicates matters, and our aim in this book is to simplify parallel programming again, by
providing a single system that unifies the shared and distributed memory systems. With this
book, I hope to convince you that parallel programming is not much harder than sequential
programming and need not be left only to the experts. GPUs are outside the scope of this
book, as their architecture is very specific and hard to capture in a single system together
with the other two types of parallelism. Still, many lessons learned from this book directly
apply to GPUs and other manycore accelerators.
Improvements in parallel hardware and software, together with major advances in the
theory of parallel programming, led to the first edition of this book. An important theoretical
development was the advent of the Bulk Synchronous Parallel (BSP) programming model
proposed by Valiant in 1989 [289, 290], which provides a useful and elegant theoretical
framework for bridging the gap between parallel hardware and software. For this reason, I
have adopted the BSP model as the target model for my parallel algorithms. In my experi-
ence, the simplicity of the BSP model makes it suitable for teaching parallel algorithms: the
model itself is easy to explain and it provides a powerful tool for expressing and analysing
algorithms.
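For the reader who wants the model in a nutshell: in the BSP cost model, a superstep in which every processor performs at most w operations of local work, and in which no processor sends or receives more than h data words (an h-relation), is charged

    T(superstep) = w + hg + l,

where g is the communication cost per data word and l is the cost of a global synchronization, both measured in units of floating-point operations; the cost of a complete algorithm is simply the sum of these costs over all its supersteps.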
For massively parallel supercomputers with p = 10⁶ processors (i.e., cores), the total
number p² of interacting processor pairs becomes huge and as a consequence we may have
to rethink our algorithms to make them scalable up to such high values of p. To address this
scalability problem, we need to model the architecture as a hierarchy consisting of many
nodes, each with many cores. Our approach is to use a flat BSP model as the basis of this
book, but to extend it by an additional level, where needed, to represent a hybrid abstract
architecture with two levels, consisting of nodes and cores. This approach is simple enough
to still benefit from the flat BSP model, but realistic enough also in postrevolutionary
times. Modelling a thousand nodes, each with a thousand cores, in a two-level model, is
far more realistic than modelling one million processors in a single-level model. For this
purpose, hierarchical variants of the BSP model have emerged, including multi-BSP [292]
by Valiant himself, decomposable BSP [81], and the simple two-level variant proposed in
this book, which I call hybrid-BSP. In practice, one hardly ever obtains access to a whole
supercomputer, but only to a smaller part, so that two levels of hierarchy usually suffice, and
the number of interacting node pairs stays within reasonable bounds.
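A quick count shows why: a flat model with p = 10⁶ processors has p² = 10¹² potentially interacting processor pairs, whereas a two-level machine with 10³ nodes of 10³ cores each has only (10³)² = 10⁶ interacting node pairs at the top level, and another 10⁶ core pairs within each node.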
An important goal in designing parallel algorithms is to obtain a good algorithmic
structure. One way of achieving this is by designing an algorithm as a sequence of large
steps, called supersteps in BSP language, each containing many basic computation or
communication operations and a global synchronization at the end, where all processors
wait for each other to finish their work before they proceed to the next superstep. Within a
superstep, the work is done in parallel, but the global structure of the algorithm is sequential.
This simple structure has proven its worth in practice in many parallel applications, within
the BSP world and beyond.
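To give a first impression of what this structure looks like in program text, here is a minimal sketch (not a program from this book) of an SPMD program with two supersteps, written in C with the classic BSPlib primitives bsp_init, bsp_begin, bsp_end, bsp_nprocs, bsp_pid, and bsp_sync; the header name and the exact integer types may differ slightly between BSPlib implementations, and the local computation here is only a placeholder.

#include <bsp.h>
#include <stdio.h>

void spmd_main(void) {
    bsp_begin(bsp_nprocs());   /* start the SPMD section on all available processors */
    int p = bsp_nprocs();      /* total number of processors */
    int s = bsp_pid();         /* my processor identity, 0 <= s < p */

    /* Superstep 0: purely local computation (placeholder work) */
    double local = 2.0 * s;
    bsp_sync();                /* global synchronization ends the superstep */

    /* Superstep 1: more local work; communication operations such as
       bsp_put would be issued here, before the closing bsp_sync */
    local += p;
    bsp_sync();

    printf("Processor %d out of %d computed %.1f\n", s, p, local);
    bsp_end();                 /* end of the SPMD section */
}

int main(int argc, char **argv) {
    bsp_init(spmd_main, argc, argv); /* must precede the SPMD computation */
    spmd_main();
    return 0;
}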
Many efficient implementations of the BSP model are publicly available. The first imple-
mentation that could be run on many different parallel computers, the Oxford BSP library,
was developed by Miller and Reed [223, 224] in 1993. The BSP Worldwide organization
was founded in 1995 to promote collaboration between developers and users of BSP. It is a
loose organization that mainly runs a mailing list and a website, http://www.bsp-worldwide.org,
where one can find pointers to BSP software and BSP research groups. One of the
goals of BSP Worldwide was to provide a standard library for BSP programming. After
extensive discussions in the BSP community, a standard called BSPlib [153] was proposed
in May 1997 and an implementation by Hill and coworkers [151], the Oxford BSP toolset,
was made available in the public domain. The standard was modernized by Yzelman and
coworkers (including the author) in 2014 [318], and a shared-memory implementation was
released as MulticoreBSP for C. The programs in this book make use of the modernized
standard. I suggest you install the latest version of MulticoreBSP for C on your laptop or
desktop computer if you want to run the programs of this book; version 2.0.4 was released
in March 2019.
I wrote this book for students and researchers who are interested in scientific
computation. The book has a dual purpose: first, it is a textbook for a graduate course
on parallel scientific computation. The material of the book is suitable for a one-semester
course at a mathematics or computer science department. I tested all the material in class
at Utrecht University during the period 1993–2019, in an introductory course on parallel
scientific computation given every year (by now, 27 times), called ‘Parallel Algorithms’;
see the course page https://www.staff.science.uu.nl/~bisse101/Education/PA/pa.html.
The course is taken by students from mathematics, computer science, physics, and, more
recently, also artificial intelligence. Second, the book is a source of example parallel
algorithms and programs for computational scientists from many different disciplines
who are eager to get a quick start in parallel computing and want to learn a structured
approach to writing parallel programs. Prerequisites are knowledge about linear algebra
and sequential programming in a modern language such as C, C++, Java, or Python. The
program texts assume basic knowledge of the programming language C, but this should not
scare anyone away.
The scope of this book is the area of scientific computation in a broad sense. The book
treats numerical scientific computation by presenting a detailed study of several impor-
tant numerical problems. Through these problems, techniques are taught for designing
and implementing efficient, well-structured parallel algorithms. I selected these particular
problems because they are important for applications and because they give rise to a variety
of important parallelization techniques. This book treats well-known subjects such as dense
LU decomposition, fast Fourier transform (FFT), and sparse matrix–vector multiplication.
The second edition also includes several nonnumerical computations such as sorting and
graph matching, which are important in computer science.
Since this book should serve as a textbook, it covers a limited but carefully chosen amount
of material; I did not strive for completeness in covering the area of scientific computation.
A vast number of sequential algorithms can be found in Matrix Computations by Golub and
Van Loan [124] and Numerical Recipes: The Art of Scientific Computing by Press, Teukolsky,
Vetterling, and Flannery [253]. In my course on parallel algorithms, I have the habit of
assigning sequential algorithms from these books to my students and asking them to develop
parallel versions. Often, the students go out and perform an excellent job. Some of these
assignments became exercises in the present book.
The organization of the book is as follows. Chapter 1 introduces the BSP model and
BSPlib, and as an example it presents a simple complete parallel program for inner product
computation. This two-page program alone already teaches half the primitives of BSPlib.
The chapter then continues to teach the remaining important primitives in an example on
sorting. The first chapter is a concise and self-contained tutorial, which tells you how to
get started with writing BSP programs, and how to benchmark your computer as a BSP
computer.
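As a foretaste of Chapter 1, here is a simplified stand-in for such a program, not the book’s actual code: a function (here called bsp_ip, a name of my choosing), to be called inside an SPMD section, that computes the inner product of two block-distributed vectors in two supersteps, with each processor broadcasting its partial sum by bsp_put and then summing all p contributions redundantly.

#include <bsp.h>
#include <stdlib.h>

/* Sketch: inner product of two vectors x and y of global length n,
   block-distributed over p processors; nl = n/p is the local length
   (for simplicity, assume p divides n). s is my processor number. */
double bsp_ip(int p, int s, int nl, double *x, double *y) {
    double *partial = malloc(p * sizeof(double));
    bsp_push_reg(partial, p * sizeof(double)); /* make it writable by others */
    bsp_sync();

    /* Superstep 0: compute my partial sum and put it into slot s
       of the array partial on every processor */
    double alpha = 0.0;
    for (int i = 0; i < nl; i++)
        alpha += x[i] * y[i];
    for (int t = 0; t < p; t++)
        bsp_put(t, &alpha, partial, s * sizeof(double), sizeof(double));
    bsp_sync();

    /* Superstep 1: every processor adds up all p partial sums */
    double inprod = 0.0;
    for (int t = 0; t < p; t++)
        inprod += partial[t];

    bsp_pop_reg(partial);
    bsp_sync(); /* deregistration takes effect at this synchronization */
    free(partial);
    return inprod;
}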
Chapters 2–5 present parallel algorithms for problems with increasing irregularity.
Chapter 2 on dense LU decomposition presents a regular computation with communication
patterns that are common in matrix computations. It also treats optimization to achieve
high performance close to the peak performance of the parallel computer. Chapter 3 on
the FFT also treats a regular computation but one with a more complex flow of data.
[Figure: diagram of the dependencies between Chapters 1–5.]
Chapter 4 treats sparse matrix–vector multiplication, and Chapter 5 treats parallel
graph matching, the most irregular problem of the book. The programs of this book are
collected in the educational package BSPedupack, which is available from my software site.
Other software available from my software site is the Mondriaan sparse matrix partitioning
package [304], which is used extensively in Chapter 4. This is actual production software,
available under the GNU Lesser General Public License.
To use BSPedupack, a modernized BSPlib implementation such as MulticoreBSP for
C [318] must have been installed on your shared-memory parallel computer. The programs
of this book have been tested extensively for MulticoreBSP for C. As an alternative, you
can use C++ versions of several BSPedupack programs written for the Bulk library [52],
which are available from https://jwbuurlage.github.com/Bulk. Currently, all programs of
Chapters 1–3 are available in C++/Bulk. Bulk can handle hybrid-BSP programs. Finally,
another possibility is to use BSPonMPI [281], which provides the distributed-memory
portability offered by MPI also to BSP programmers.
Another communication library, which can be used for writing parallel programs and
which has had a major impact on the field of parallel computing, is the Message-Passing
Interface (MPI), formulated in 1994 as MPI-1, and extended in 1997 by MPI-2 and in
2012 by MPI-3. The MPI standard has promoted the portability of parallel programs
and one might say that its advent has effectively ended the era of architecture-dependent
communication interfaces. We presented MPI versions of the BSPedupack programs in the
first edition of this book, showing that it is possible to program in BSP style while using MPI,
but we have removed these MPI programs to save space and also because the MPI standard
has expanded and a brief treatment would not do justice to the many new developments.
Still, we include a more extensive discussion of MPI in the bibliographic notes of Chapter 1.
The programming language used in this book is C99, the 1999 ISO standard for C. The
reason for this choice is that historically many students learned C as their first programming
language (nowadays, this is Python) and that efficient C compilers are available for many
different sequential and parallel computers. On massively parallel supercomputers, the
languages used are mainly C, C++, and Fortran.
Finally, let me express my hope and expectation that this book will transform your
barriers: your own activation barrier for parallel programming will disappear; instead,
synchronization barriers will appear in your parallel programs and you will know how to
use them as an effective way of designing well-structured parallel programs.
Rob H. Bisseling
Utrecht University
December 2019
ACKNOWLEDGEMENTS
Many people helped me shape this book, from the inception of the first edition in 1993 until
the completion of the second edition in 2019. Here, I would like to express my gratitude to
all of them. I apologize if I forgot anyone.
First of all, I would like to thank Bill McColl for introducing the BSP model to me in 1992
and convincing me to abandon my previous habit of developing special-purpose algorithms
for mesh-based parallel computers. Thanks to him, I turned to designing general-purpose
algorithms that can run on every parallel computer. Without Bill’s encouragement, I would
not have written this book. Also for the second edition, Bill’s strong stands on BSP have had
an impact.
Special mention should be made of Jon Hill, co-designer and main implementor of
BSPlib. The BSPlib standard gives the programs in the main text of this book a solid
foundation. Many discussions with Jon, in particular during the course on BSP we gave
together in Jerusalem in 1997, were extremely helpful in writing this book. Sadly, Jon passed
away at a far too young age.
Several visits abroad gave me feedback and exposure to constructive criticism. For visits
during the writing of the first edition, I would like to thank my hosts Michael Berman
of Silicon Graphics Biomedical in Jerusalem, Richard Brent and Bill McColl of Oxford
University, Iain Duff of CERFACS in Toulouse, Jacko Koster and Fredrik Manne of the
University of Bergen, Satish Rao of NEC Research at Princeton, Pilar de la Torre of the
University of New Hampshire, and Leslie Valiant of Harvard University, inventor of the
BSP model. I appreciate their hospitality. I also thank the Engineering and Physical Sciences
Research Council in the UK for funding my stay in Oxford in 2000, which enabled me to
make much progress with the first edition of this book.
For visits after the publication of the first edition, I am grateful to colleagues who invited
me to give talks on the book and who provided me feedback that eventually led to this
expanded and updated second edition. I would like to thank Erik Boman, Karen Devine,
and Bruce Hendrickson of Sandia National Laboratories for their hospitality in 2004 and
for inspiring me to improve my software engineering skills. Alex Pothen has been a generous
host at Old Dominion University; I had many fruitful discussions with him on combinatorial
scientific computing. Fredrik Manne introduced me to the topic of parallel graph matching,
and our joint work from 2007 became the basis for the algorithms of Chapter 5. Fredrik also
tested Chapter 1 of the new edition in class at the Computer Science department in Bergen.
Luc Giraud invited me to CERFACS in Toulouse in 2010, where I could work on a new
release of the Mondriaan package, discussed in Chapter 4.
By 2010, I had sufficiently recovered from the efforts of writing a first edition to imagine
writing a new edition. Visits to KU Leuven in the years 2012–14, hosted by Albert-Jan
Yzelman and Dirk Roose, helped me define the desired content of the new edition, in
particular the addition of parallel sorting and graph algorithms. I thank Albert-Jan and
Dirk for their feedback on the use of the first edition in their course in Leuven. A guest
professorship at the University of Orléans in May 2014 became the start of the actual
writing of the new edition. I thank Hélène Coullon, Sébastien Limet, Frédéric Loulergue,
and Sophie Robert for their great hospitality. Assaf Schuster hosted numerous visits to
the Technion in Haifa, where parts of the first edition were used as course material. A
visit to Sherry Li at Lawrence Berkeley National Laboratory in 2016 provided helpful new
insights into high-performance LU decomposition. Sherry also helped me obtain access to
the Cori supercomputer, which I happily used in the numerical experiments of Chapter 2;
this chapter used resources of Cori at the National Energy Research Scientific Computing
Center (NERSC), a US Department of Energy, Office of Science, User Facility operated
under Contract No. DE-AC02-05CH11231.
Finally, visits in 2018 to Sivan Toledo at Tel Aviv University and Oded Schwartz at the
Hebrew University of Jerusalem helped me in the final stages of this second edition. Looking
back on all these travels, I realize that this book could not have been written on a desert
island, however beautiful and isolated it might have been.
The BSPedupack software accompanying this book has been tested on many different
parallel computers, of different generations for the two editions. I thank all the people from
the supercomputer centres who have assisted me and the institutes and companies that gave
me access and computer time. For the first edition, I would like to thank Jeremy Martin
and Bob McLatchie of Oxford University, the Oxford Supercomputer Centre, and Sychron
Ltd in Oxford. In the Netherlands, I gained assistance from Jana Vasiljev, Willem Vermin,
and Aad van der Steen, and for the second edition from Valeriu Codreanu. For both editions, I
benefited from yearly grants of computer time from NCF (National Computer Facilities),
now part of the Dutch Science Organization NWO. My students and I enjoyed all the work
we could do on generations of national supercomputers, from Vermeer in Delft to Teras,
Aster, Huygens, and Cartesius at SURFsara in Amsterdam. The final runs in the second
edition on Cartesius were carried out under grant SH-349-15 from NWO.
The Mathematics Institute of Utrecht University provided me with a stimulating envi-
ronment for writing this book. Henk van der Vorst started the Computational Science
education programme in Utrecht in 1993. Later it became the Scientific Computing pro-
gramme, which gave me the opportunity to develop and test the material of the book. Since
2012, I have also regularly spent time at CWI in Amsterdam, where I participated in the
computational life sciences group led by Gunnar Klau and the scientific computing group
led by Daan Crommelin, and where I currently participate in the computational imaging
group led by Joost Batenburg. I thank all of my colleagues at CWI for many inspiring ideas
and for providing a sanctuary where I could work steadily on the second edition.
Over the years, hundreds of students have graduated from my parallel scientific computa-
tion courses. Many have contributed, occasionally by being guinea-pigs (my apologies!), but
more often as partners in a genuine dialogue. These students have helped me to improve the
exposition of the material and they have forced me to be as clear and brief as I can (except in
this acknowledgement). I thank all of them, with special mention of Tammo Jan Dijkema,
Stefan Korenberg, Maarten Löffler, Angelo Mekenkamp, and Katharina Klein, who were
champion proofreaders, and Jan-Willem Buurlage, Mick van Duijn, Mitchell Faas, Wijnand
Suijlen, Dik Takken, Paul Visscher, Abe Wits, and Albert-Jan Yzelman, who developed their
own BSP libraries while taking my course, or soon afterwards.
During the period of writing this book, I was joined in my parallel computing research
by MSc students, PhD students, and postdocs. Discussions with them often yielded new
insights, and I enjoyed many working and off-working hours spent together. For the first
edition, I would like to mention here: Márcia Alves de Inda, Ildikó Flesch, Jeroen van
Grondelle, Alexander van Heukelum, Neal Hegeman, Guy Horvitz, Frank van Lingen,
Jacko Koster, Joris Koster, Wouter Meesen, Bruce Stephens, Frank van der Stappen, Mark
Stijnman, Dik Takken, Patrick Timmers, and Brendan Vastenhouw. For the second edition,
I would like to mention: Sarita de Berg, Folkert Bleichrodt, Jan-Willem Buurlage, Bas
Fagginger Auer, Timon Knigge, Tristan van Leeuwen, Marco van Oort, Daniël Pelt, Raoul
Schram, Davide Taviani, Nick Verheul, and Albert-Jan Yzelman. A comment by Raoul
Schram led to the faster bit-reversal algorithm in the second edition, Algorithm 3.3. An
industrial contribution came from Pascal Ramaekers, Daniel Sevilla Sánchez, and Hans van
der Voort from Scientific Volume Imaging in Hilversum, the Netherlands, who posed a real-
life parallelization problem for a 3D fluorescence microscopy application to the students in
my Mathematics for Industry course in 2018, which led to Exercise 4.6.
Much of my pre-BSP work has contributed to this book as well. In particular, research I
carried out in 1987–93 at the Koninklijke/Shell-Laboratory in Amsterdam has taught me
much about parallel computing and sparse matrix computations. Ideas from the prototype
library PARPACK, developed in those years at Shell, profoundly influenced the present
work. The importance of a structured approach was already apparent then; good structure
was obtained in PARPACK by writing programs with communication-closed layers, the
predecessor of the BSP superstep. I would like to express my debt to the enlightened
management by Arie Langeveld and Theo Verheggen at Shell and to my close colleagues
Daniël Loyens and Hans van de Vorst from the parallel computing group.
Going back even further, to my years 1981–6 at the Hebrew University of Jerusalem,
my PhD supervisor Ronnie Kosloff aroused my interest in fast Fourier transforms, which
has become the subject of Chapter 3. Ronnie seriously influenced my way of working,
by injecting me with a large dose of (quantum molecular) dynamics. In Jerusalem, Larry
Rudolph introduced me to the field of parallel computing. His enthusiasm and juggling acts
left an imprint forever.
Comments on draft chapters of the first edition have been given by Márcia Alves de Inda,
Richard Brent, Olivier Dulieu, Jon Hill, Slava Kokoouline, Jacko Koster, Frank van Lingen,
Ronald Meester, Adina Milston, John Reid, Dan Stefanescu, Pilar de la Torre, Leslie Valiant,
and Yael Weinbach. Comments for the second edition have been given by Fatima Abu Salem,
Ariful Azad, Sarai Bisseling, Aydın Buluç, Jan-Willem Buurlage, Mick van Duijn, Robert
van de Geijn, Fredrik Manne, Joshua Maynard, Daniël Pelt, Wijnand Suijlen, and Albert-
Jan Yzelman. Aesthetic advice for the first edition was given by Ron Bertels, Lidy Bisseling,
Gerda Dekker, and Gila and Joel Kantor. Thanks to all of them. Disclaimer: if you find typing
errors, small flaws, serious flaws, unintended Dutch, or worse, do not blame them, just flame
me! All comments are welcome at: r.h.bisseling@uu.nl. I thank my editors of the
first edition at Oxford University Press, Elizabeth Johnston, Alison Jones, and Mahua Nandi,
for accepting my vision of this book and for their ideas, good judgement, help, and patience.
I thank my editors for the second edition, Keith Mansfield, Dan Taber, and Katherine Ward,
for their enthusiasm, their infinite patience, and their suggestions which made this book
project a finite effort.
Finally, in the writing of this book, I owe much to my family. My wife Rona showed
love and sympathy, and gave support whenever needed. Our daughter Sarai, born in 1994,
had already acquired quite some mathematical and computer skills when I completed
the manuscript of the first edition in 2003. At the time, I tested a few exercises on her
(admittedly, unmarked ones), and I was amazed how much a nine-year-old can understand
about parallel computing. If she could do it then, you can now. Today, she is a skilled
programmer herself, and she encounters parallel computing everywhere. Without knowing
it, I wrote the book for her.