
OUP CORRECTED PROOF – FINAL, 2/7/2020, SPi

parallel scientific computation



Parallel Scientific Computation


A Structured Approach Using BSP

second edition

ROB H. BISSELING
Utrecht University

Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Rob H. Bisseling 2020
The moral rights of the author have been asserted
First Edition published in 2004
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2020932793
ISBN 978–0–19–878834–8 (hbk.)
ISBN 978–0–19–878835–5 (pbk.)
DOI: 10.1093/oso/9780198788348.001.0001
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.

PREFACE

Why this book on parallel scientific computation? The first edition of this book appeared
in 2004, and the time was ripe for learning how to design and analyse portable parallel
algorithms and to write portable parallel programs. Parallel computing became a lot easier
and there was no need anymore to suffer from highly esoteric hardware or software systems.
The second edition is motivated by the multicore revolution that took place around 2007.
Suddenly, CPU clock speeds stopped increasing and, after a while, we found our laptop
and desktop computers equipped with several processor cores. Great news for parallelism:
parallel computers are everywhere. My yearly poll of the students in my parallel algorithms
class reveals: the best equipped student has 12 cores in his personal computer (class of
2018). This is an opportunity for teaching, since students can now run parallel programs
on their own computer, and I exploit this in the second edition. At the other end of the
spectrum, massively parallel supercomputers now reach up to millions of cores.
The continuing growth in computing power due to increased parallelism and the avail-
ability of huge amounts of data has led to the emergence of Big Data as a new application
area. Often, the data is irregular and possesses a network structure, which is mathematically
modelled by a graph. This motivated the inclusion of a whole new chapter on graph
matching in the second edition.
Today’s state of affairs in parallel programming is that one can theoretically develop a
parallel algorithm, analyse its performance on various architectures, implement the algo-
rithm, and run the resulting program with high and predictable efficiency on a shared-
memory personal computer or departmental compute server. Unfortunately, to run the
same program efficiently on a massively parallel computer, with both shared and distributed
memory, requires a lot more work these days. Programmers often employ two systems such
as MPI+X, where X stands for either OpenMP or POSIX threads, or even three systems
such as MPI+X+Y, where Y could be CUDA or OpenCL, in case graphics processing units
(GPUs) are involved as well. Often, these systems themselves are large and unwieldy. This
complicates matters, and our aim in this book is to simplify parallel programming again, by
providing a single system that unifies the shared and distributed memory systems. With this
book, I hope to convince you that parallel programming is not much harder than sequential
programming and need not be left only to the experts. GPUs are outside the scope of this
book, as their architecture is very specific and hard to capture in a single system together
with the other two types of parallelism. Still, many lessons learned from this book directly
apply to GPUs and other manycore accelerators.
Improvements in parallel hardware and software, together with major advances in the
theory of parallel programming led to the first edition of this book. An important theoretical
development was the advent of the Bulk Synchronous Parallel (BSP) programming model
proposed by Valiant in 1989 [289, 290], which provides a useful and elegant theoretical
framework for bridging the gap between parallel hardware and software. For this reason, I
have adopted the BSP model as the target model for my parallel algorithms. In my experi-
ence, the simplicity of the BSP model makes it suitable for teaching parallel algorithms: the
model itself is easy to explain and it provides a powerful tool for expressing and analysing
algorithms.
For massively parallel supercomputers with p = 10⁶ processors (i.e., cores), the total
number p² of interacting processor pairs becomes huge and as a consequence we may have
to rethink our algorithms to make them scalable up to such high values of p. To address this
scalability problem, we need to model the architecture as a hierarchy consisting of many
nodes, each with many cores. Our approach is to use a flat BSP model as the basis of this
book, but to extend it by an additional level, where needed, to represent a hybrid abstract
architecture with two levels, consisting of nodes and cores. This approach is simple enough
to still benefit from the flat BSP model, but realistic enough also in postrevolutionary
times. Modelling a thousand nodes, each with a thousand cores, in a two-level model, is
far more realistic than modelling one million processors in a single-level model. For this
purpose, hierarchical variants of the BSP model have emerged, including multi-BSP [292]
by Valiant himself, decomposable BSP [81], and the simple two-level variant proposed in
this book, which I call hybrid-BSP. In practice, one hardly ever obtains access to a whole
supercomputer, but only to a smaller part, so that two levels of hierarchy usually suffice, and
the number of interacting node pairs stays within reasonable bounds.
An important goal in designing parallel algorithms is to obtain a good algorithmic
structure. One way of achieving this is by designing an algorithm as a sequence of large
steps, called supersteps in BSP language, each containing many basic computation or
communication operations and a global synchronization at the end, where all processors
wait for each other to finish their work before they proceed to the next superstep. Within a
superstep, the work is done in parallel, but the global structure of the algorithm is sequential.
This simple structure has proven its worth in practice in many parallel applications, within
the BSP world and beyond.
Many efficient implementations of the BSP model are publicly available. The first imple-
mentation that could be run on many different parallel computers, the Oxford BSP library,
was developed by Miller and Reed [223, 224] in 1993. The BSP Worldwide organization
was founded in 1995 to promote collaboration between developers and users of BSP. It is a
loose organization that mainly runs a mailing list and a website http://www.bsp-worldwide.org
where one can find pointers to BSP software and BSP research groups. One of the
goals of BSP Worldwide was to provide a standard library for BSP programming. After
extensive discussions in the BSP community, a standard called BSPlib [153] was proposed
in May 1997 and an implementation by Hill and coworkers [151], the Oxford BSP toolset,
was made available in the public domain. The standard was modernized by Yzelman and
coworkers (including the author) in 2014 [318], and a shared-memory implementation was
released as MulticoreBSP for C. The programs in this book make use of the modernized
standard. I suggest you install the latest version of MulticoreBSP for C on your laptop or
desktop computer if you want to run the programs of this book; version 2.0.4 was released
in March 2019.

I wrote this book for students and researchers who are interested in scientific
computation. The book has a dual purpose: first, it is a textbook for a graduate course
on parallel scientific computation. The material of the book is suitable for a one-semester
course at a mathematics or computer science department. I tested all the material in class
at Utrecht University during the period 1993–2019, in an introductory course on parallel
scientific computation given every year (by now, 27 times), called ‘Parallel Algorithms’,
see the course page https://www.staff.science.uu.nl/~bisse101/Education/PA/pa.html.
The course is taken by students from mathematics, computer science, physics, and, more
recently, also artificial intelligence. Second, the book is a source of example parallel
algorithms and programs for computational scientists from many different disciplines
who are eager to get a quick start in parallel computing and want to learn a structured
approach to writing parallel programs. Prerequisites are knowledge about linear algebra
and sequential programming in a modern language such as C, C++, Java, or Python. The
program texts assume basic knowledge of the programming language C, but this should not
scare anyone away.
The scope of this book is the area of scientific computation in a broad sense. The book
treats numerical scientific computation by presenting a detailed study of several impor-
tant numerical problems. Through these problems, techniques are taught for designing
and implementing efficient, well-structured parallel algorithms. I selected these particular
problems because they are important for applications and because they give rise to a variety
of important parallelization techniques. This book treats well-known subjects such as dense
LU decomposition, fast Fourier transform (FFT), and sparse matrix–vector multiplication.
The second edition also includes several nonnumerical computations such as sorting and
graph matching, which are important in computer science.
Since this book should serve as a textbook, it covers a limited but carefully chosen amount
of material; I did not strive for completeness in covering the area of scientific computation.
A vast number of sequential algorithms can be found in Matrix Computations by Golub and
Van Loan [124] and Numerical Recipes: The Art of Scientific Computing by Press, Teukolsky,
Vetterling, and Flannery [253]. In my course on parallel algorithms, I have the habit of
assigning sequential algorithms from these books to my students and asking them to develop
parallel versions. Often, the students go out and perform an excellent job. Some of these
assignments became exercises in the present book.
The organization of the book is as follows. Chapter 1 introduces the BSP model and
BSPlib, and as an example it presents a simple complete parallel program for inner product
computation. This two-page program alone already teaches half the primitives of BSPlib.
The chapter then continues to teach the remaining important primitives in an example on
sorting. The first chapter is a concise and self-contained tutorial, which tells you how to
get started with writing BSP programs, and how to benchmark your computer as a BSP
computer.
Chapters 2–5 present parallel algorithms for problems with increasing irregularity.
Chapter 2 on dense LU decomposition presents a regular computation with communication
patterns that are common in matrix computations. It also treats optimization to achieve
high performance close to the peak performance of the parallel computer. Chapter 3 on
the FFT also treats a regular computation but one with a more complex flow of data.
The execution-time requirements of the LU decomposition and FFT algorithms can be
analysed exactly and the performance of an implementation can be predicted quite
accurately. Chapter 4 presents the multiplication of a sparse matrix and a dense vector. The
computation involves only those matrix elements that are nonzero, so that in general it is
irregular. The communication involves the components of dense input and output vectors.
Although these vectors can be stored in a regular data structure, the communication pattern
becomes irregular because efficient communication must exploit the sparsity of the matrix.
This chapter also connects to the area of machine learning: employing a deep artificial
neural network can be viewed as performing many matrix–vector multiplications, one for
each hidden layer or output layer of the network. Chapter 5 presents an algorithm for
matching vertices in a sparse graph. This computation is highly irregular because the graph
is sparse and because it changes by the removal of matched vertices and their edges.
The order in which the chapters can be read is given by the following directed acyclic
graph (with chapters as vertices and prerequisites as directed edges):

[Figure: directed acyclic graph over the vertices Ch. 1–Ch. 5, giving the reading order of the chapters.]

Major changes in the second edition compared to the first are:

• New Sections 1.8–1.10 on sorting.
• New Section 2.5 on high-performance LU decomposition.
• Simplified Section 3.5 on weights for the FFT.
• New Section 4.6 on fine-grain and medium-grain partitioning.
• New Section 4.10 on hybrid-BSP.
• New Chapter 5 on graph matching.
• Bibliographic Section 1.11.3 instead of appendix on MPI.

All sections of the book have been brought up to date.


Chapter 1 can stand on its own and serve as the basis for a short course on paral-
lel programming (3–4 ECTS points, in the European Credit Transfer System where 60
ECTS comprise a full year of study). For a full semester course of eight ECTS in Parallel
Algorithms, one could choose to treat Chapters 1, 2, and 4, and possibly Chapter 3 for a
mathematics audience, or Chapter 5 for a computer science audience.
Each chapter contains: an abstract; a brief discussion of a sequential algorithm, included
to make the material self-contained; the design and analysis of a parallel algorithm; ideas
for possible optimization; an annotated program text; illustrative experimental results of
an implementation on a particular parallel computer; bibliographic notes, giving historical
background and pointers for further reading; theoretical and practical exercises.
My approach in presenting algorithms and program texts has been to give priority to
clarity, simplicity, and brevity, even if this comes at the expense of a slight decrease in
efficiency. In this book, algorithms and programs are only optimized if this teaches an
important technique, or improves efficiency by an order of magnitude, or if this can be done
without much harm to clarity. Hints for further optimization are given in exercises. The
reader should view the programs as a starting point for achieving fast implementations.
One goal of this book is to ease the transition from theory to practice. For this purpose,
each chapter includes an example program, which presents a possible implementation of
the central algorithm in that chapter. The program texts form a small but integral part of
this book. They are meant to be read by humans, besides being compiled and executed
by computers. Studying the program texts is the best way of understanding what parallel
programming is really about. Using and modifying the programs gives you valuable hands-
on experience.
The aim of the section on experimental results is to illustrate the theoretical analysis.
Often, one aspect is highlighted; I made no attempt to perform an exhaustive set of exper-
iments. A real danger in trying to explain experimental results for an algorithm is that a
full explanation may lead to a discussion of nitty-gritty implementation details or hardware
quirks. This is hardly illuminating for the algorithm, and therefore I have chosen to keep
such explanations to a minimum. For my experiments, I have used several different parallel
machines, older ones as well as newer ones: parallel computers come and go quickly.
The bibliographic notes of this book are lengthier than usual in a textbook, since I have
tried to summarize the contents of the cited work and relate them to the topic discussed in
the current chapter. Often, I could not resist the temptation to write a few sentences about
a subject not fully discussed in the main text, but still worth mentioning.
Most exercises of this book have the form of programming projects, which are suitable for
use in an accompanying computer-laboratory class. I have graded the exercises according to
difficulty/amount of work involved, marking an exercise by an asterisk (∗) if it requires more
work than a basic exercise and by two asterisks (∗∗) if it requires a lot of work, meaning that
it would be suitable as a final assignment or small research project. Inevitably, such a grading
is subjective, but it may be helpful for a teacher in assigning problems to students. The main
text of the book treats a few central topics from parallel scientific computation in depth; the
exercises are meant to give the book breadth. At the end of my Parallel Algorithms course,
I ask the students to give a 15–20 minute presentation on their final project. This exposes
their fellow students to the breadth of parallel scientific computation.
The source files of the printed program texts, together with a set of test programs
that demonstrate their use, form a package called BSPedupack, available at
https://www.staff.science.uu.nl/~bisse101/Software/software. The package is copyrighted, but freely
available under the GNU General Public License, meaning that its programs can be used
and modified freely, provided the source and all modifications are mentioned, and every
modification is again made freely available under the same license. As the name says, the
programs in BSPedupack are primarily intended for teaching. They are definitely not meant
to be used as a black box. Only rudimentary error handling has been built into the programs.

Other software available from my software site is the Mondriaan sparse matrix partitioning
package [304], which is used extensively in Chapter 4. This is actual production software,
available under the GNU Lesser General Public License.
To use BSPedupack, a modernized BSPlib implementation such as MulticoreBSP for
C [318] must have been installed on your shared-memory parallel computer. The programs
of this book have been tested extensively for MulticoreBSP for C. As an alternative, you
can use C++ versions of several BSPedupack programs written for the Bulk library [52],
which are available from https://jwbuurlage.github.com/Bulk. Currently, all programs of
Chapters 1–3 are available in C++/Bulk. Bulk can handle hybrid-BSP programs. Finally,
another possibility is to use BSPonMPI [281], which provides the distributed-memory
portability offered by MPI also to BSP programmers.
Another communication library, which can be used for writing parallel programs and
which has had a major impact on the field of parallel computing is the Message-Passing
Interface (MPI), formulated in 1994 as MPI-1, and extended in 1997 by MPI-2 and in
2012 by MPI-3. The MPI standard has promoted the portability of parallel programs
and one might say that its advent has effectively ended the era of architecture-dependent
communication interfaces. We presented MPI versions of the BSPedupack programs in the
first edition of this book, showing that it is possible to program in BSP style while using MPI,
but we have removed these MPI programs to save space and also because the MPI standard
has expanded and a brief treatment would not do justice to the many new developments.
Still, we include a more extensive discussion of MPI in the bibliographic notes of Chapter 1.
The programming language used in this book is C99, the 1999 ISO standard for C. The
reason for this choice is that historically many students learned C as their first programming
language (nowadays, this is Python) and that efficient C compilers are available for many
different sequential and parallel computers. On massively parallel supercomputers, the
languages used are mainly C, C++, and Fortran.
Finally, let me express my hope and expectation that this book will transform your
barriers: your own activation barrier for parallel programming will disappear; instead,
synchronization barriers will appear in your parallel programs and you will know how to
use them as an effective way of designing well-structured parallel programs.
Rob H. Bisseling
Utrecht University
December 2019

ACKNOWLEDGEMENTS

Many people helped me shape this book, from the inception of the first edition in 1993 until
the completion of the second edition in 2019. Here, I would like to express my gratitude to
all of them. I apologize if I forgot anyone.
First of all, I would like to thank Bill McColl for introducing the BSP model to me in 1992
and convincing me to abandon my previous habit of developing special-purpose algorithms
for mesh-based parallel computers. Thanks to him, I turned to designing general-purpose
algorithms that can run on every parallel computer. Without Bill’s encouragement, I would
not have written this book. Also for the second edition, Bill’s strong stands on BSP have had
an impact.
Special mention should be made of Jon Hill, co-designer and main implementor of
BSPlib. The BSPlib standard gives the programs in the main text of this book a solid
foundation. Many discussions with Jon, in particular during the course on BSP we gave
together in Jerusalem in 1997, were extremely helpful in writing this book. Sadly, Jon passed
away at a far too young age.
Several visits abroad gave me feedback and exposure to constructive criticism. For visits
during the writing of the first edition, I would like to thank my hosts Michael Berman
of Silicon Graphics Biomedical in Jerusalem, Richard Brent and Bill McColl of Oxford
University, Iain Duff of CERFACS in Toulouse, Jacko Koster and Fredrik Manne of the
University of Bergen, Satish Rao of NEC Research at Princeton, Pilar de la Torre of the
University of New Hampshire, and Leslie Valiant of Harvard University, inventor of the
BSP model. I appreciate their hospitality. I also thank the Engineering and Physical Sciences
Research Council in the UK for funding my stay in Oxford in 2000, which enabled me to
make much progress with the first edition of this book.
For visits after the publication of the first edition, I am grateful to colleagues who invited
me to give talks on the book and who provided me feedback that eventually led to this
expanded and updated second edition. I would like to thank Erik Boman, Karen Devine,
and Bruce Hendrickson of Sandia National Laboratories for their hospitality in 2004 and
for inspiring me to improve my software engineering skills. Alex Pothen has been a generous
host at Old Dominion University; I had many fruitful discussions with him on combinatorial
scientific computing. Fredrik Manne introduced me to the topic of parallel graph matching,
and our joint work from 2007 became the basis for the algorithms of Chapter 5. Fredrik also
tested Chapter 1 of the new edition in class at the Computer Science department in Bergen.
Luc Giraud invited me to CERFACS in Toulouse in 2010, where I could work on a new
release of the Mondriaan package, discussed in Chapter 4.
By 2010, I had sufficiently recovered from the efforts of writing a first edition to imagine
writing a new edition. Visits to KU Leuven in the years 2012–14, hosted by Albert-Jan
Yzelman and Dirk Roose, helped me define the desired content of the new edition, in
particular the addition of parallel sorting and graph algorithms. I thank Albert-Jan and
Dirk for their feedback on the use of the first edition in their course in Leuven. A guest
professorship at the University of Orléans in May 2014 became the start of the actual
writing of the new edition. I thank Hélène Coullon, Sébastien Limet, Frédéric Loulergue,
and Sophie Robert for their great hospitality. Assaf Schuster hosted numerous visits to
the Technion in Haifa, where parts of the first edition were used as course material. A
visit to Sherry Li at Lawrence Berkeley National Laboratory in 2016 provided helpful new
insights into high-performance LU decomposition. Sherry also helped me obtain access to
the Cori supercomputer, which I happily used in the numerical experiments of Chapter 2;
this chapter used resources of Cori at the National Energy Research Scientific Computing
Center (NERSC), a US Department of Energy, Office of Science, User Facility operated
under Contract No. DE-AC02-05CH11231.
Finally, visits in 2018 to Sivan Toledo at Tel Aviv University and Oded Schwartz at the
Hebrew University of Jerusalem helped me in the final stages of this second edition. Looking
back on all these travels, I realize that this book could not have been written on a desert
island, however beautiful and isolated it might have been.
The BSPedupack software accompanying this book has been tested on many different
parallel computers, of different generations for the two editions. I thank all the people from
the supercomputer centres who have assisted me and the institutes and companies that gave
me access and computer time. For the first edition, I would like to thank Jeremy Martin
and Bob McLatchie of Oxford University, the Oxford Supercomputer Centre, and Sychron
Ltd in Oxford. In the Netherlands, I gained assistance from Jana Vasiljev, Willem Vermin,
Aad van der Steen, and for the second edition from Valeriu Codreanu. For both editions, I
benefited from yearly grants of computer time from NCF (National Computer Facilities),
now part of the Dutch Science Organization NWO. My students and I enjoyed all the work
we could do on generations of national supercomputers, from Vermeer in Delft to Teras,
Aster, Huygens, and Cartesius at SURFsara in Amsterdam. The final runs in the second
edition on Cartesius were carried out under grant SH-349-15 from NWO.
The Mathematics Institute of Utrecht University provided me with a stimulating envi-
ronment for writing this book. Henk van der Vorst started the Computational Science
education programme in Utrecht in 1993. Later it became the Scientific Computing pro-
gramme, which gave me the opportunity to develop and test the material of the book. Since
2012, I have also regularly spent time at CWI in Amsterdam, where I participated in the
computational life sciences group led by Gunnar Klau and the scientific computing group
led by Daan Crommelin, and where I currently participate in the computational imaging
group led by Joost Batenburg. I thank all of my colleagues at CWI for many inspiring ideas
and for providing a sanctuary where I could work steadily on the second edition.
Over the years, hundreds of students have graduated from my parallel scientific computation courses. Many have contributed, occasionally by being guinea-pigs (my apologies!), but
more often as partners in a genuine dialogue. These students have helped me to improve the
exposition of the material and they have forced me to be as clear and brief as I can (except in
this acknowledgement). I thank all of them, with special mention of Tammo Jan Dijkema,
Stefan Korenberg, Maarten Löffler, Angelo Mekenkamp, and Katharina Klein, who were
champion proofreaders, and Jan-Willem Buurlage, Mick van Duijn, Mitchell Faas, Wijnand Suijlen, Dik Takken, Paul Visscher, Abe Wits, and Albert-Jan Yzelman, who developed their own BSP libraries while taking my course, or soon afterwards.

OUP CORRECTED PROOF – FINAL, 2/7/2020, SPi

acknowledgements | xiii
During the period of writing this book, I was joined in my parallel computing research
by MSc students, PhD students, and postdocs. Discussions with them often yielded new
insights, and I enjoyed many working and off-working hours spent together. For the first
edition, I would like to mention here: Márcia Alves de Inda, Ildikó Flesch, Jeroen van
Grondelle, Alexander van Heukelum, Neal Hegeman, Guy Horvitz, Frank van Lingen,
Jacko Koster, Joris Koster, Wouter Meesen, Bruce Stephens, Frank van der Stappen, Mark
Stijnman, Dik Takken, Patrick Timmers, and Brendan Vastenhouw. For the second edition,
I would like to mention: Sarita de Berg, Folkert Bleichrodt, Jan-Willem Buurlage, Bas
Fagginger Auer, Timon Knigge, Tristan van Leeuwen, Marco van Oort, Daniël Pelt, Raoul
Schram, Davide Taviani, Nick Verheul, and Albert-Jan Yzelman. A comment by Raoul
Schram led to the faster bit-reversal algorithm in the second edition, Algorithm 3.3. An
industrial contribution came from Pascal Ramaekers, Daniel Sevilla Sánchez, and Hans van
der Voort from Scientific Volume Imaging in Hilversum, the Netherlands, who posed a real-life parallelization problem for a 3D fluorescence microscopy application to the students in
my Mathematics for Industry course in 2018, which led to Exercise 4.6.
Much of my pre-BSP work has contributed to this book as well. In particular, research I
carried out in 1987–93 at the Koninklijke/Shell-Laboratory in Amsterdam has taught me
much about parallel computing and sparse matrix computations. Ideas from the prototype
library PARPACK, developed in those years at Shell, profoundly influenced the present
work. The importance of a structured approach was already apparent then; good structure
was obtained in PARPACK by writing programs with communication-closed layers, the
predecessor of the BSP superstep. I would like to express my debt to the enlightened
management by Arie Langeveld and Theo Verheggen at Shell and to my close colleagues
Daniël Loyens and Hans van de Vorst from the parallel computing group.
Going back even further, to my years 1981–6 at the Hebrew University of Jerusalem,
my PhD supervisor Ronnie Kosloff aroused my interest in fast Fourier transforms, which
has become the subject of Chapter 3. Ronnie seriously influenced my way of working,
by injecting me with a large dose of (quantum molecular) dynamics. In Jerusalem, Larry
Rudolph introduced me to the field of parallel computing. His enthusiasm and juggling acts
left an imprint forever.
Comments on draft chapters of the first edition have been given by Márcia Alves de Inda,
Richard Brent, Olivier Dulieu, Jon Hill, Slava Kokoouline, Jacko Koster, Frank van Lingen,
Ronald Meester, Adina Milston, John Reid, Dan Stefanescu, Pilar de la Torre, Leslie Valiant,
and Yael Weinbach. Comments for the second edition have been given by Fatima Abu Salem,
Ariful Azad, Sarai Bisseling, Aydın Buluç, Jan-Willem Buurlage, Mick van Duijn, Robert
van de Geijn, Fredrik Manne, Joshua Maynard, Daniël Pelt, Wijnand Suijlen, and Albert-Jan Yzelman. Aesthetic advice for the first edition was given by Ron Bertels, Lidy Bisseling,
Gerda Dekker, and Gila and Joel Kantor. Thanks to all of them. Disclaimer: if you find typing
errors, small flaws, serious flaws, unintended Dutch, or worse, do not blame them, just flame
me! All comments are welcome at: r.h.bisseling@uu.nl. I thank my editors of the
first edition at Oxford University Press, Elizabeth Johnston, Alison Jones, and Mahua Nandi,
for accepting my vision of this book and for their ideas, good judgement, help, and patience.
I thank my editors for the second edition, Keith Mansfield, Dan Taber, and Katherine Ward,
for their enthusiasm, their infinite patience, and their suggestions which made this book
project a finite effort.
Finally, in the writing of this book, I owe much to my family. My wife Rona showed
love and sympathy, and gave support whenever needed. Our daughter Sarai, born in 1994,
had already acquired quite some mathematical and computer skills when I completed
the manuscript of the first edition in 2003. At the time, I tested a few exercises on her
(admittedly, unmarked ones), and I was amazed how much a nine-year-old can understand
about parallel computing. If she could do it then, you can now. Today, she is a skilled
programmer herself, and she encounters parallel computing everywhere. Without knowing
it, I wrote the book for her.