Parallel Programming for Modern High Performance Computing Systems

Paweł Czarnul
Gdańsk University of Technology, Poland
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2018 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper


Version Date: 20171127

International Standard Book Number-13: 978-1-1383-0595-3 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization
that provides licenses and registration for a variety of users. For organizations that have been granted
a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
To my daughter Ala
Contents

List of figures xiii

List of tables xvii


List of listings xix
Preface xxiii

Chapter 1  Understanding the need for parallel computing 1

1.1 INTRODUCTION 1
1.2 FROM PROBLEM TO PARALLEL SOLUTION –
DEVELOPMENT STEPS 2
1.3 APPROACHES TO PARALLELIZATION 4
1.4 SELECTED USE CASES WITH POPULAR APIS 6
1.5 OUTLINE OF THE BOOK 7

Chapter 2  Overview of selected parallel and distributed systems for high performance computing 11

2.1 GENERIC TAXONOMY OF PARALLEL COMPUTING SYSTEMS 11
2.2 MULTICORE CPUS 12
2.3 GPUS 14
2.4 MANYCORE CPUS/COPROCESSORS 17
2.5 CLUSTER SYSTEMS 19
2.6 GROWTH OF HIGH PERFORMANCE COMPUTING
SYSTEMS AND RELEVANT METRICS 20
2.7 VOLUNTEER-BASED SYSTEMS 22
2.8 GRID SYSTEMS 25


Chapter 3  Typical paradigms for parallel applications 29

3.1 ASPECTS OF PARALLELIZATION 30


3.1.1 Data partitioning and granularity 30
3.1.2 Communication 32
3.1.3 Data allocation 32
3.1.4 Load balancing 33
3.1.5 HPC related metrics 34
3.2 MASTER-SLAVE 35
3.3 GEOMETRIC SPMD 39
3.4 PIPELINING 55
3.5 DIVIDE-AND-CONQUER 56

Chapter 4  Selected APIs for parallel programming 69

4.1 MESSAGE PASSING INTERFACE (MPI) 74


4.1.1 Programming model and application structure 74
4.1.2 The world of MPI processes and threads 75
4.1.3 Initializing and finalizing usage of MPI 75
4.1.4 Communication modes 76
4.1.5 Basic point-to-point communication routines 76
4.1.6 Basic MPI collective communication routines 78
4.1.7 Packing buffers and creating custom data types 83
4.1.8 Receiving a message with wildcards 85
4.1.9 Receiving a message with unknown data size 86
4.1.10 Various send modes 87
4.1.11 Non-blocking communication 88
4.1.12 One-sided MPI API 90
4.1.13 A sample MPI application 95
4.1.14 Multithreading in MPI 97
4.1.15 Dynamic creation of processes in MPI 99
4.1.16 Parallel MPI I/O 101
4.2 OPENMP 102
4.2.1 Programming model and application structure 102
4.2.2 Commonly used directives and functions 104
4.2.3 The number of threads in a parallel region 109

4.2.4 Synchronization of threads within a parallel region and single thread execution 109
4.2.5 Important environment variables 111
4.2.6 A sample OpenMP application 112
4.2.7 Selected SIMD directives 115
4.2.8 Device offload instructions 115
4.2.9 Tasking in OpenMP 117
4.3 PTHREADS 118
4.3.1 Programming model and application structure 118
4.3.2 Mutual exclusion 121
4.3.3 Using condition variables 123
4.3.4 Barrier 124
4.3.5 Synchronization 125
4.3.6 A sample Pthreads application 125
4.4 CUDA 127
4.4.1 Programming model and application structure 127
4.4.2 Scheduling and synchronization 131
4.4.3 Constraints 134
4.4.4 A sample CUDA application 134
4.4.5 Streams and asynchronous operations 137
4.4.6 Dynamic parallelism 141
4.4.7 Unified Memory in CUDA 143
4.4.8 Management of GPU devices 145
4.5 OPENCL 147
4.5.1 Programming model and application structure 147
4.5.2 Coordinates and indexing 155
4.5.3 Queuing data reads/writes and kernel execution 156
4.5.4 Synchronization functions 157
4.5.5 A sample OpenCL application 158
4.6 OPENACC 167
4.6.1 Programming model and application structure 167
4.6.2 Common directives 168
4.6.3 Data management 169
4.6.4 A sample OpenACC application 171
4.6.5 Asynchronous processing and synchronization 171
4.6.6 Device management 172

4.7 SELECTED HYBRID APPROACHES 172


4.7.1 MPI+Pthreads 173
4.7.2 MPI+OpenMP 177
4.7.3 MPI+CUDA 180

Chapter 5  Programming parallel paradigms using selected APIs 185

5.1 MASTER-SLAVE 185


5.1.1 MPI 186
5.1.2 OpenMP 190
5.1.3 MPI+OpenMP 197
5.1.4 MPI+Pthreads 199
5.1.5 CUDA 207
5.1.6 OpenMP+CUDA 213
5.2 GEOMETRIC SPMD 218
5.2.1 MPI 218
5.2.2 MPI+OpenMP 220
5.2.3 OpenMP 225
5.2.4 MPI+CUDA 225
5.3 DIVIDE-AND-CONQUER 229
5.3.1 OpenMP 229
5.3.2 CUDA 232
5.3.3 MPI 235
5.3.3.1 Balanced version 236
5.3.3.2 Version with dynamic process creation 240

Chapter 6  Optimization techniques and best practices for parallel codes 251

6.1 DATA PREFETCHING, COMMUNICATION AND COMPUTATIONS OVERLAPPING AND INCREASING COMPUTATION EFFICIENCY 252
6.1.1 MPI 253
6.1.2 CUDA 256
6.2 DATA GRANULARITY 257
6.3 MINIMIZATION OF OVERHEADS 258

6.3.1 Initialization and synchronization overheads 258


6.3.2 Load balancing vs cost of synchronization 260
6.4 PROCESS/THREAD AFFINITY 260
6.5 DATA TYPES AND ACCURACY 261
6.6 DATA ORGANIZATION AND ARRANGEMENT 261
6.7 CHECKPOINTING 262
6.8 SIMULATION OF PARALLEL APPLICATION EXECUTION 264
6.9 BEST PRACTICES AND TYPICAL OPTIMIZATIONS 265
6.9.1 GPUs/CUDA 265
6.9.2 Intel Xeon Phi 266
6.9.3 Clusters 269
6.9.4 Hybrid systems 270

Appendix A  Resources 273

A.1 SOFTWARE PACKAGES 273

Appendix B  Further reading 275

B.1 CONTEXT OF THIS BOOK 275


B.2 OTHER RESOURCES ON PARALLEL PROGRAMMING 275

Index 297
List of figures

1.1 Top high performance computing systems according to the TOP500 list, June 2017 3
1.2 Typical usage of APIs for parallel programs on specific
computing devices/architectures 8

2.1 Typical parallel systems described by a UML diagram 13


2.2 Architecture of Comcute 24
2.3 Grid computing 26

3.1 Basic master-slave structure with input data 36


3.2 Flow of the basic master-slave application over time, 1 of 2 37
3.3 Flow of the basic master-slave application over time, 2 of 2 38
3.4 Flow of the basic master-slave application with more data
packets over time, diameters denote execution times, 1 of 2 40
3.5 Flow of the basic master-slave application with more data
packets over time, diameters denote execution times, 2 of 2 41
3.6 Interaction diagram for master-slave processing with
overlapping communication and computations 42
3.7 Flow of the basic master-slave application with more data
packets over time and overlapping communication and
computations, diameters denote execution times, 1 of 2 43
3.8 Flow of the basic master-slave application with more data
packets over time and overlapping communication and
computations, diameters denote execution times, 2 of 2 44
3.9 Partitioning of a 2D space into subdomains 45
3.10 Subdomain within the whole input data domain in 3D 47
3.11 Partitioning of a 2D space with various cell weights into
subdomains 48
3.12 Activity diagram with basic steps of a typical parallel
geometric SPMD application 49


3.13 Activity diagram with basic steps of a typical parallel geometric SPMD application with overlapping communication and computations 50
3.14 Activity diagram with basic steps of a typical parallel
geometric SPMD application with overlapping communication
and computations and dynamic load balancing 52
3.15 Activity diagram for the dynamic load balancing step 53
3.16 Activity diagram for the dynamic load balancing step
improved with dynamic load balancing step adjustment 54
3.17 Basic structure for pipelined processing 55
3.18 Basic flow for pipelined processing 56
3.19 Imbalanced tree corresponding to the divide-and-conquer
processing paradigm 57
3.20 Sample partitioning of the divide-and-conquer tree 59
3.21 Flow in the divide-and-conquer application, 1 of 2 60
3.22 Flow in the divide-and-conquer application, 2 of 2 61
3.23 Imbalanced tree corresponding to the divide-and-conquer
processing paradigm with various node processing times 62
3.24 Basic activity diagram of a process/thread with computations
and load balancing for divide-and-conquer processing 63
3.25 Improved activity diagram with computations and load
balancing performed by two threads for divide-and-conquer
processing 64
3.26 Cutting off subtrees for divide-and-conquer processing 66
3.27 Cutting off subtrees for divide-and-conquer processing –
continued 67

4.1 Architecture of an MPI application 74


4.2 Execution time of the testbed MPI application 98
4.3 Execution time of the testbed OpenMP application 114
4.4 Execution time of the testbed Pthreads application 128
4.5 Architecture of a CUDA application, a 2D example 130
4.6 Memory types and sample usage by an application running on
a GPU 132
4.7 Architecture of an OpenCL application, a 2D example 149
4.8 Execution time of the testbed MPI+Pthreads application 176
4.9 Execution time of the testbed MPI+OpenMP application 179

5.1 Master-slave scheme with MPI 187



5.2 Execution time of the testbed MPI master-slave application 190


5.3 Master-slave scheme with OpenMP – initial version 192
5.4 Master-slave scheme with OpenMP – version with all threads
executing same code 193
5.5 Master-slave scheme with OpenMP – version with tasks 195
5.6 Master-slave scheme with MPI and OpenMP 198
5.7 Master-slave scheme with MPI and Pthreads 201
5.8 Master-slave scheme with CUDA 208
5.9 Master-slave scheme with OpenMP and CUDA 214
5.10 Subdomain and order of communication operations for an MPI
SPMD application 220
5.11 Processes and communication operations for an MPI SPMD
application 221
5.12 Execution time of the testbed MPI SPMD application 222
5.13 Execution time of the testbed MPI+OpenMP SPMD
application 224
5.14 Architecture of the MPI+CUDA application 226
5.15 Source code structure for the MPI+CUDA application 227
5.16 Divide-and-conquer scheme with OpenMP 230
5.17 Divide-and-conquer scheme with CUDA and dynamic
parallelism 233
5.18 Balanced divide-and-conquer scheme with MPI 237
5.19 Execution time of the MPI implementation of the divide and
conquer scheme 240
5.20 Divide-and-conquer scheme with MPI and dynamic process
creation 242
5.21 Handling spawn decisions for divide-and-conquer scheme with
MPI and dynamic process creation 243
5.22 Source code structure for the divide-and-conquer MPI
application with dynamic process creation 244

6.1 Execution time of an MPI master-slave application vs number of data packets for fixed input data size 258
List of tables

2.1 Performance of the first cluster on the TOP500 list over time 20
2.2 The number of cores of the first cluster on the TOP500 list
over time 20
2.3 CPU clock frequency of the first cluster on the TOP500 list
over time 21
2.4 Performance to power consumption ratio of the first cluster on
the TOP500 list over time 21

3.1 Typical data partitioning and allocation procedures in parallel processing paradigms 33

4.1 Presented features of parallel programming APIs, 1 of 2 72


4.2 Presented features of parallel programming APIs, 2 of 2 73
4.3 Execution times for two CUDA code versions 137
4.4 Execution times for the MPI+CUDA code 184

5.1 Execution times for various versions of the master-slave OpenMP code 197
5.2 Execution time for various configurations of the
MPI+OpenMP code 200
5.3 Execution time for various configurations of the
MPI+Pthreads code 206
5.4 Execution time for various configurations of the CUDA code 212
5.5 Execution time for various configurations of the hybrid
OpenMP+CUDA code 218
5.6 Execution time for various configurations of the hybrid
OpenMP+CUDA code 218
5.7 Execution times for the MPI+CUDA code 228
5.8 Execution times for parallel OpenMP divide-and-conquer code 232


6.1 Execution times for a geometric SPMD MPI+CUDA code, without and with MPS 257
6.2 Execution times for two versions of MPI+OpenMP – SPMD
code 260
6.3 Execution time depending on data layout – SPMD code 262
List of listings

4.1 Sample parallel MPI program for computing ln(x) 96


4.2 Basic structure of an OpenMP application 102
4.3 Basic OpenMP structure with numbers of threads and function
calls 103
4.4 Basic OpenMP implementation of reduction with sum 106
4.5 Basic OpenMP implementation of reduction with maximum
value 106
4.6 Basic OpenMP implementation of computation of ln(x) 112
4.7 Improved version of an OpenMP implementation of
computation of ln(x) 113
4.8 Example using #pragma omp target in OpenMP 116
4.9 Example using #pragma omp task in OpenMP 117
4.10 Sample basic Pthreads structure with numbers of threads and
function calls 119
4.11 Parallel implementation of computing ln(x) using Pthreads 126
4.12 Sample CUDA implementation of a program for checking the
Collatz hypothesis 134
4.13 Improved CUDA implementation of a program for checking the
Collatz hypothesis 136
4.14 Sample OpenCL implementation of a program for verification of
the Collatz hypothesis 159
4.15 Improved OpenCL implementation of a program for verification
of the Collatz hypothesis 162
4.16 Basic OpenACC implementation of computation of ln(x) 171
4.17 Parallel implementation of computing ln(x) using combined
MPI and Pthreads 174
4.18 Parallel implementation of computing ln(x) using combined
MPI and OpenMP 177
4.19 MPI and CUDA implementation of verification of the Collatz
hypothesis – host code 180


4.20 MPI and CUDA implementation of verification of the Collatz hypothesis – kernel and kernel invocation 182
5.1 Basic master-slave application using MPI – master’s key code 188
5.2 Basic master-slave application using MPI – slave’s key code 189
5.3 Master-slave in OpenMP with each thread fetching input data
and storing output – main part 192
5.4 Master-slave in OpenMP using the #pragma omp task directive
– main part 195
5.5 Basic master-slave application using MPI and OpenMP –
function slavecpu(...) 198
5.6 Master-slave application using MPI and Pthreads and input
and output queues – key master code 202
5.7 Master-slave application using MPI and Pthreads and input
and output queues – key main thread code of a slave 203
5.8 Master-slave application using MPI and Pthreads and input
and output queues – key compute slave code 204
5.9 Master-slave application using MPI and Pthreads and input
and output queues – key sending thread code of a slave 205
5.10 Basic master-slave application using multiple GPUs with
CUDA and one host thread – kernel code 207
5.11 Basic master-slave application using multiple GPUs with
CUDA and one host thread – host thread key code 209
5.12 Basic master-slave application using multiple GPUs with
CUDA and CPU with OpenMP and multiple host threads – key
host thread code 213
5.13 Basic master-slave application using multiple GPUs with
CUDA and CPU with OpenMP and multiple host threads –
slavecpu(...) function 216
5.14 Divide-and-conquer application for adaptive integration using
OpenMP – function integratedivideandconquer(...) 230
5.15 Divide-and-conquer application for adaptive integration using
CUDA with dynamic parallelism – kernel implementation 233
5.16 Basic divide-and-conquer application for balanced computations
using MPI – partitioning 237
5.17 Basic divide-and-conquer application for balanced computations
using MPI – merging 239
5.18 Divide-and-conquer application using dynamic process creation
in MPI – divideandconquer function 245
5.19 Divide-and-conquer application using dynamic process creation
in MPI – key code lines of the root process 248

5.20 Divide-and-conquer application using dynamic process creation in MPI – key code lines of non-root processes 249
6.1 Receiving data using MPI and overlapping communication and
computations with non-blocking calls 253
6.2 Receiving data using MPI and overlapping communication and
computations with non-blocking calls and using two buffers 253
6.3 Receiving data and sending results using MPI and overlapping
communication and computations with non-blocking calls and
using two buffers 254
Preface

Parallel computing systems have recently become more and more accessible to
a wide range of users. Not only programmers in high performance computing
centers but also a typical consumer can now benefit from high performance
computing devices installed even in desktop computers. The vast majority of
new computers sold today feature multicore CPUs and GPUs which can be
used for running parallel programs. Such usage of GPUs is often referred to
as GPGPU (General Purpose Computations on Graphics Processing Units).
Among devices announced by manufacturers are, for instance, a 7th generation Intel® Core™ i7-7920HQ CPU that features 4 cores with HyperThreading for 8 logical processors clocked at 3.1GHz (up to 4.1GHz in turbo mode) and a TDP (Thermal Design Power) of 45W. AMD Ryzen™ 7 1800X features 8 cores for 16 logical processors clocked at 3.6 GHz (4 GHz in turbo mode) and a TDP of 95W. A high end desktop Intel Core i9-7900X CPU features 10 cores with HyperThreading for 20 logical processors clocked at 3.3GHz (up to 4.3GHz in turbo mode) and a TDP of 140W. NVIDIA® Titan X, based on the NVIDIA® Pascal™ architecture, features 3584 CUDA® cores and 12GB of memory at the base clock of 1417MHz (1531MHz in boost) with a power requirement of 250W.
Workstations or servers can use CPUs such as Intel® Xeon® Scalable processors, for instance the Intel Xeon Platinum 8180 that features 28 cores and 56 logical processors clocked at 2.5 GHz (up to 3.8GHz in turbo mode) and a power requirement of 205W, or Intel Xeon E5-4669v4 with 22 cores and 44 logical processors clocked at 2.2 GHz (3 GHz in turbo mode) and a TDP of 135W. AMD Opteron™ 6386 SE features 16 cores clocked at 2.8 GHz (3.5 GHz in turbo mode) with a TDP of 140 W.
High performance oriented GPUs include NVIDIA® Tesla® P100, based on the Pascal architecture, with 3584 CUDA cores clocked at 1480MHz in boost and with a power requirement of 250W, as well as NVIDIA Tesla V100 with 16 GB HBM2 memory, 5120 CUDA cores clocked at 1455MHz in boost and with a power requirement of 300W. AMD FirePro™ W9100 features 2816 Stream Processors, 32GB or 16GB GDDR5 GPU memory and a power requirement of 275W. High performance oriented machines can use coprocessors such as Intel® Xeon Phi™ x100 7120A with 61 cores clocked at 1.238GHz and a TDP of 300W or e.g. Intel Xeon Phi x200 7250 processors with 68 cores clocked at 1.4GHz and a TDP of 215W.
As was the case in the past and is still the case today, computer nodes can be interconnected within high performance computing clusters for even greater compute performance at the cost of larger power consumption. At the time of this writing, the most powerful cluster on the TOP500 list is Sunway TaihuLight, with over 10 million cores and over 93 PFlop/s Linpack performance. The cluster takes around 15.37 MW of power.
Hardware requires software, in particular compilers and libraries, to allow
development, building and subsequent execution of parallel programs. There
is a variety of existing parallel programming APIs which makes it difficult to
become acquainted with all programming options and how key elements of
these APIs can be used and combined.
In response to this, the book features:

1. Description of state-of-the-art computing devices and systems available today, such as multicore and manycore CPUs, accelerators such as GPUs, coprocessors such as Intel Xeon Phi, and clusters.
2. Approaches to parallelization using important programming paradigms such as master-slave, geometric Single Program Multiple Data (SPMD) and divide-and-conquer.
3. Description of key, practical and useful elements of the most popular and important APIs for programming parallel HPC systems today: MPI, OpenMP®, CUDA®, OpenCL™, OpenACC®.
4. Demonstration, through selected code listings that can be compiled and run, of how the aforementioned APIs can be used to implement important programming paradigms such as master-slave, geometric SPMD and divide-and-conquer.
5. Demonstration of hybrid codes integrating selected APIs for potentially multi-level parallelization, for example using MPI+OpenMP, OpenMP+CUDA or MPI+CUDA.
6. Demonstration of how to use modern elements of the APIs, e.g. CUDA's dynamic parallelism, unified memory, MPI's dynamic process creation, parallel I/O, one-sided API, OpenMP's offloading, tasking etc.
7. Selected optimization techniques including, for instance, overlapping communication and computations implemented using various APIs, e.g. MPI or CUDA, optimization of data layout, data granularity, synchronization, process/thread affinity etc. Best practices are presented; a minimal sketch of communication/computation overlapping with MPI follows after this list.
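
To make the overlapping technique named in item 7 concrete before its detailed treatment later in the book (Chapter 6), here is a minimal, hedged C/MPI sketch. It is not one of the book's own listings; the packet size, packet count, tag and the process_chunk() helper are arbitrary illustrative choices, and only ranks 0 and 1 are used. The receiver keeps two buffers so that the non-blocking receive for the next packet can progress while the current packet is being processed.

/* Illustrative only. Build and run with e.g.:
   mpicc overlap.c -o overlap && mpirun -np 2 ./overlap */
#include <mpi.h>
#include <stdio.h>

#define CHUNK 1024
#define NPACKETS 8

/* hypothetical compute step: here just a sum, printed per packet */
static void process_chunk(const double *buf, int n, int id) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += buf[i];
    printf("packet %d processed, sum=%f\n", id, s);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                      /* sender: produces packets */
        double buf[CHUNK];
        for (int p = 0; p < NPACKETS; p++) {
            for (int i = 0; i < CHUNK; i++) buf[i] = p + i * 0.001;
            MPI_Send(buf, CHUNK, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        }
    } else if (rank == 1) {               /* receiver: overlaps receiving with computing */
        double buffers[2][CHUNK];
        MPI_Request req;
        int cur = 0;
        MPI_Irecv(buffers[cur], CHUNK, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
        for (int p = 0; p < NPACKETS; p++) {
            MPI_Wait(&req, MPI_STATUS_IGNORE);      /* current packet has arrived     */
            int next = 1 - cur;
            if (p + 1 < NPACKETS)                   /* prefetch the next packet...    */
                MPI_Irecv(buffers[next], CHUNK, MPI_DOUBLE, 0, 0,
                          MPI_COMM_WORLD, &req);
            process_chunk(buffers[cur], CHUNK, p);  /* ...while computing on this one */
            cur = next;
        }
    }
    MPI_Finalize();
    return 0;
}

With a blocking receive the compute step could only start after each packet had fully arrived; with the two-buffer scheme the transfer of packet p+1 can proceed (subject to how the MPI implementation progresses non-blocking requests) while packet p is being processed.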

The target audience of this book comprises students, programmers and domain specialists who would like to become acquainted with:

1. popular and currently available computing devices and cluster systems,
2. typical paradigms used in parallel programs,
3. popular APIs for programming parallel applications,
4. code templates that can be used for implementation of paradigms such as master-slave, geometric Single Program Multiple Data and divide-and-conquer,
5. optimization of parallel programs.

I would like to thank Prof. Henryk Krawczyk for encouraging me to continue and expand research in the field of parallel computing, as well as my colleagues from the Department of Computer Architecture, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, for fruitful cooperation in many research and commercial projects, for stating challenges and for motivating me to look for solutions to problems. Furthermore, I would like to express gratitude for corrections of this work by Dr. Mariusz Matuszek, Pawel Rościszewski and Adam Krzywaniak. Finally, it was a great pleasure to work with Randi Cohen, Senior Acquisitions Editor, Computer Science, Robin Lloyd-Starkes, Project Editor, and Veronica Rodriguez, Editorial Assistant, at Taylor & Francis Group.

Pawel Czarnul
Gdańsk, Poland
CHAPTER 1

Understanding the need for parallel computing
CONTENTS
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 From problem to parallel solution – development steps . 2
1.3 Approaches to parallelization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Selected use cases with popular APIs . . . . . . . . . . . . . . . . . . . 6
1.5 Outline of the book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.1 INTRODUCTION
For the past few years, the increase in performance of computer systems has been possible through several technological and architectural advancements such as:

1. Within each computer/node: increasing memory sizes, cache sizes and bandwidths, and decreasing latencies, all of which contribute to higher performance.
2. Among nodes: increasing bandwidths and decreasing latencies of interconnects.
3. Increasing computing power of computing devices.

It can be seen, as discussed further in Section 2.6, that CPU clock frequencies have generally stabilized for the past few years and increasing computing power has been possible mainly through adding more and more computing cores to processors. This means that in order to make the most of available hardware, an application should efficiently use these cores with as little overhead or performance loss as possible. The latter comes from load imbalance, synchronization, communication overheads etc.
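
As a concrete, hedged illustration of one such overhead (synchronization), the C/OpenMP fragment below contrasts two ways of summing an array across threads: serializing every update with a critical section versus using a reduction clause with per-thread partial sums. This is only a sketch for this discussion, not one of the book's listings; the function names, array contents and problem size are arbitrary.

/* Illustrative only. Build with e.g.: gcc -fopenmp -O2 sums.c -o sums */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

/* Both functions compute the same sum of n doubles, but with very
   different synchronization overhead. */
double sum_critical(const double *a, long n) {
    double sum = 0.0;
    #pragma omp parallel for
    for (long i = 0; i < n; i++) {
        #pragma omp critical      /* every update serializes the threads */
        sum += a[i];
    }
    return sum;
}

double sum_reduction(const double *a, long n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += a[i];              /* private partial sums, one final combine */
    return sum;
}

int main(void) {
    const long n = 10 * 1000 * 1000;          /* arbitrary problem size */
    double *a = malloc(n * sizeof(double));
    if (!a) return 1;
    for (long i = 0; i < n; i++) a[i] = 1.0 / (i + 1);

    double t0 = omp_get_wtime();
    double s1 = sum_critical(a, n);
    double t1 = omp_get_wtime();
    double s2 = sum_reduction(a, n);
    double t2 = omp_get_wtime();

    printf("critical:  %f in %f s\n", s1, t1 - t0);
    printf("reduction: %f in %f s\n", s2, t2 - t1);
    free(a);
    return 0;
}

On a multicore CPU the reduction version typically scales with the number of cores, while the critical-section version may even run slower than sequential code because the threads spend most of their time waiting for each other.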
Nowadays, computing devices typically used for general purpose calculations, and used as building blocks for high performance computing (HPC) systems, include:


• Multicore CPUs, both desktop e.g. a 7th generation Intel i7-7920HQ CPU that features 4 cores with HyperThreading for 8 logical processors clocked at 3.1GHz (up to 4.1GHz in turbo mode) and server type CPUs such as Intel Xeon E5-2680v4 that features 14 cores and 28 logical processors clocked at 2.4 GHz (up to 3.3GHz in turbo mode) or AMD Opteron 6386 SE that features 16 cores clocked at 2.8 GHz (3.5 GHz in turbo mode).
• Manycore CPUs e.g. Intel Xeon Phi x200 7290 that features 72 cores (288 threads) clocked at 1.5 GHz (1.7 GHz in boost).
• GPUs, both desktop e.g. NVIDIA® GeForce® GTX 1070, based on the Pascal architecture, with 1920 CUDA cores at base clock 1506 MHz (1683 MHz in boost), 8GB of memory or e.g. AMD R9 FURY X with 4096 Stream Processors at base clock up to 1050 MHz, 4GB of memory as well as compute oriented type devices such as NVIDIA Tesla K80 with 4992 CUDA cores and 24 GB of memory, NVIDIA Tesla P100 with 3584 CUDA cores and 16 GB of memory, AMD FirePro S9170 with 2816 Stream Processors and 32 GB of memory or AMD FirePro W9100 with 2816 Stream Processors and up to 32GB of memory.
• Manycore coprocessors such as Intel Xeon Phi x100 7120A with 61 cores at 1.238GHz and 16 GB of memory or Intel Xeon Phi x200 7240P with 68 cores at 1.3GHz (1.5 GHz in boost) and 16 GB of memory.

1.2 FROM PROBLEM TO PARALLEL SOLUTION – DEVELOPMENT STEPS
Typically, development of computational code involves several steps:

1. Formulation of a problem with definition of input data including data format, required operations, format of output results.
2. Algorithm design. An algorithm is a procedure that takes input data and produces output such that it matches the requirements within the problem definition. It is usually possible to design several algorithms that achieve the same goal. An algorithm may be sequential or parallel. Usually a sequential algorithm can be parallelized, i.e. made to run faster by splitting computations/data to run on several cores, with the necessary communication/synchronization such that correct output is produced.
3. Implementation of an algorithm. In this case, one or more Application Programming Interfaces (APIs) are used to code the algorithm. Typically an API is defined such that its implementation can run efficiently on a selected class of hardware (a minimal illustrative sketch of steps 2 and 3 follows after this list).
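
As a concrete, hedged illustration of steps 2 and 3, the sketch below parallelizes a simple sequential algorithm: approximating ln(x) for x in (0, 2] from the series ln(1+y) = y - y^2/2 + y^3/3 - ... with y = x - 1, using an OpenMP work-sharing loop with a reduction. It is only a minimal example of coding a parallelized algorithm with an API, not one of the book's own listings (those appear from Chapter 4 onwards), and the test value and term count are arbitrary choices.

/* Illustrative only. Build with e.g.: gcc -fopenmp -O2 ln_series.c -o ln_series -lm */
#include <stdio.h>
#include <math.h>
#include <omp.h>

int main(void) {
    const double x = 1.5;          /* arbitrary test value in (0, 2]    */
    const long nterms = 10000000;  /* arbitrary accuracy/work trade-off */
    const double y = x - 1.0;
    double sum = 0.0;

    /* Each iteration computes one independent term of the series, so the
       loop can be split across threads; the reduction clause gives every
       thread a private partial sum that is combined at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (long k = 1; k <= nterms; k++) {
        double term = pow(y, (double)k) / k;   /* pow keeps iterations independent */
        sum += (k % 2 == 1) ? term : -term;    /* alternating signs of the series  */
    }

    printf("ln(%f) ~= %f (libm says %f)\n", x, sum, log(x));
    return 0;
}

The same decomposition into independent terms carries over to the other APIs covered later in the book: with MPI each process would handle a range of k and the partial sums would be combined with a collective reduction, and with CUDA each GPU thread would handle a subset of terms. The book's own, complete versions of such computations are presented in Chapter 4.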