(Ebook) Parallel Computing Hits the Power Wall: Principles, Challenges, and a Survey of Solutions (SpringerBriefs in Computer Science) by Arthur Francisco Lorenzon and Antonio Carlos Schneider Beck Filho. ISBN 9783030287191, 9783030287184, 303028719X, 3030287181.

The document discusses the ebook 'Parallel Computing Hits the Power Wall' by Arthur Francisco Lorenzon and Antonio Carlos Schneider Beck Filho, which addresses challenges in efficiently utilizing thread-level parallelism in modern multicore systems while managing energy consumption. It outlines various techniques for optimizing performance without significantly increasing energy use, including dynamic voltage and frequency scaling and dynamic concurrency throttling. The book is part of the SpringerBriefs in Computer Science series, which presents concise research summaries across diverse fields.


SpringerBriefs in Computer Science

Series Editors
Stan Zdonik
Brown University, Providence, RI, USA

Shashi Shekhar
University of Minnesota, Minneapolis, MN, USA

Xindong Wu
University of Vermont, Burlington, VT, USA

Lakhmi C. Jain
University of South Australia, Adelaide, SA, Australia

David Padua
University of Illinois Urbana-Champaign, Urbana, IL, USA

Xuemin Sherman Shen


University of Waterloo, Waterloo, ON, Canada

Borko Furht
Florida Atlantic University, Boca Raton, FL, USA

V. S. Subrahmanian
Department of Computer Science, University of Maryland, College Park,
MD, USA

Martial Hebert
Carnegie Mellon University, Pittsburgh, PA, USA

Katsushi Ikeuchi
Meguro-ku, University of Tokyo, Tokyo, Japan
Bruno Siciliano
Dipartimento di Ingegneria Elettrica e delle Tecnologie
dell’Informazione, Università di Napoli Federico II, Napoli, Italy

Sushil Jajodia
George Mason University, Fairfax, VA, USA

Newton Lee
Institute for Education Research and Scholarships, Los Angeles, CA, USA

SpringerBriefs present concise summaries of cutting-edge research and practical applications across a wide spectrum of fields. Featuring compact volumes of 50 to 125 pages, the series covers a range of content from professional to academic.
Typical topics might include:
A timely report of state-of-the-art analytical techniques
A bridge between new research results, as published in journal
articles, and a contextual literature review
A snapshot of a hot or emerging topic
An in-depth case study or clinical example
A presentation of core concepts that students must understand in
order to make independent contributions
Briefs allow authors to present their ideas and readers to absorb them
with minimal time investment. Briefs will be published as part of
Springer’s eBook collection, with millions of users worldwide. In
addition, Briefs will be available for individual print and electronic
purchase. Briefs are characterized by fast, global electronic
dissemination, standard publishing contracts, easy-to-use manuscript
preparation and formatting guidelines, and expedited production
schedules. We aim for publication 8–12 weeks after acceptance. Both
solicited and unsolicited manuscripts are considered for publication in
this series.
More information about this series at http://www.springer.com/series/10028
Arthur Francisco Lorenzon and Antonio Carlos Schneider Beck Filho

Parallel Computing Hits the Power Wall
Principles, Challenges, and a Survey of Solutions
Arthur Francisco Lorenzon
Department of Computer Science, Federal University of Pampa
(UNIPAMPA), Alegrete, Rio Grande do Sul, Brazil

Antonio Carlos Schneider Beck Filho


Institute of Informatics, Campus do Vale, Federal University of Rio
Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil

ISSN 2191-5768 e-ISSN 2191-5776


SpringerBriefs in Computer Science
ISBN 978-3-030-28718-4 e-ISBN 978-3-030-28719-1
https://doi.org/10.1007/978-3-030-28719-1

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2019

This work is subject to copyright. All rights are solely and exclusively
licensed by the Publisher, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in
any other physical way, and transmission or information storage and
retrieval, electronic adaptation, computer software, or by similar or
dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the
absence of a specific statement, that such names are exempt from the
relevant protective laws and regulations and therefore free for general
use.

The publisher, the authors, and the editors are safe to assume that the
advice and information in this book are believed to be true and accurate
at the date of publication. Neither the publisher nor the authors or the
editors give a warranty, express or implied, with respect to the material
contained herein or for any errors or omissions that may have been
made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham,
Switzerland
This book is dedicated to the memory of Márcia Cristina and Aurora Cera.
Preface
Efficiently exploiting thread-level parallelism from modern multicore
systems has been challenging for software developers. While blindly
increasing the number of threads may lead to performance gains, it can
also result in a disproportionate increase in energy consumption. In the
same way, optimization techniques for reducing energy consumption,
such as DVFS and power gating, can lead to huge performance loss if
used incorrectly. In this book, we present and discuss several
techniques that address these challenges. We start by providing a brief
theoretical background on parallel computing in software and the
sources of power consumption. Then, we show how different parallel
programming interfaces and communication models may affect energy
consumption in different ways. Next, we discuss tuning techniques to
adapt the number of threads/operating frequency to achieve the best
compromise between performance and energy. We finish this book
with a detailed analysis of a representative example of an adaptive
approach.

Alegrete, Brazil    Arthur Francisco Lorenzon
Porto Alegre, Brazil    Antonio Carlos Schneider Beck Filho
Acknowledgments
The authors would like to thank the friends and colleagues at
Informatics Institute of the Federal University of Rio Grande do Sul and
give a special thanks to all the people in the Embedded Systems
Laboratory, who have contributed to this research since 2013.
The authors would also like to thank the Brazilian research support
agencies, FAPERGS, CAPES, and CNPq.
Acronyms
CMOS Complementary metal oxide semiconductor
DCT Dynamic concurrency throttling
DSE Design space exploration
DVFS Dynamic voltage and frequency scaling
EDP Energy-delay product
FFT Fast Fourier transform
FIFO First-in first-out
FU Function unit
GPP General-purpose processors
HC High communication
HPC High-performance computing
ILP Instruction-level parallelism
ISA Instruction set architecture
LC Low communication
MPI Message passing interface
OpenMP Open Multi-Processing
PAPI Performance application programming interface
PPI Parallel programming interface
PThreads POSIX threads
RAM Random access memory
SMT Simultaneous multithreading
SoC System-on-chip
TBB Threading building blocks
TDP Thermal design power
TLP Thread-level parallelism
Contents
1 Runtime Adaptability: The Key for Improving Parallel Applications
1.1 Introduction
1.2 Scalability Analysis
1.2.1 Variables Involved
1.3 This Book
2 Fundamental Concepts
2.1 Parallel Computing in Software
2.1.1 Communication Models
2.1.2 Parallel Programming Interfaces
2.1.3 Multicore Architectures
2.2 Power and Energy Consumption
2.2.1 Dynamic Voltage and Frequency Scaling
2.2.2 Power Gating
3 The Impact of Parallel Programming Interfaces on Energy
3.1 Methodology
3.1.1 Benchmarks
3.1.2 Multicore Architectures
3.1.3 Execution Environment
3.1.4 Setup
3.2 Results
3.2.1 Performance and Energy Consumption
3.2.2 Energy-Delay Product
3.2.3 The Influence of the Static Power Consumption
3.3 Discussion
4 Tuning Parallel Applications
4.1 Design Space Exploration of Optimization Techniques
4.2 Dynamic Concurrency Throttling
4.2.1 Approaches with no Runtime Adaptation and no Transparency
4.2.2 Approaches with Runtime Adaptation and/or Transparency
4.3 Dynamic Voltage and Frequency Scaling
4.3.1 Approaches with no Runtime Adaptation and no Transparency
4.3.2 Approaches with Runtime Adaptation and/or Transparency
4.4 DCT and DVFS
4.4.1 Approaches with no Runtime Adaptation and no Transparency
4.4.2 Approaches with Runtime Adaptation and/or Transparency
5 Case Study: DCT with Aurora
5.1 The Need for Adaptability and Transparency
5.2 Aurora: Seamless Optimization of OpenMP Applications
5.2.1 Integration to OpenMP
5.2.2 Search Algorithm
5.3 Evaluation of Aurora
5.3.1 Methodology
5.3.2 Results
Conclusions
References
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2019
A. Francisco Lorenzon, A. C. S. Beck Filho, Parallel Computing Hits the Power Wall,
SpringerBriefs in Computer Science
https://doi.org/10.1007/978-3-030-28719-1_1

1. Runtime Adaptability: The Key for Improving Parallel Applications
Arthur Francisco Lorenzon1 and Antonio Carlos Schneider Beck
Filho2

(1) Department of Computer Science, Federal University of Pampa (UNIPAMPA), Alegrete, Rio Grande do Sul, Brazil
(2) Institute of Informatics, Campus do Vale, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil

1.1 Introduction
With the increasing complexity of parallel applications, which require
more computing power, energy consumption has become an important
issue. The power consumption of high-performance computing (HPC)
systems is expected to significantly grow (up to 100 MW) in the next
years [34]. Moreover, while general-purpose processors are being
pulled back by the limits of the thermal design power (TDP), most of
the embedded devices are mobile and heavily dependent on battery
(e.g., smartphones and tablets). Therefore, the primary objective when
designing and executing parallel applications is not to merely improve
performance but to do so with minimal impact on energy consumption.
Performance improvements can be achieved by exploiting
instruction-level parallelism (ILP) or thread-level parallelism (TLP). In
the former, independent instructions of a single program are
simultaneously executed, usually on a superscalar processor, as long as
there are functional units available. However, typical instruction
streams have only a limited amount of parallelism [122], resulting in
considerable efforts to design a microarchitecture that will bring only
marginal performance gains with very significant area/power
overhead. Even if one considers a perfect processor, ILP exploitation
will reach an upper bound [85].
Hence, to continue increasing performance and to provide better
use of the extra available transistors, modern designs have started to
exploit TLP more aggressively [7]. In this case, multiple processors
simultaneously execute parts of the same program, exchanging data at
runtime through shared variables or message passing. In the former, all
threads share the same memory region, while in the latter each process
has its private memory, and the communication occurs by send/receive
primitives (even though they are also implemented using a shared
memory context when the data exchange is done intra-chip [21]).
Regardless of the processor or communication model, data exchange is
usually done through memory regions that are more distant from the
processor (e.g., L3 cache and main memory) and have higher delay and
power consumption when compared to memories that are closer to it
(e.g., register, L1, and L2 caches).
Therefore, even though execution time shall decrease because of
TLP exploitation, energy will not necessarily follow the same trend,
since many other variables are involved:
Memories that are more distant from the processor will be more
accessed for synchronization and data exchange, increasing energy
related to dynamic power (which increases as there is more activity
in the circuitry).
A parallel application will usually execute more instructions than its
sequential counterpart. Moreover, even considering an ideal scenario
(where processors are put on standby with no power consumption),
the sum of the execution times of all threads executing on all cores
tends to be greater than if the application was sequentially executed
on only one core. In consequence, the resulting energy from static
power (directly proportional to how long each hardware component
is turned on) consumed by the cores will also be more significant.
There are few exceptions to this rule, such as non-deterministic
algorithms, in which a parallel application may execute fewer
instructions than its sequential counterpart.
The memory system (which involves caches and main memory) will
be turned on for a shorter time (the total execution time of the
applications), which will decrease the energy resulting from the
static power.
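The trade-off described above follows from the usual decomposition of energy into dynamic and static components. A minimal sketch of that reasoning, with invented power figures and an invented 3.5x speedup (none of these constants come from the book; they only illustrate the direction of each effect):

```python
# Illustrative energy model for the trade-off described above: in a
# parallel run the cores draw more total static+dynamic energy, while
# the memory system is powered on for a shorter time. All constants
# are made up for illustration only.

def energy(exec_time, n_cores, p_static_core, p_dyn_core, p_static_mem):
    """Total energy (J) for a run of exec_time seconds on n_cores cores."""
    core_energy = n_cores * (p_static_core + p_dyn_core) * exec_time
    mem_energy = p_static_mem * exec_time  # memory stays on for the whole run
    return core_energy + mem_energy

# Sequential: 100 s on 1 core. Parallel: 4 cores, imperfect 3.5x speedup.
seq = energy(100.0, 1, p_static_core=2.0, p_dyn_core=8.0, p_static_mem=5.0)
par = energy(100.0 / 3.5, 4, p_static_core=2.0, p_dyn_core=8.0, p_static_mem=5.0)
```

With these invented numbers the parallel run spends less energy overall (about 1286 J versus 1500 J), but the cores' share grows (from 1000 J to about 1143 J) while the memory's static share shrinks (from 500 J to about 143 J), which is exactly the tension discussed above.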
Given the aforementioned discussion, cores tend to consume more
energy from both dynamic and static power, while memories will
usually spend more dynamic power (and hence energy), but also tend
to save static power, which is very significant [121]. On top of that,
neither performance nor energy improvements resultant from TLP
exploitation are linear, and sometimes they do not scale as the number
of threads increases, which means that in many cases the maximum
number of threads will not offer the best results.
On top of that, in order to speed up the development process of TLP
exploitation and make it as transparent as possible to the software
developer, different parallel programming interfaces are used (e.g.,
OpenMP—Open Multi-Processing [22], PThreads—POSIX Threads [17],
or MPI—Message Passing Interface [38]). However, each one of these
has different characteristics with respect to the management (i.e.,
creation and finalization of threads/processes), workload distribution,
and synchronization.
In addition to the complex scenario of thread scalability, several
optimization techniques for power and energy management can be
used, such as dynamic voltage and frequency scaling (DVFS) [62] and
power gating [47]. The former is a feature of the processor that allows
the application to adapt the clock frequency and operating voltage of
the processor on the fly. It enables software to change the processing
performance to attain low-power consumption while meeting the
performance requirements [62]. On the other hand, power gating
consists of selectively powering down certain blocks in the chip while
keeping other blocks powered up. In multicore processors, it switches
off unused cores to reduce power consumption [84]. Therefore, in
addition to selecting the ideal number of threads to execute an
application, choosing the optimal processor frequency and turning off
cores unused during the application execution may lead to significant
reduction in energy consumption with minimal impact on performance.
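The essence of a DVFS policy is choosing, from a discrete set of frequency/voltage pairs, the lowest one that still meets the current demand. The toy governor below sketches that decision; it is not the algorithm of any real governor, and the frequency list is a placeholder:

```python
# Toy DVFS governor: pick the lowest frequency whose capacity still
# covers the current demand with some headroom. Frequencies (MHz) are
# placeholders, not tied to any real processor.

FREQS_MHZ = [800, 1600, 2400, 3200]  # available operating points, ascending

def pick_frequency(utilization, freqs=FREQS_MHZ):
    """Return the lowest frequency keeping projected utilization <= 80%.

    `utilization` is the fraction of the maximum frequency's capacity
    currently demanded (0.0..1.0).
    """
    f_max = freqs[-1]
    for f in freqs:
        # At frequency f, the same demand occupies utilization * f_max / f
        # of the core's cycles; keep headroom below 80%.
        if utilization * f_max / f <= 0.80:
            return f
    return f_max
```

For example, a demand of 10% of peak capacity is served at the lowest frequency, while a demand of 90% forces the maximum one; the headroom threshold is the knob that trades performance slack for power.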
1.2 Scalability Analysis
Many works have associated the fact that executing an application with
the maximum possible number of available threads (the common
choice for most software developers [63]) will not necessarily lead to
the best possible performance. There are several reasons for this lack of
scalability: instruction issue-width saturation; off-chip bus saturation;
data-synchronization; and concurrent shared memory accesses [51, 64,
95, 114, 115]. In order to measure (through correlation) their real
influence, we have executed four benchmarks from our set (and used
them as examples in the next subsections) on a 12-core machine with
SMT support. Each one of them has one limiting characteristic that
stands out, as shown in Table 1.1. The benchmark hotspot (HS)
saturates the issue-width; fast Fourier transform (FFT), the off-chip
bus; MG, the shared memory accesses; and N-body (NB), data-synchronization. To analyze each of these scalability issues, we considered the Pearson correlation [9]. It ranges from +1 to −1: the stronger the linear association r between two variables, the closer the value is to either +1 or −1; r ≥ 0.9 or r ≤ −0.9 means a very strong correlation (directly or inversely proportional, respectively). We discuss these bottlenecks next.

Table 1.1 Pearson correlation between the scalability issues and each benchmark

                             HS      FFT     MG      NB
Issue-width saturation      -0.92   -0.71   -0.79   -0.78
Off-chip bus saturation     -0.51   -0.98   -0.76    0.46
Shared memory accesses      -0.52   -0.43   -0.96    0.80
Data-synchronization        -0.54   -0.50   -0.59    0.97
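The coefficient reported in Table 1.1 is the standard sample Pearson statistic; a direct, generic implementation (the book only uses it as an analysis tool, so this is not tied to its measurement scripts):

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient r between two sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Two perfectly proportional series give r = +1 and two perfectly inversely proportional series give r = −1, the "very strong" endpoints of the scale used above.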

Issue-Width Saturation
SMT allows many threads to run simultaneously on a core. It increases
the probability of having more independent instructions to fill the
function units (FUs). Although it may work well for applications with
low ILP, it can lead to the opposite behavior if an individual thread
presents enough ILP to issue instructions to all or most of the core’s
FUs. Then, SMT may lead to resource competition and functional unit
contention, resulting in extra idle cycles. Figure 1.1a shows the
performance speedup relative to the sequential version and the
number of idle cycles (average, represented by the bars, and total) as
we increase the number of threads for the HS application. As we start
executing with 13 threads, two will be mapped to the same physical
core, activating SMT. From this point on, as the number of threads
grows, the average number of idle cycles increases by a small amount
or stays constant. However, the total number of idle cycles significantly
increases. Because this application has high ILP, there are not enough
resources to execute both threads concurrently as if each one was
executed on a single core. They become the new critical path of that
parallel region, as both threads will delay the execution of the entire
parallel region (threads can only synchronize when all have reached the
barrier). Therefore, performance drops and is almost recovered only
with the maximum number of threads executing. In the end, extra
resources are being used without improving performance and
potentially increasing energy consumption, decreasing resource
efficiency.

Fig. 1.1 Scalability behavior of parallel applications. (a) Issue-width saturation. (b)
Off-chip bus saturation

Off-Chip Bus Saturation


Many parallel applications operate on huge amounts of data that are
private to each thread and have to be constantly fetched from the main
memory. In this scenario, the off-chip bus that connects memory and
processor plays a decisive role in thread scalability: as each thread
computes on different data blocks, the demand for off-chip bus
increases linearly with the number of threads. However, the bus
bandwidth is limited by the number of I/O pins, which does not
increase according to the number of cores [41]. Therefore, when the
off-chip bus saturates, no further improvements are achieved by
increasing the number of threads [115]. Figure 1.1b shows the FFT
execution as an example. As the number of threads increases, execution
time and energy consumption reduce until the off-chip bus becomes
completely saturated (100% of utilization). From this point on (4
threads), increasing the number of threads does not improve
performance, as the bus cannot deliver all the requested data. There
might be an increase in energy consumption as well since many
hardware components will stay active while the cores are not being
properly fed with data.
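The saturation point can be captured by a simple bandwidth cap: speedup grows linearly with the thread count until the threads' aggregate demand reaches the bus bandwidth, then flattens. The numbers below are illustrative, not measurements from the book:

```python
def bandwidth_limited_speedup(n_threads, bw_per_thread_gbs, bus_bw_gbs):
    """Ideal speedup capped by the off-chip bus.

    Each thread demands bw_per_thread_gbs GB/s; once the aggregate
    demand exceeds bus_bw_gbs, additional threads only wait for data.
    Parameters are illustrative.
    """
    return min(n_threads, bus_bw_gbs / bw_per_thread_gbs)

# Example: with a 25.6 GB/s bus and 6.4 GB/s per thread, scaling
# stops at 4 threads, mirroring the FFT behavior described above.
```

Past the cap, extra threads keep hardware active without increasing delivered bandwidth, which is why energy can rise even as performance stays flat.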

Fig. 1.2 Scalability behavior of MG benchmark—shared memory accesses

Shared Memory Accesses Threads communicate by accessing data


that are located in shared memory regions, which are usually more
distant from the processor (e.g., L3 cache and main memory), so they
can also become a bottleneck. Figure 1.2 presents the number of
accesses to the L3 cache (the only cache level shared among the cores)
in the primary y-axis and the execution time normalized to the
sequential execution in the secondary y-axis for the MG benchmark.
When the application executes with more than four threads, the
performance is highly influenced by the increased number of accesses
to L3. Other factors may also influence L3 performance: thread
scheduling, data affinity, or the intrinsic characteristics of the
application. For instance, an application with a high rate of private
accesses to L1 and L2 may also lead to an increase in the L3 accesses.
Moreover, part of the communication may be hidden from the L3 when
SMT is enabled: two threads that communicate and are executing on
the same SMT core may not need to share data outside it.

Data-Synchronization
Synchronization operations ensure data integrity during the execution
of a parallel application. In this case, critical sections are implemented
to guarantee that only one thread will execute a given region of code at
once, and therefore data will correctly synchronize. In this way, all code
inside a critical section must be executed sequentially. Therefore, when
the number of threads increases, more threads must be serialized
inside the critical sections. It also increases the synchronization time
(Fig. 1.3a), potentially affecting the execution time and energy
consumption of the whole application. Figure 1.3b shows this behavior
for the n-body benchmark. While it executes with 4 threads or less, the
performance gains within the parallel region reduce the execution time
and energy consumption, even if the time spent in the critical region
increases (Fig. 1.3a). However, from this point on, the time the threads
spend synchronizing overcomes the speedup achieved in the parallel
region.
Fig. 1.3 Data-synchronization. (a) Critical section behavior. (b) Perf./Energy
degradation
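The behavior in Fig. 1.3 can be approximated by an Amdahl-style model in which the critical section serializes and lock contention grows with the thread count: speedup peaks at some intermediate count and then degrades. This is only a sketch with illustrative parameters, not the book's measured data:

```python
def speedup_with_critical_section(n, serial_frac, sync_overhead_per_thread=0.0):
    """Amdahl-style speedup with a serialized critical section.

    serial_frac is the fraction of work inside the critical section
    (executed one thread at a time); sync_overhead_per_thread models
    synchronization cost growing with n. Parameters are illustrative.
    """
    t_parallel = (1.0 - serial_frac) / n   # parallel part shrinks with n
    t_serial = serial_frac                 # critical section never shrinks
    t_sync = sync_overhead_per_thread * n  # contention grows with n
    return 1.0 / (t_parallel + t_serial + t_sync)
```

With, say, 10% of the work in the critical section and a 1% per-thread synchronization cost, speedup improves for small thread counts, peaks well before the maximum count, and then falls, reproducing the qualitative shape of Fig. 1.3b.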

1.2.1 Variables Involved


Considering the prior scenario, choosing the right number of threads
for a given application offers opportunities to improve performance
and increase energy efficiency. However, such a task is extremely
difficult: besides the huge number of variables involved, many of them
change according to different aspects of the system at hand and can
only be determined at runtime, such as:
Input set: As shown in Fig. 1.4a, different levels of performance
improvements for the LULESH benchmark [57] (also used as
examples in the next two items) over its single-threaded version are
reached with a different number of threads (x-axis). However, these
levels vary according to the input set (small or medium). While the
best number of threads is 12 for the medium input set, the ideal
number for the small set is 11.
Fig. 1.4 Appropriate number of threads (x-axis) considering the improvements
over sequential version (y-axis). (a) Different input sets. (b) Different metrics
evaluated. (c) Different multicore processors. (d) Different parallel regions
Metric evaluated: As Fig. 1.4b shows, the best performance is reached
with 12 threads, while 6 threads bring the lowest energy
consumption, and 9 presents the best trade-off between both metrics
(represented by the energy-delay product (EDP)).
Processor architecture: Fig. 1.4c shows that the best EDP
improvements of the parallel application on a 32-core system are
when it executes with 11 threads. However, the best choice for a 24-
core system is 9 threads.
Parallel regions: Many applications are divided into several parallel
regions, in which each of these regions may have a distinct ideal
number of threads, since their behavior may vary as the application
executes. As an example, Fig. 1.4d shows the behavior of four parallel
regions from the Poisson equation benchmark [94] when running on
a 24-core system. One can note that each parallel region is better
executed with a different number of threads.
Application behavior: A DVFS enabled system adapts the operating
frequency and voltage at runtime according to the application at
hand, taking advantage of the processor idleness (usually provoked
by I/O operations or by memory requests). Therefore, a memory- or
CPU-bound application will influence the DVFS at different levels.
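Because these variables are only known at runtime, finding a good configuration amounts to an online search over thread counts (and, in the approaches surveyed later, frequencies). The hill-climbing sketch below is a generic illustration of such a search, not Aurora's actual algorithm; the cost table stands in for real timed runs:

```python
def hill_climb_threads(cost, max_threads):
    """Double the thread count while the measured cost keeps improving,
    then return the best count seen. In a real system, cost(n) would be
    a timed execution of the parallel region with n threads."""
    best_n, best_c = 1, cost(1)
    n = 2
    while n <= max_threads:
        c = cost(n)
        if c >= best_c:
            break  # stopped improving; keep the previous best
        best_n, best_c = n, c
        n *= 2
    return best_n

# Synthetic "measured" execution times (seconds), minimum at 8 threads.
measured = {1: 100.0, 2: 55.0, 4: 32.0, 8: 24.0, 16: 29.0}
```

On this synthetic profile the search stops at 8 threads, never paying for the slower 16-thread run again; real runtime systems refine this idea with per-region searches and convergence checks.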

1.3 This Book


Efficiently exploiting thread-level parallelism from new multicore
systems has been challenging for software developers. While blindly
increasing the number of threads may lead to performance gains, it can
also result in a disproportionate increase in energy consumption. In the
same way, optimization techniques for reducing energy consumption,
such as DVFS and power gating, can lead to huge performance loss if
used incorrectly. For this reason, rightly choosing the number of
threads, the operating processor frequency, and the number of active
cores is essential to reach the best compromise between performance
and energy. However, such a task is extremely difficult: besides the large
number of variables involved, many of them will change according to
different aspects of the system at hand and are defined at runtime, such
as the input set of the application, the metric evaluated, the processor
microarchitecture, and the behavior of the parallel regions that
comprise the application.
In this book, we present and discuss several techniques that address
this challenge.
In Chap. 2, we provide a brief background for the reader. First, we
give an overview of parallel computing in software, presenting the
parallel programming interfaces widely used in multicore architectures.
Second, we present the techniques used in software and hardware to
optimize the power and energy consumption of parallel applications.
Then, we describe the design space exploration of the optimization of
parallel applications.
Chapter 3 assesses the influence of the parallel programming
interfaces that exploit parallelism through shared variables (OpenMP
and PThreads) or message passing (MPI-1 and MPI-2) on the behavior
of parallel applications with different communication demands for
embedded and general-purpose processors.
Chapter 4 presents the works that aim to optimize the execution of
parallel applications by tuning the number of threads (DCT) or by
selecting the ideal processor operating frequency through DVFS. We
have conducted an extensive research considering studies published in
the main conferences and journals over the past fifteen years. In this
sense, more than fifty works were analyzed and classified into three
classes according to the optimization method: only DCT, only DVFS, and
the ones that apply both techniques.
Finally, in Chap. 5, we present in detail, as a case study, Aurora,
a new OpenMP framework that optimizes the performance,
energy, or EDP of parallel applications by tuning the number of threads
at runtime without any interference from the software developer.

References
7. Beck, A.C.S., Lisbôa, C.A.L., Carro, L.: Adaptable Embedded Systems. Springer, Berlin (2012)
9. Benesty, J., Chen, J., Huang, Y., Cohen, I.: Pearson Correlation Coefficient, pp. 1–4. Springer, Berlin (2009). https://doi.org/10.1007/978-3-642-00296-0_5
17. Butenhof, D.R.: Programming with POSIX Threads. Addison-Wesley Longman Publishing, Boston (1997)
21. Chandramowlishwaran, A., Knobe, K., Vuduc, R.: Performance evaluation of concurrent collections on high-performance multicore computing systems. In: 2010 IEEE International Symposium on Parallel Distributed Processing (IPDPS), pp. 1–12. IEEE, Piscataway (2010). https://doi.org/10.1109/IPDPS.2010.5470404
22. Chapman, B., Jost, G., Pas, R.v.d.: Using OpenMP: Portable Shared Memory Parallel Programming (Scientific and Engineering Computation). MIT Press, Cambridge, MA (2007)
34. Dutot, P.F., Georgiou, Y., Glesser, D., Lefevre, L., Poquet, M., Rais, I.: Towards energy budget control in HPC. In: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 381–390. IEEE, Piscataway (2017)
38. Gropp, W., Lusk, E., Skjellum, A.: Using MPI: Portable Parallel Programming with the Message-Passing Interface, 2nd edn. MIT Press, Cambridge (1999)
41. Ham, T.J., Chelepalli, B.K., Xue, N., Lee, B.C.: Disintegrated control for energy-efficient and heterogeneous memory systems. In: IEEE HPCA, pp. 424–435. IEEE, Piscataway (2013). https://doi.org/10.1109/HPCA.2013.6522338
47. Hu, Z., Buyuktosunoglu, A., Srinivasan, V., Zyuban, V., Jacobson, H., Bose, P.: Microarchitectural techniques for power gating of execution units. In: Proceedings of the 2004 International Symposium on Low Power Electronics and Design, ISLPED '04, pp. 32–37. ACM, New York (2004). https://doi.org/10.1145/1013235.1013249
51. Joao, J.A., Suleman, M.A., Mutlu, O., Patt, Y.N.: Bottleneck identification and scheduling in multithreaded applications. In: International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 223–234. ACM, New York (2012). https://doi.org/10.1145/2150976.2151001
57. Karlin, I., Keasler, J., Neely, R.: Lulesh 2.0: updates and changes, pp. 1–9 (2013)
62. Le Sueur, E., Heiser, G.: Dynamic voltage and frequency scaling: the laws of diminishing returns. In: Proceedings of the 2010 International Conference on Power Aware Computing and Systems, HotPower'10, pp. 1–8. USENIX Association, Berkeley (2010)
63. Lee, J., Wu, H., Ravichandran, M., Clark, N.: Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications. ACM SIGARCH Comput. Archit. News 38(3), 270–279 (2010)
64. Levy, H.M., Lo, J.L., Emer, J.S., Stamm, R.L., Eggers, S.J., Tullsen, D.M.: Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor. In: International Symposium on Computer Architecture, pp. 191–202 (1996). https://doi.org/10.1145/232973.232993
84. Oboril, F., Tahoori, M.B.: ExtraTime: modeling and analysis of wearout due to transistor aging at microarchitecture-level. In: IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012), pp. 1–12 (2012). https://doi.org/10.1109/DSN.2012.6263957
85. Olukotun, K., Hammond, L.: The future of microprocessors. Queue 3(7), 26–29 (2005). https://doi.org/10.1145/1095408.1095418
94. Quinn, M.: Parallel Programming in C with MPI and OpenMP. McGraw-Hill Higher Education (2004)
95. Raasch, S.E., Reinhardt, S.K.: The impact of resource partitioning on SMT processors. In: International Conference on Parallel Architectures and Compilation Techniques, pp. 15–25 (2003). https://doi.org/10.1109/PACT.2003.1237998
114. Subramanian, L., Seshadri, V., Kim, Y., Jaiyen, B., Mutlu, O.: MISE: providing performance predictability and improving fairness in shared main memory systems. In: IEEE International Symposium on High Performance Computer Architecture, pp. 639–650 (2013)
115. Suleman, M.A., Qureshi, M.K., Patt, Y.N.: Feedback-driven threading: power-efficient and high-performance execution of multi-threaded workloads on CMPs. SIGARCH Comput. Archit. News 36(1), 277–286 (2008). https://doi.org/10.1145/1353534.1346317
121. Vogelsang, T.: Understanding the energy consumption of dynamic random access memories. In: Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO '43, pp. 363–374. IEEE Computer Society, Washington (2010). https://doi.org/10.1109/MICRO.2010.42
122. Wall, D.W.: Limits of instruction-level parallelism. In: Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS IV, pp. 176–188. ACM, New York (1991). https://doi.org/10.1145/106972.106991
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2019
A. Francisco Lorenzon, A. C. S. Beck Filho, Parallel Computing Hits the Power Wall,
SpringerBriefs in Computer Science
https://doi.org/10.1007/978-3-030-28719-1_2

2. Fundamental Concepts
Arthur Francisco Lorenzon1 and Antonio Carlos Schneider Beck Filho2

(1) Department of Computer Science, Federal University of Pampa (UNIPAMPA), Alegrete, Rio Grande do Sul, Brazil
(2) Institute of Informatics, Campus do Vale, Federal University of Rio

2.1 Parallel Computing in Software


Parallel programming can be defined as the process of dividing an
application into tasks that can be executed concurrently, aiming to
reduce its total execution time [97]. It has been widely used in the
development of scientific applications that require large computing
power, such as weather forecasting, DNA sequence analysis, and
genome computation. Moreover, with the popularization of multicore
architectures, general-purpose applications (e.g., graphics editors and
web servers) have also been parallelized.
The main goal of parallel computing is to use multiple processing
units to solve problems in less time [36]. The key to parallel
computing is the possibility of exploiting the concurrency of a given
application by decomposing a problem into sub-problems that can be
executed at the same time. As a simple example, suppose that part of an
application involves computing the summation of a large set of values.
In a sequential execution, all the values are added together on a single
core, as depicted in Fig. 2.1a. With parallel computing, on the other
hand, the data set can be partitioned and the partial summations
computed simultaneously, each on a different processor (C0, C1, C2, and
C3 in Fig. 2.1b). Then, the partial sums are combined to get the final
answer.

Fig. 2.1 Example of parallel computing. (a) Sequential execution. (b) Parallel
execution in four cores

2.1.1 Communication Models


Parallel computing exploits the use of multiple processing units to
execute parts of the same program simultaneously. Thus, there is
cooperation between the processors that execute concurrently.
However, for this cooperation to occur, processors should exchange
information at runtime. In multicore processors, this can be done
through shared variables or message passing [97]:
Shared variables rely on the existence of an address space in
memory that can be accessed by all processors. This model is widely
used when parallelism is exploited at the thread level, since threads
share the same memory address space. In this model, threads can have
private variables (to which a thread has exclusive access) and shared
variables (to which all threads have access). When threads need to
exchange information, they use shared variables located in memory
regions that are accessible by all threads (shared memory). Each
parallel programming interface provides synchronization operations to
control access to shared variables, avoiding race conditions.
Message passing is used in environments where the memory space
is distributed or where processes do not share the same memory
address space. Communication therefore occurs through send/receive
operations, which can be point-to-point or collective. In the former,
data exchange happens between pairs of processes; in the latter, more
than two processes communicate.

2.1.2 Parallel Programming Interfaces


The development of applications that can exploit the full potential
parallelism of multiprocessor architectures depends on many specific
aspects of their organization, including the size, structure, and
hierarchy of the memory. Operating systems provide transparency
concerning the allocation and scheduling of different processes across
the various cores. However, when it comes to TLP exploitation, which
involves dividing the application into threads or processes, the
responsibility lies with the programmer. PPIs therefore make the
extraction of parallelism easier, faster, and less error-prone. Several
parallel programming interfaces are in use nowadays, among which the
most common are Open Multi-Processing (OpenMP), POSIX Threads
(PThreads), Message Passing Interface (MPI), Threading Building
Blocks (TBB), Cilk Plus, and Charm.
OpenMP is a PPI for shared memory in C/C++ and FORTRAN that
consists of a set of compiler directives, library functions, and
environment variables [22]. Parallelism is exploited through the
insertion of directives in the sequential code that inform the compiler
how and which parts of the code should be executed in parallel.
Synchronization can be implicit (an implied barrier at the end of a
parallel region) or explicit (synchronization constructs) to the
programmer. By default, whenever there is a synchronization point,
OpenMP threads enter a hybrid state (spin-lock and sleep): they access
the shared memory repeatedly until the number of spins of the busy-wait
loop is reached (spin-lock), and then enter a sleep state until the
end of the synchronization [22]. The amount of time each thread waits
actively before waiting passively (without consuming CPU power)
varies according to the waiting policy, which defines the number of
spins of the busy-wait loop (e.g., the standard value when
OMP_WAIT_POLICY is set to active is 30 billion iterations) [86].
PThreads is a standard PPI for C/C++ whose functions allow fine
adjustment of the grain size of the workload. The creation/termination
of the threads, the workload distribution, and the control of execution
are defined by the programmer [17]. PThreads synchronization is done
by blocking threads with mutexes, which are inserted in the code by the
programmer. In this case, threads lose the processor and wait on
standby until the end of the synchronization, when they are
rescheduled for execution [117].
Cilk Plus is integrated with a C/C++ compiler and extends the
language with keywords added by the programmer to indicate where
parallelism is allowed. Cilk Plus enables programmers to concentrate
on structuring programs to expose parallelism and exploit locality; the
runtime system is then responsible for scheduling the computation to
run efficiently on a given platform, and takes care of details like load
balancing, synchronization, and communication protocols. Unlike
PThreads and OpenMP, Cilk Plus works at a finer grain, with a runtime
system that is responsible for efficient execution and predictable
performance [79].
TBB is a library that supports parallelism based on a tasking model
and can be used with any C++ compiler. TBB requires the use of
function objects to specify blocks of code to run in parallel, relying on
templates and generic programming. Synchronization between threads
is done by mutual exclusion, in which waiting threads perform busy-waiting
until the end of the synchronization [79].
MPI is a standard message passing library for C/C++ and FORTRAN.
It implements an optimization mechanism to provide communication
in shared memory environments [38]. MPI is similar to PThreads
regarding the explicit exploitation of parallelism. Currently, it is divided
into three norms. In MPI-1, all processes are created at the beginning of
the execution, and the number of processes does not change throughout
program execution. In MPI-2, the creation of processes occurs at
runtime, and the number of processes can change during the execution.
In MPI-3, the updates include the extension of collective operations to
include nonblocking versions and extensions to the one-sided
operations. Communication between MPI processes occurs through
send/receive operations (point-to-point or collective), which are
likewise explicitly handled by the programmer. When MPI programs
are executed on shared memory architectures, message transmissions
can be done as shared memory accesses, in which messages are broken
into fragments that are pushed to and popped from first-in first-out
(FIFO) queues of each MPI process [16, 21].

2.1.3 Multicore Architectures


Multicore architectures have multiple processing units (cores) and a
memory system that enables communication between the cores. Each
core is an independent logical processor with its own resources, such as
functional units, an execution pipeline, and registers.
memory system consists of private memories, which are closer to the
core and only accessible by a single core, and shared memories, which
are more distant from the core and can be accessed by multiple cores
[43]. Figure 2.2 shows an example of a multicore architecture with four
cores (C0, C1, C2, and C3) and its private (L1 and L2 caches) and shared
memories (L3 cache and main memory).

Fig. 2.2 Basic structure of a multicore architecture with four cores

Multicore processors can exploit TLP. In this case, multiple
processors simultaneously execute parts of the same program,
exchanging data at runtime through shared variables or message
passing. Regardless of the processor or communication model, data
exchange is done through load/store instructions in shared memory
regions. As Fig. 2.2 shows, these regions are more distant from the
processor (e.g., L3 cache and main memory) and have a higher delay
and power consumption when compared to memories that are closer to
it (e.g., registers, L1, and L2 caches) [61].
Among the challenges faced in the design of multicore architectures,
one of the most important is related to data access in parallel
applications. When private data is accessed, it is brought into the
private cache of a single core, since no other core will use the same
variable. Shared data, on the other hand, may be replicated in multiple
caches, since other processors can access it to communicate. Therefore,
while sharing data improves concurrency between multiple processors,
it also introduces the cache coherence problem: when a processor
writes to shared data, the information stored in other caches may
become invalid. Cache coherence protocols are used to solve this
problem.
Cache coherence protocols are classified into two classes: directory
based and snooping [88]. In the former, a centralized directory
maintains the state of each block in different caches. When an entry is
modified, the directory is responsible for either updating or
invalidating the other caches with that entry. In the snooping protocol,
rather than keeping the state of sharing block in a single directory, each
cache that has a copy of the data can track the sharing status of the
block. Thus, all the cores observe memory operations and take proper
action to update or invalidate the local cache content if needed.
Cache blocks are classified into states, and the number of states
depends on the protocol. For instance, the simplest directory-based
and snooping protocols are three-state protocols in which each block is
classified as modified, shared, or invalid (hence they are often called
MSI protocols). When a cache block is in the modified state, it has been
updated in the private cache and cannot reside in any other cache. The
shared state indicates that the block in the private cache is potentially
shared, while a cache block is invalid when it contains no valid data.
Based on the MSI protocol, extensions have been created by adding
states. There are two common extensions: MESI, which adds the
"exclusive" state to MSI to indicate when a cache block is resident only
in a single cache but is clean, and MOESI, which adds the "owned" state
to MESI to indicate that a particular cache owns the associated block,
which is out-of-date in memory [43].
When developing parallel applications, the software developer does
not need to know about all details of cache coherence. However,
knowing how the data exchange occurs at the hardware level can help
the programmer to make better decisions during the development of
parallel applications.

2.2 Power and Energy Consumption


Two main components constitute the power used by a CMOS integrated
circuit: dynamic and static [58]. The former is the power consumed
while the inputs are active, with capacitances charging and discharging;
it is directly proportional to the circuit switching activity, given by
Eq. (2.1):

P_dynamic = A × C × V² × f (2.1)

Capacitance (C) depends on the wire lengths of on-chip structures.
Designers can influence this metric in several ways. For example,
building two smaller cores on-chip, rather than one large core, is likely
to reduce average wire lengths, since most wires will interconnect
units within a single core.
Supply voltage (V or Vdd) is the main voltage to power the
integrated circuit. Because of its direct quadratic influence on dynamic
power, supply voltage has a high importance on power-aware design.
Activity factor (A) refers to how often clock ticks lead to switching
activity on average.
Clock frequency (f) has a fundamental impact on power dissipation
because it indirectly influences the supply voltage: higher clock
frequencies can require a higher supply voltage. Thus, the combined
contribution of supply voltage and clock frequency to the dynamic
power equation has a cubic impact on total power dissipation.
While dynamic power dissipation represents the predominant
factor in CMOS power consumption, static power has become
increasingly prominent in recent technologies. Static power essentially
consists of the power used when the transistor is not in the process of
switching, and is determined by Eq. (2.2), where the supply voltage is V
and the total current flowing through the device is I_static:

P_static = V × I_static (2.2)
Energy, in joules, is the integral of the total power consumed (P)
over time (T), given by Eq. (2.3):

E = ∫₀ᵀ P dt (2.3)

Currently, energy is considered one of the most fundamental
metrics due to energy restrictions: while most embedded devices are
mobile and heavily dependent on battery, general-purpose processors
are being held back by the limits of thermal design power. Also, the
reduction of energy consumption in HPC is one of the challenges to
achieving the Exascale era, since the energy required to maintain these
systems corresponds to the power of a medium-sized nuclear plant
[34]. Therefore, several techniques to reduce energy consumption have
been proposed, such as DVFS and power gating.

2.2.1 Dynamic Voltage and Frequency Scaling


Dynamic voltage and frequency scaling is a feature of the processor that
allows software to adapt the clock frequency and operating voltage of a
processor on the fly, without requiring a reset [62]. DVFS enables
software to change system-on-chip (SoC) processing performance to
attain low power consumption while meeting the performance
requirements. The main idea of DVFS is to dynamically scale the supply
voltage of the CPU for a given frequency so that it operates at the
minimum speed required by the specific task executed [62]. This can
yield a significant reduction in power consumption because of the V²
relationship shown in Eq. (2.1).
Reducing the operating frequency reduces both processor
performance and power consumption. Also, when reducing the voltage,
the leakage current from the CPU's transistors decreases, making the
processor more energy-efficient and resulting in further gains [99].
However, determining the ideal frequency and voltage for a given point
of execution is not a trivial task. To make DVFS management as
transparent as possible to the software developer, operating systems
provide frameworks that allow each CPU core to have a min/max
frequency and a governor to control it.
Governors are kernel modules that can drive the CPU core
frequency/voltage operating points. Currently, the most common
available governors are:
Performance: The frequency of the processor is always fixed at the
maximum, even if the processor is underutilized.
Powersave: The frequency of the processor is always fixed at the
minimum allowable frequency.
Userspace: The frequency is set by the user or by any userspace
program running on the CPU.
Ondemand: The frequency of the processor is adjusted according to
the workload behavior, within the range of allowed frequencies.
Conservative: As in the previous mode, the frequency of the
processor is adjusted based on the workload, but in a more gradual,
conservative way.
Besides the pre-defined governors, it is possible to set the processor
frequency level manually, by editing the configurations of the CPU
frequency driver.

2.2.2 Power Gating


Power gating consists of selectively powering down certain blocks in
the chip while keeping other blocks powered up. The goal of power
gating is to minimize leakage current by temporarily switching power
off to blocks that are not required in the current operating mode [59].
Power gating can be applied either at the unit level, reducing the power
consumption of unused core functional units, or at the core level, in
which entire cores may be power gated [56, 76]. Currently, power
gating is mainly used in multicore processors to switch off unused
cores to reduce power consumption [84].
Power gating requires the presence of a header "sleep" transistor
that can set the supply voltage of the circuit to ground level during idle
times, as well as control logic that decides when to power gate the
circuit. Every time power gating is applied, an energy overhead is
incurred: the sleep signal must be distributed to the header transistor
before the circuit is turned off, and then, when the circuit is powered
on again, the sleep signal must be deasserted and the voltage driven
back up. Therefore, there is a break-even point, which represents the
exact point in time where the cumulative leakage energy savings equal
the energy overhead incurred by power gating. If, after the decision to
power gate a unit, the unit stays idle for a time interval longer than the
break-even point, then power gating saves energy. On the other hand, if
the unit needs to be active again before the break-even point is
reached, then power gating incurs an energy penalty [75].

References
16. Buono, D., Matteis, T.D., Mencagli, G., Vanneschi, M.: Optimizing message-passing on multicore architectures using hardware multi-threading. In: 2014 22nd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, pp. 262–270. ACM, New York (2014). https://doi.org/10.1109/PDP.2014.63
17. Butenhof, D.R.: Programming with POSIX Threads. Addison-Wesley Longman Publishing, Boston (1997)
21. Chandramowlishwaran, A., Knobe, K., Vuduc, R.: Performance evaluation of concurrent collections on high-performance multicore computing systems. In: 2010 IEEE International Symposium on Parallel Distributed Processing (IPDPS), pp. 1–12. IEEE, Piscataway (2010). https://doi.org/10.1109/IPDPS.2010.5470404
22. Chapman, B., Jost, G., Pas, R.v.d.: Using OpenMP: Portable Shared Memory Parallel Programming (Scientific and Engineering Computation). MIT Press, Cambridge, MA (2007)
34. Dutot, P.F., Georgiou, Y., Glesser, D., Lefevre, L., Poquet, M., Rais, I.: Towards energy budget control in HPC. In: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 381–390. IEEE, Piscataway (2017)
36. Foster, I.: Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering. Addison-Wesley Longman Publishing, Boston (1995)
38. Gropp, W., Lusk, E., Skjellum, A.: Using MPI: Portable Parallel Programming with the Message-Passing Interface, 2nd edn. MIT Press, Cambridge (1999)
43. Hennessy, J.L., Patterson, D.A.: Computer Architecture: A Quantitative Approach, 3rd edn. Morgan Kaufmann Publishers, San Francisco (2003)
56. Kahng, A.B., Kang, S., Rosing, T.S., Strong, R.: Many-core token-based adaptive power gating. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 32(8), 1288–1292 (2013). https://doi.org/10.1109/TCAD.2013.2257923
58. Kaxiras, S., Martonosi, M.: Computer Architecture Techniques for Power-Efficiency, 1st edn. Morgan and Claypool Publishers (2008)
59. Keating, M., Flynn, D., Aitken, R., Gibbons, A., Shi, K.: Low Power Methodology Manual: For System-on-Chip Design. Springer, Berlin (2007)
61. Korthikanti, V.A., Agha, G.: Towards optimizing energy costs of algorithms for shared memory architectures. In: Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2010), Thira, Santorini, Greece, June 13–15, 2010, pp. 157–165 (2010). https://doi.org/10.1145/1810479.1810510
62. Le Sueur, E., Heiser, G.: Dynamic voltage and frequency scaling: the laws of diminishing returns. In: Proceedings of the 2010 International Conference on Power Aware Computing and Systems, HotPower'10, pp. 1–8. USENIX Association, Berkeley (2010)
75. Lungu, A., Bose, P., Buyuktosunoglu, A., Sorin, D.J.: Dynamic power gating with quality guarantees. In: Proceedings of the 2009 ACM/IEEE International Symposium on Low Power Electronics and Design, ISLPED '09, pp. 377–382. ACM, New York (2009). https://doi.org/10.1145/1594233.1594331
76. Madan, N., Buyuktosunoglu, A., Bose, P., Annavaram, M.: A case for guarded power gating for multi-core processors. In: 2011 IEEE 17th International Symposium on High Performance Computer Architecture, pp. 291–300 (2011). https://doi.org/10.1109/HPCA.2011.5749737
79. McCool, M., Reinders, J., Robison, A.: Structured Parallel Programming: Patterns for Efficient Computation, 1st edn. Morgan Kaufmann Publishers, San Francisco (2012)
84. Oboril, F., Tahoori, M.B.: ExtraTime: modeling and analysis of wearout due to transistor aging at microarchitecture-level. In: IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012), pp. 1–12 (2012). https://doi.org/10.1109/DSN.2012.6263957
86. OpenMP ARB: OpenMP 4.0: specification (2013)
88. Patterson, D.A., Hennessy, J.L.: Computer Organization and Design: The Hardware/Software Interface, 5th edn. Morgan Kaufmann Publishers, San Francisco (2013)
97. Rauber, T., Rünger, G.: Parallel Programming: For Multicore and Cluster Systems, 2nd edn. Springer, Berlin (2013)
99. Rossi, F.D., Storch, M., de Oliveira, I., Rose, C.A.F.D.: Modeling power consumption for DVFS policies. In: 2015 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1879–1882. IEEE, Piscataway (2015). https://doi.org/10.1109/ISCAS.2015.7169024
117. Tanenbaum, A.S.: Modern Operating Systems, 3rd edn. Prentice Hall, Upper Saddle River (2007)
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2019
A. Francisco Lorenzon, A. C. S. Beck Filho, Parallel Computing Hits the Power Wall,
SpringerBriefs in Computer Science
https://doi.org/10.1007/978-3-030-28719-1_3

3. The Impact of Parallel Programming Interfaces on Energy
Arthur Francisco Lorenzon1 and Antonio Carlos Schneider Beck Filho2

(1) Department of Computer Science, Federal University of Pampa (UNIPAMPA), Alegrete, Rio Grande do Sul, Brazil
(2) Institute of Informatics, Campus do Vale, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre, Rio Grande do Sul, Brazil

3.1 Methodology
3.1.1 Benchmarks
In order to study the characteristics of each PPI regarding
thread/process management and synchronization/communication,
fourteen parallel benchmarks were implemented in C and classified
into two classes: high and low communication (HC and LC). For that,
we considered the amount of communication (i.e., data exchange), the
synchronization operations needed to ensure data transfer correctness
(mutexes, barriers), and the operations to create/finalize
threads/processes.
Table 3.1 quantifies the communication rate for each benchmark (it
also shows their input sizes), considering 2, 3, 4, and 8
threads/processes, obtained by using the Intel Pin tool [74]. HC
programs have several data dependencies that must be handled at
runtime to ensure correctness of the results. Consequently, they
demand large amounts of communication among threads/processes, as
shown in Fig. 3.1a. On the other hand, LC programs present little
communication among threads/processes, since communication is
needed only to distribute the workload and to join the final result (as
shown in Fig. 3.1b).

Table 3.1 Main characteristics of the benchmarks

                                  Operations to exchange data
                                  (total per no. of threads/processes)
   Benchmarks                     2          3          4          8            Input size
HC Game of life                   414        621        1079       1625         4096 × 4096
   Gauss–Seidel                   20,004     20,006     20,008     20,016       2048 × 2048
   Gram–Schmidt                   3,009,277  4,604,284  6,385,952  12,472,634   2048 × 2048
   Jacobi                         4004       6006       8008       16,016       2048 × 2048
   Odd–even sort                  300,004    450,006    600,008    1,200,016    150,000
   Turing ring                    16,000     24,000     32,000     64,000       2048 × 2048
LC Calc. of the PI number         4          6          8          16           4 billion
   DFT                            4          6          8          16           32,368
   Dijkstra                       4          6          8          16           2048 × 2048
   Dot-product                    4          6          8          16           15 billion
   Harmonic series                8          12         16         32           100,000
   Integral-quadrature            4          6          8          16           1 billion
   Matrix multiplication          4          6          8          16           2048 × 2048
   Similarity of histograms       4          6          8          16           1920 × 1080
Fig. 3.1 Behavior of benchmarks. (a) High communication. (b) Low communication
Since the way a parallel application is written may influence its
behavior during execution, we have followed the guidelines indicated
by [17, 36, 38] and [22]. The OpenMP implementations were
parallelized using parallel loops, splitting the loop iterations (for)
among threads. As discussed in [22], this approach is ideal for
applications that compute on uni- and bi-dimensional structures,
which is the case here. Loop parallelism can be exploited by using
different scheduling types that distribute the iterations to threads
(static, guided, and dynamic) with different granularities (the number
of iterations assigned to each thread as the threads request them). As
demonstrated in [69], the static scheduler with coarse granularity
presents the best results for the same benchmark set used in this
study; therefore, this scheduling mechanism is used here.
As indicated by [17, 36] and [38], we have used parallel tasks for the
PThreads and MPI implementations. In these cases, the loop iterations
were distributed based on the best workload balancing between
threads/processes. Moreover, the communication between MPI
processes was implemented using nonblocking operations to provide
better performance, as shown in [44].

3.1.2 Multicore Architectures


3.1.2.1 General-Purpose Processors
Core2Quad
The Intel Core2Quad is an implementation of the x86-64 ISA. In this
study, the 45 nm Core2Quad Q8400 was used, which has 4 CPU cores
running at 2.66 GHz and a TDP of 95 W. It uses the Intel Core
microarchitecture, targeted mainly at desktop and server domains. It is
a highly complex superscalar processor, which uses several techniques
to improve ILP: memory disambiguation; speculative execution with
advanced prefetchers; and a smart cache mechanism that provides
flexible performance for both single- and multithreaded applications.1
As Fig. 3.2a shows, the memory system is organized as follows: each
core has private 32 kB instruction and 32 kB data L1 caches. There
are two L2 caches of 2 MB (4 MB in total), each of them shared between
clusters of two cores. The platform has 4 GB of main memory, which is
the only memory region accessible by all the cores.

Fig. 3.2 Memory organization of each processor used in this study. (a) Intel
Core2Quad and Xeon. (b) Intel Atom. (c) ARM Cortex-A9/A8

Xeon The Intel Xeon is also an x86-64 processor. The version used in
this work is a 45 nm dual-processor Xeon E5405. Each processor has 4
CPU cores (so there are 8 cores in total) running at 2.0 GHz, with a TDP
of 80 W. It also uses the Core microarchitecture; however, unlike the
Core2Quad, the Xeon E5 family is designed for high performance
combined with energy efficiency, and is widely employed in HPC
systems. The memory organization is similar to the
Core2Quad (Fig. 3.2a): each core has a private 32 kB instruction and 32
kB data L1 caches. There are two L2 caches of 6 MB (12 MB in total),
each of them shared between clusters of two cores. The platform has 8
GB of RAM, which is the only memory region accessible by all the cores.
3.1.2.2 Embedded Processors
Atom The Intel Atom is also an x86-64 processor, but targeted at
embedded systems. In this study, the 32 nm Atom N2600 was used,
which has 2 CPU cores (4 threads with Hyper-Threading support)
running at 1.6 GHz and a TDP of 3.5 W. It uses the Saltwell
microarchitecture, designed for portable devices with low power
consumption. Since the main characteristic of x86 processors is
backward compatibility with the x86 instruction set, programs
already compiled for these processors will run without changes on the
Atom.2 The memory system is organized as illustrated in Fig. 3.2b: each
core has 32 kB instruction and 24 kB data L1 caches, and a private 512
kB L2 cache. The platform has 2 GB of RAM, which is the memory
shared by all the cores.

ARM We consider the Cortex-A9 processor. ARM is the world leader in
the embedded processor market. Designed around a dual-issue
out-of-order superscalar core, the Cortex-A family is optimized for
low-power, high-performance applications.3 The 40 nm ARM Cortex-A9
is a 32-bit processor, which implements the ARMv7 architecture with 4
CPU cores running at 1.2 GHz and TDP of 2.5 W. The memory system is
organized as illustrated in Fig. 3.2c: each core has a private 32 kB
instruction and 32 kB data L1 caches. The L2 cache of 1 MB is shared
among all cores, and the platform has 1 GB of RAM. Since the ISA and
microarchitecture of the Cortex-A8 and Cortex-A9 are similar, we also
investigate the behavior of A8 based on the results obtained in the A9.
The version considered is a 65 nm Cortex-A8, which has an operating
frequency of 1 GHz and a TDP of 1.8 W.

3.1.3 Execution Environment


The Performance Application Programming Interface (PAPI) [14] was
used to evaluate the behavior of the processor and memory system
without the influence of the operating system (i.e., function calls,
interrupts, etc.). By inserting functions into the code, PAPI allows
the developer to obtain data directly from the hardware counters
present in modern processors. With these hardware counters, it is
possible to gather the number of completed instructions, the number of
memory accesses (data/instructions), and the number of executed
cycles, in order to calculate performance and energy consumption.
The energy consumption was calculated using the data provided by
the authors in [13] (for the processors) and the CACTI tool (for the
memory systems), as shown in Table 3.2. To estimate the total energy
consumption (E_t), we have taken into account the energy consumed by
the executed instructions (E_inst), the cache and main memory accesses
(E_mem), and the static energy (E_static), as given by Eq. (3.1).

E_t = E_inst + E_mem + E_static    (3.1)

Table 3.2 Energy consumption for each component on each processor

                           ARM          ARM          Intel       Intel       Intel
                           Cortex-A8    Cortex-A9    Atom        Core2Quad   Xeon
Processor—static power     0.17 W       0.25 W       0.484 W     4.39 W      3.696 W
L1-D—static power          0.0005 W     0.0005 W     0.00026 W   0.0027 W    0.0027 W
L1-I—static power          0.0005 W     0.0005 W     0.00032 W   0.0027 W    0.0027 W
L2—static power            0.0258 W     0.0258 W     0.0096 W    0.0912 W    0.1758 W
RAM—static power           0.12 W       0.12 W       0.149 W     0.36 W      0.72 W
Energy per instruction     0.266 nJ     0.237 nJ     0.391 nJ    0.795 nJ    0.774 nJ
L1-D—energy/access         0.017 nJ     0.017 nJ     0.013 nJ    0.176 nJ    0.176 nJ
L1-I—energy/access         0.017 nJ     0.017 nJ     0.015 nJ    0.176 nJ    0.176 nJ
L2—energy/access           0.296 nJ     0.296 nJ     0.117 nJ    1.870 nJ    3.093 nJ
RAM—energy/access          2.77 nJ      2.77 nJ      3.94 nJ     15.6 nJ     24.6 nJ

To find the energy consumed by the instructions, Eq. (3.2) was used,
where the number of executed instructions (I_exe) is multiplied by the
average energy spent by each one of them (E_perinst).
E_inst = I_exe × E_perinst    (3.2)
The energy consumption of the memory system was obtained with
Eq. (3.3), where (L1DC_acc × E_L1DC) is the energy spent accessing the
L1 data cache; (L1IC_acc × E_L1IC) is the same, but for the L1
instruction cache; (L2_acc × E_L2) is for the L2 cache; and
(L2_miss × E_main) is the energy spent by the main memory accesses.

E_mem = (L1DC_acc × E_L1DC) + (L1IC_acc × E_L1IC) + (L2_acc × E_L2) + (L2_miss × E_main)    (3.3)

The static consumption of all components is given by Eq. (3.4). As
static power is consumed while the circuit is powered, it must be
considered over the entire execution time: the number of cycles
(#Cycles) of the application divided by the operating frequency
(Freq). We have considered the static consumption of the processor
(S_CPU), the L1 data (S_L1DC) and instruction (S_L1IC) caches, the L2
cache (S_L2), and the main memory (S_MAIN).

E_static = (#Cycles / Freq) × (S_CPU + S_L1DC + S_L1IC + S_L2 + S_MAIN)    (3.4)

3.1.4 Setup
The results presented in the next section are the average of ten
executions, with a standard deviation below 1% for each benchmark.
Their input sizes are described in Table 3.1. The programs were split
into 2, 3, 4, and 8 threads/processes. Although most of the processors
used in this work support only four threads, and are not commercially
available in an 8-core configuration, it is possible to approximate
the results with the following approach: as an example, consider two
threads executing on a single core. These threads have synchronization
points; when one thread reaches such a point, it must wait for the
other, and so on for as long as synchronization points remain. What is
done is to gather the data of each thread executing on the core
between two synchronization points (which involves the number of
instructions, memory accesses,
her father. She even felt a pang of loneliness for her mother. The
little white cottage near the river, at Thebes, looked like a toy house.
Her bedroom was doll-size. The town was a miniature village, like a
child’s Christmas set. Her mother’s bonnet was a bit of grotesquerie.
Her father’s face was etched with lines that she did not remember
having seen there when she left. The home-cooked food, prepared
by Parthy’s expert hands, was delicious beyond belief. She was a
traveller returned from a far place.
Captain Andy had ordered a new boat. He talked of nothing else.
The old Cotton Blossom, bought from Pegram years before, was to
be discarded. The new boat was to be lighted by some newfangled
gas arrangement instead of the old kerosene lamps. Carbide or
some such thing Andy said it was. There were to be special
footlights, new scenery, improved dressing and sleeping rooms. She
was being built at the St. Louis shipyards.
“She’s a daisy!” squeaked Andy, capering. He had just returned
from a trip to the place of the Cotton Blossom’s imminent birth. Of
the two impending accouchements—that which was to bring forth a
grandchild and that which was to produce a new show boat—it was
difficult to say which caused him keenest anticipation. Perhaps,
secretly, it was the boat, much as he loved Magnolia. He was, first,
the river man; second, the showman; third, the father.
“Like to know what you want a new boat for!” Parthy scolded.
“Take all the money you’ve earned these years past with the old tub
and throw it away on a new one.”
“Old one ain’t good enough.”
“Good enough for the riff-raff we get on it.”
“Now, Parthy, you know’s well’s I do you couldn’t be shooed off
the rivers now you’ve got used to ’em. Any other way of living’d
seem stale to you.”
“I’m a woman loves her home and asks for nothing better.”
“Bet you wouldn’t stay ashore, permanent, if you had the
chance.”
He won the wager, though he had to die to do it.
The new Cotton Blossom and the new grandchild had a trial by
flood on their entrance into life. The Mississippi, savage mother that
she was, gave them both a baptism that threatened for a time to
make their entrance into and their exit from the world a
simultaneous act. But both, after some perilous hours, were piloted
to safety; the one by old Windy, who swore that this was his last
year on the rivers; the other by a fat midwife and a frightened young
doctor. Through storm and flood was heard the voice of Parthenia
Ann Hawks, the scold, berating Captain Hawks her husband, and
Magnolia Ravenal her daughter, as though they, and not the
elements, were responsible for the predicament in which they now
found themselves.
There followed four years of war and peace. The strife was
internal. It raged between Parthy and her son-in-law. The conflict of
the two was a chemical thing. Combustion followed inevitably upon
their meeting. The biting acid of Mrs. Hawks’ discernment cut
relentlessly through the outer layers of the young man’s charm and
grace and melting manner and revealed the alloy. Ravenal’s nature
recoiled at sight of a woman who employed none of the arts of her
sex and despised and penetrated those of the opposite sex. She had
no vanity, no coquetry, no reticences, no respect for the reticence of
others; treated compliment as insult, met flattery with contempt.
A hundred times during those four years he threatened to leave
the Cotton Blossom, yet he was held to his wife Magnolia and to the
child Kim by too many tender ties. His revolt usually took the form of
a gambling spree ashore during which he often lost every dollar he
had saved throughout weeks of enforced economy. There was no
opportunity to spend money legitimately in the straggling hamlets to
whose landings the Cotton Blossom was so often fastened. Then,
too, the easy indolence of the life was beginning to claim him—its
effortlessness, its freedom from responsibility. Perhaps a new part to
learn at the beginning of the season—that was all. River audiences
liked the old plays. Came to see them again and again. It was
Ravenal who always made the little speech in front of the curtain.
Wish to thank you one and all . . . always glad to come back to the
old . . . to-morrow night that thrilling comedy-drama entitled . . .
each and every one . . . concert after the show . . .
Never had the Cotton Blossom troupe so revelled in home-baked
cakes, pies, cookies; home-brewed wine; fruits of tree and vine. The
female population of the river towns from the Great Lakes to the
Gulf beheld in him the lover of their secret dreams and laid at his
feet burnt offerings and shewbread. Ravenal, it was said by the
Cotton Blossom troupe, could charm the gold out of their teeth.
Perhaps, with the passing of the years, he might have grown
quite content with this life. Sometimes the little captain, when the
two men were conversing quietly apart, dropped a word about the
future.
“When I’m gone—you and Magnolia—the boat’ll be yours, of
course.”
Ravenal would laugh. Little Captain Andy looked so very much
alive, his bright brown eyes glancing here and there, missing nothing
on land or shore, his brown paw scratching the whiskers that
showed so little of gray, his nimble legs scampering from texas to
gangplank, never still for more than a minute.
“No need to worry about that for another fifty years,” Ravenal
assured him.
The end had in it, perhaps, a touch of the ludicrous, as had
almost everything the little capering captain did. The Cotton
Blossom, headed upstream on the Mississippi, bound for St. Louis,
had struck a snag in Cahokia Bend, three miles from the city. It was
barely dawn, and a dense fog swathed the river. The old Cotton
Blossom probably would have sunk midstream. The new boat stood
the shock bravely. In the midst of the pandemonium that followed
the high shrill falsetto of the little captain’s voice could be heard
giving commands which he, most of all, knew he had no right to
give. The pilot only was to be obeyed under such conditions. The
crew understood this, as did the pilot. It was, in fact, a legend that
more than once in a crisis Captain Andy on the upper deck had
screamed his orders in a kind of dramatic frenzy of satisfaction,
interspersing these with picturesque and vivid oaths during which he
had capered and bounced his way right off the deck and into the
river, from which damp station he had continued to screech his
orders and profanities in cheerful unconcern until fished aboard
again. Exactly this happened. High above the clamour rose the voice
of Andy. His little figure whirled like that of a dervish. Up, down,
fore, aft—suddenly he was overboard unseen in the dimness, in the
fog, in the savage swift current of the Mississippi, wrapped in the
coils of the old yellow serpent, tighter, tighter, deeper, deeper, until
his struggles ceased. She had him at last.
“The river,” Magnolia had said, over and over, “The river. The
river.”
XII

“Thebes?” echoed Parthenia Ann Hawks, widow. The stiff
crêpe of her weeds seemed to bristle. “I’ll do nothing of
crêpe of her weeds seemed to bristle. “I’ll do nothing of
the kind, miss! If you and that fine husband of yours think
to rid yourself of me that way——”
“But, Mama, we’re not trying to rid ourselves of you. How can
you think of such things! You’ve always said you hated the boat.
Always. And now that Papa—now that you needn’t stay with the
show any longer, I thought you’d want to go back to Thebes to live.”
“Indeed! And what’s to become of the Cotton Blossom, tell me
that, Maggie Hawks!”
“I don’t know,” confessed Magnolia, miserably. “I don’t—know.
That’s what I think we ought to talk about.” The Cotton Blossom,
after her tragic encounter with the hidden snag in the Mississippi,
was in for repairs. The damage to the show boat had been greater
than they had thought. The snag had, after all, inflicted a jagged
wound. So, too, had it torn and wounded something deep and
hidden in Magnolia’s soul. Suddenly she had a horror of the great
river whose treacherous secret fangs had struck so poisonously. The
sight of the yellow turbid flood sickened her; yet held her
hypnotized. Now she thought that she must run from it, with her
husband and her child, to safety. Now she knew that she never could
be content away from it. She wanted to flee. She longed to stay.
This, if ever, was her chance. But the river had Captain Andy.
Somewhere in its secret coils he lay. She could not leave him. On the
rivers the three great mysteries—Love and Birth and Death—had
been revealed to her. All that she had known of happiness and
tragedy and tranquillity and adventure and romance and fulfilment
was bound up in the rivers. Their willow-fringed banks framed her
world. The motley figures that went up and down upon them or that
dwelt on their shores were her people. She knew them; was of
them. The Mississippi had her as surely as it had little Andy Hawks.
“Well, we’re talking about it, ain’t we?” Mrs. Hawks now
demanded.
“I mean—the repairs are going to be quite expensive. She’ll be
laid up for a month or more, right in the season. Now’s the time to
decide whether we’re going to try to run her ourselves just as if
Papa were still——”
“I can see you’ve been talking things over pretty hard and fast
with Ravenal. Well, I’ll tell you what we’re going to do, miss. We’re
going to run her ourselves—leastways, I am.”
“But, Mama!”
“Your pa left no will. Hawks all over. I’ve as much say-so as you
have. More. I’m his widow. You won’t see me willing to throw away
the good-will of a business that it’s taken years to build up. The
boat’s insurance’ll take care of the repairs. Your pa’s life insurance is
paid up, and quite a decent sum—for him. I saw to that. You’ll get
your share, I’ll get mine. The boat goes on like it always has. No
Thebes for me. You’ll go on playing ingénue leads; Ravenal juvenile.
Kim——”
“No!” cried Magnolia much as Parthy had, years before. “Not
Kim.”
“Why not?”
There was about the Widow Hawks a terrifying and invincible
energy. Her black habiliments of woe billowed about her like the
sable wings of a destroying angel. With Captain Andy gone, she
would appoint herself commander of the Cotton Blossom Floating
Palace Theatre. Magnolia knew that. Who, knowing Parthy, could
imagine it otherwise? She would appoint herself commander of their
lives. Magnolia was no weakling. She was a woman of mettle. But no
mettle could withstand the sledge-hammer blows of Parthy Ann
Hawks’ iron.
It was impossible that such an arrangement could hold. From the
first Ravenal rejected it. But Magnolia’s pleadings for at least a trial
won him over, but grudgingly.
“It won’t work, Nola, I tell you. We’ll be at each other’s throats.
She’s got all kinds of plans. I can see them whirling around in her
eye.”
“But you will try to be patient, won’t you, Gay? For my sake and
Kim’s?”
But they had not been out a week before mutiny struck the
Cotton Blossom. The first to go was Windy. Once his great feet were
set toward the gangplank there was no stopping him. He was over
seventy now, but he looked not an hour older than when he had
come aboard the Cotton Blossom almost fifteen years before. To the
irate widow he spoke briefly but with finality.
“You’re Hawks’ widow. That’s why I said I’d take her same’s if
Andy was alive. I thought Nollie’s husband would boss this boat, but
seems you’re running it. Well, ma’am, I ain’t no petticoat-pilot. I’m
off the end of this trip down. Young Tanner’ll come aboard there and
pilot you.”
“Tanner! Who’s he? How d’you know I want him? I’m running this
boat.”
“You better take him, Mrs. Hawks, ma’am. He’s young, and not
set in his ways, and likely won’t mind your nagging. I’m too old. Lost
my taste for the rivers, anyway, since Cap went. Lost my nerve, too,
seems like. . . . Well, ma’am, I’m going.”
And he went.
Changes came then, tripping on each other’s heels. Mis’ Means
stayed, and little weak-chested Mr. Means. Frank had gone after
Magnolia’s marriage. Ralph left.
Parthy met these difficulties and defeats with magnificent
generalship. She seemed actually to thrive on them. Do this. Do
that. Ravenal’s right eyebrow was cocked in a perpetual circumflex
of disdain. One could feel the impact of opposition whenever the two
came together. Every fibre of Ravenal’s silent secretive nature was
taut in rejection of this managerial mother-in-law. Every nerve and
muscle of that energetic female’s frame tingled with enmity toward
this suave soft-spoken contemptuous husband of her daughter.
Finally, “Choose,” said Gaylord Ravenal, “between your mother
and me.”
Magnolia chose. Her decision met with such terrific opposition
from Parthy as would have shaken any woman less determined and
less in love.
“Where you going with that fine husband of yours? Tell me that!”
“I don’t know.”
“I’ll warrant you don’t. No more does he. Why’re you going?
You’ve got a good home on the boat.”
“Kim . . . school . . .”
“Fiddlesticks!”
Magnolia took the plunge. “We’re not—I’m not—Gay’s not happy
any more on the rivers.”
“You’ll be a sight unhappier on land before you’re through, make
no mistake about that, young lady. Where’ll you go? Chicago, h’m?
What’ll you do there? Starve, and worse. I know. Many’s the time
you’ll wish yourself back here.”
Magnolia, nervous, apprehensive, torn, now burst into sudden
rebellion against the iron hand that had gripped her all these years.
“How do you know? How can you be so sure? And even if you
are right, what of it? You’re always trying to keep people from doing
the things they want to do. You’re always wanting people to live
cautiously. You fought to keep Papa from buying the Cotton Blossom
in the first place, and made his life a hell. And now you won’t leave
it. You didn’t want me to act. You didn’t want me to marry Gay. You
didn’t want me to have Kim. Maybe you were right. Maybe I
shouldn’t have done any of those things. But how do you know? You
can’t twist people’s lives around like that, even if you twist them
right. Because how do you know that even when you’re right you
mayn’t be wrong? If Papa had listened to you, we’d be living in
Thebes. He’d be alive, probably. I’d be married to the butcher,
maybe. You can’t do it. Even God lets people have their own way,
though they have to fall down and break their necks to find out they
were wrong. . . . You can’t do it . . . and you’re glad when it turns
out badly . . .”
She was growing incoherent.
Back of Parthy’s opposition to their going was a deep relief of
which even she was unaware, and whose existence she would have
denied had she been informed of it. Her business talent, so long
dormant, was leaping into life. Her energy was cataclysmic. One
would almost have said she was happy. She discharged actors, crew;
engaged actors, crew. Ordered supplies. Spoke of shifting to an
entirely new territory the following year—perhaps to the rivers of
North Carolina and Maryland. She actually did this, though not until
much later. Magnolia, years afterward reading her mother’s terse
and maddening letters, would be seized with a nostalgia not for the
writer but for the lovely-sounding places of which she wrote—though
they probably were as barren and unpicturesque as the river towns
of the Mississippi and Ohio and Big Sandy and Kanawha. “We’re
playing the town of Bath, on the Pamlico River,” Parthy’s letter would
say. Or, “We had a good week at Queenstown, on the Sassafras.”
Magnolia, looking out into the gray Chicago streets, slippery with
black ice, thick with the Lake Michigan fog, would repeat the names
over to herself. Bath on the Pamlico. Queenstown on the Sassafras.
Mrs. Hawks, at parting, was all for Magnolia’s retaining her
financial share in the Cotton Blossom, the money accruing therefrom
to be paid at regular intervals. In this she was right. She knew
Ravenal. In her hard and managing way she loved her daughter;
wished to insure her best interests. But Magnolia and Ravenal
preferred to sell their share outright if she would buy. Ravenal would
probably invest it in some business, Magnolia said.
“Yes—monkey business,” retorted Mrs. Hawks. Then added,
earnestly, “Now mind, don’t you come snivelling to me when it’s
gone and you and your child haven’t a penny to bless yourselves
with. For that’s what it’ll come to in the end. Mark my words. I don’t
say I wouldn’t be happy to see you and Kim back. But not him.
When he’s run through every penny of your money, he needn’t look
to me for more. You can come back to the boat; you and Kim. I’ll
look for you. But him! Never!”
The two women faced each other, and they were no longer
mother and daughter but two forces opposing each other with all
the strength that lay in the deep and powerful nature of both.
Magnolia made one of those fine speeches. “I wouldn’t come to
you for help—not if I were starving to death, and Kim too.”
“Oh, there’s worse things than starving to death.”
“I wouldn’t come to you no matter what.”
“You will, just the same. I’d take my oath on that.”
“I never will.”
Secretly she was filled with terror at leaving the rivers; for the
rivers, and the little inaccessible river towns, and the indolent and
naïve people of those towns whose very presence in them confessed
them failures, had with the years taken on in Magnolia’s eyes the
friendly aspect of the accustomed. Here was comfort assured; here
were friends; here the ease that goes with familiarity. Even her
mother’s bristling generalship had in it a protective quality. The very
show boat was a second mother, shielding her from the problems
and cares that beset the land-dweller. The Cotton Blossom had been
a little world in itself on which life was a thing detached, dream-like,
narcotic.
As Magnolia Ravenal, with her husband and her child, turned
from this existence of ease to the outside world of which she already
had had one bitter taste, she was beset by hordes of fears and
doubts. Yet opposing these, and all but vanquishing them, was the
strong love of adventure—the eager curiosity about the unknown—
which had always characterized her and her dead father, the little
captain, and caused them both to triumph, thus far, over the
clutching cautious admonitions of Parthenia Ann Hawks.
Fright and anticipation; nostalgia and curiosity; a soaring sense
of freedom at leaving her mother’s too-protective wing; a pang of
compunction that she should feel this unfilial surge of relief.
They were going. You saw the three of them scrambling up the
steep river bank to the levee (perhaps for the last time, Magnolia
thought with a great pang. And within herself a voice cried no! no!)
Ravenal slim, cool, contained; Magnolia whiter than usual, and
frankly tearful; the child Kim waving an insouciant farewell with both
small fists. They carried no bundles, no parcels, no valises. Ravenal
disdained to carry parcels; he did not permit those of his party to
carry them. Two Negroes in tattered and faded blue overalls made
much of the luggage, stowing it inefficiently under the seats and
over the floor of the livery rig which had been hired to take the three
to the nearest railway station, a good twelve miles distant.
The Cotton Blossom troupe was grouped on the forward deck to
see them off. The Cotton Blossom lay, smug, safe, plump, at the
water’s edge. A passing side-wheeler, flopping ponderously
downstream, sent little flirty waves across the calm waters to her,
and set her to palpitating coyly. Good-bye! Good-bye! Write, now.
Mis’ Means’ face distorted in a ridiculous pucker of woe. Ravenal in
the front seat with the driver. Magnolia and Kim in the back seat
with the luggage protruding at uncomfortable angles all about them.
Parthenia Ann Hawks, the better to see them, had stationed herself
on the little protruding upper deck, forward—the deck that
resembled a balcony much like that on the old Cotton Blossom. The
livery nags started with a lurch up the dusty village street. They
clattered across the bridge toward the upper road. Magnolia turned
for a last glimpse through her tears. There stood Parthenia Ann
Hawks, silhouetted against sky and water, a massive and almost
menacing figure in her robes of black—tall, erect, indomitable. Her
face was set. The keen eyes gazed, unblinking, across the sunlit
waters. One arm was raised in a gesture of farewell. Ruthless,
unconquerable, headstrong, untamed, terrible.
“She’s like the River,” Magnolia thought, through her grief, in a
sudden flash of vision. “She’s the one, after all, who’s like the
Mississippi.”
A bend in the upper road. A clump of sycamores. The river, the
show boat, the silent black-robed figure were lost to view.
XIII

The most casual onlooker could gauge the fluctuations of the
Ravenal fortunes by any one of three signs. There was
Magnolia Ravenal’s sealskin sacque; there was Magnolia
Ravenal’s diamond ring; there was Gaylord Ravenal’s malacca cane.
Any or all of these had a way of vanishing and reappearing in a
manner that would have been baffling to one not an habitué of
South Clark Street, Chicago. Of the three, the malacca stick, though
of almost no tangible value, disappeared first and oftenest, for it
came to be recognized as an I O U by every reputable Clark Street
pawnbroker. Deep in a losing game of faro at Jeff Hankins’ or Mike
McDonald’s, Ravenal would summon a Negro boy to him. He would
hand him the little ivory-topped cane. “Here—take this down to Abe
Lipman’s, corner Clark and Monroe. Tell him I want two hundred
dollars. Hurry.” Or: “Run over to Goldsmith’s with this. Tell him a
hundred.”
The black boy would understand. In ten minutes he would return
minus the stick and bearing a wilted sheaf of ten-dollar bills. If
Ravenal’s luck turned, the cane was redeemed. If it still stayed
stubborn, the diamond ring must go; that failing, then the sealskin
sacque. Ravenal, contrary to the custom of his confrères, wore no
jewellery; possessed none. There were certain sinister aspects of
these outward signs, as when, for example, the reigning sealskin
sacque was known to skip an entire winter.
Perhaps none of these three symbols was as significant a
betrayal of the Ravenal finances as was Gay Ravenal’s choice of a
breakfasting place. He almost never breakfasted at home. This was a
reversion to one of the habits of his bachelor days; was, doubtless, a
tardy rebellion, too, against the years spent under Mrs. Hawks’ harsh
régime. He always had hated those Cotton Blossom nine o’clock
family breakfasts ominously presided over by Parthy in cap and curl
papers.
Since their coming to Chicago Gay liked to breakfast between
eleven and twelve, and certainly never rose before ten. If the
Ravenal luck was high, the meal was eaten in leisurely luxury at Billy
Boyle’s Chop House between Clark and Dearborn streets. This was
most agreeable, for at Billy Boyle’s, during the noon hour, you
encountered Chicago’s sporting blood—political overlords, gamblers,
jockeys, actors, reporters—these last mere nobodies—lean and
somewhat morose young fellows vaguely known as George Ade,
Brand Whitlock, John McCutcheon, Pete Dunne. Here the news and
gossip of the day went round. Here you saw the Prince Albert coat,
the silk hat, the rattling cuffs, the glittering collar, the diamond stud
of the professional gamester. Old Carter Harrison, Mayor of Chicago,
would drop in daily, a good twenty-five-cent cigar waggling between
his lips as he greeted this friend and that. In came the brokers from
the Board of Trade across the way. Smoke-blue air. The rich heavy
smell of thick steaks cut from prime Western beef. Massive glasses
of beer through which shone the pale amber of light brew, or the
seal-brown of dark. The scent of strong black coffee. Rye bread
pungent with caraway. Little crisp round breakfast rolls sprinkled
with poppy-seed.
Calories, high blood pressure, vegetable luncheons, golf, were
words not yet included in the American everyday vocabulary. Fried
potatoes were still considered a breakfast dish, and a meatless meal
was a snack.
Here it was, then, that Gay Ravenal, slim, pale, quiet, elegant,
liked best to begin his day; listening charmingly and attentively to
the talk that swirled about him—talk of yesterday’s lucky winners in
Gamblers’ Alley, at Prince Varnell’s place, or Jeff Hankins’ or Mike
McDonald’s; of the Washington Park race track entries; of the new
blonde girl at Hetty Chilson’s; of politics in their simplest terms.
Occasionally he took part in this talk, but like most professional
gamblers, his was not the conversational gift. He was given credit
for the astuteness he did not possess merely on the strength of his
cool evasive glance, his habit of listening and saying little, and his
bland poker face.
“Ravenal doesn’t say much but there’s damned little he misses.
Watch him an hour straight and you can’t make out from his face
whether he’s cleaning up a thousand or losing his shirt.” An enviable
Clark Street reputation.
Still, this availed him nothing when funds were low. At such times
he eschewed Billy Boyle’s and breakfasted meagrely instead at the
Cockeyed Bakery just east of Clark. That famous refuge for the
temporarily insolvent was so named because of the optical
peculiarity of the lady who owned it and who dispensed its coffee
and sinkers. This refreshment cost ten cents. The coffee was hot,
strong, revivifying; the sinkers crisp and fresh. Every Clark Street
gambler was, at one time or another, through the vagaries of Lady
Luck, to be found moodily munching the plain fare that made up the
limited menu to be had at the Cockeyed Bakery. For that matter
lacking even the modest sum required for this sustenance, he knew
that there he would be allowed to “throw up a tab” until luck should
turn.
Many a morning Gaylord Ravenal, dapper, nonchalant, sartorially
exquisite, fared forth at eleven with but fifty cents in the pocket of
his excellently tailored pants. Usually, on these occasions, the
malacca stick was significantly absent. Of the fifty cents, ten went
for the glassy shoeshine; twenty-five for a boutonnière; ten for
coffee and sinkers at the Cockeyed Bakery. The remaining five cents
stayed in his pocket as a sop to the superstition that no coin breeds
no more coins. Stopping first to look in a moment at Weeping Willy
Mangler’s, or at Reilly’s pool room for a glance at the racing chart, or
to hear a bit of the talk missed through his enforced absence from
Boyle’s, he would end at Hankins’ or McDonald’s, there to woo
fortune with nothing at all to offer as oblation. But affairs did not
reach this pass until after the first year.
It was incredible that Magnolia Ravenal could so soon have
adapted herself to the life in which she now moved. Yet it was
explicable, perhaps, when one took into consideration her inclusive
nature. She was interested, alert, eager—and still in love with
Gaylord Ravenal. Her life on the rivers had accustomed her to all
that was bizarre in humanity. Queenie and Jo had been as much a
part of her existence as Elly and Schultzy. The housewives in the
little towns, the Negroes lounging on the wharves, the gamblers in
the river-front saloons, the miners of the coal belt, the Northern
fruit-pickers, the boatmen, the Southern poor whites, the Louisiana
aristocracy, all had passed in fantastic parade before her ambient
eyes. And she, too, had marched in a parade, a figure as gorgeous,
as colourful as the rest.
Now, in this new life, she accepted everything, enjoyed
everything with a naïveté that was, perhaps, her greatest charm. It
was, doubtless, the thing that held the roving Ravenal to her.
Nothing shocked her; this was her singularly pure and open mind.
She brought to this new life an interest and a curiosity as fresh as
that which had characterized the little girl who had so eagerly and
companionably sat with Mr. Pepper, the pilot, in the bright cosy
glass-enclosed pilot house atop the old Creole Belle on that first
enchanting trip down the Mississippi to New Orleans.
To him she had said, “What’s around that bend? . . . Now what’s
coming? . . . How deep is it here? . . . What used to be there? . . .
What island is that?”
Mr. Pepper, the pilot, had answered her questions amply and with
a feeling of satisfaction to himself as he beheld her childish hunger
for knowledge being appeased.
Now she said to her husband with equal eagerness: “Who is that
stout woman with the pretty yellow-haired girl? What queer eyes
they have! . . . What does it mean when it says odds are two to
one? . . . Why do they call him Bath House John? . . . Who is that
large woman in the victoria, with the lovely sunshade? How rich her
dress is, yet it’s plain. Why don’t you introduce me to——Oh! That!
Hetty Chilson! Oh! . . . Why do they call him Bad Jimmy Connerton?
. . . But why do they call it the Levee? It’s really Clark Street, and no
water anywhere near, so why do they call it the Levee? . . . What’s a
percentage game? . . . Hieronymus! What a funny word! . . . Mike