Spatial Relationships Between Two Georeferenced Variables With Applications in R
Ronny Vallejos
Department of Mathematics
Federico Santa María Technical University
Valparaíso, Chile

Felipe Osorio
Department of Mathematics
Federico Santa María Technical University
Valparaíso, Chile

Moreno Bevilacqua
Faculty of Engineering and Sciences
Universidad Adolfo Ibañez
Viña del Mar, Chile

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To my lovely wife and son,
Carmen and Ronny Javier.
Ronny Vallejos
Preface
In this book we cover a wide range of topics that are currently available only as
material scattered across many research papers. The material we cover spans 35
years of research in spatial statistics and image processing. We present the
techniques while keeping the mathematical and statistical background to a minimum;
the technical aspects are placed in an appendix to improve readability. Each
chapter contains a section with applications and R computations in which real
datasets from different contexts (fisheries research, forest sciences, and
agricultural sciences) are analyzed.
We trust that the book will be of interest to those who are familiar with spatial
statistics and to scientific researchers whose work involves the analysis of
geostatistical data. For the first group, we recommend a quick reading of Chap. 1
followed by the chapters of interest. For the second group, the preliminaries given
in Chap. 1 are recommended as a prerequisite, especially because of the language and
notation used later in the book. The interdependence of the chapters is depicted
below, where arrows indicate prerequisites.
Extensive effort was invested in compiling the reference list for each
chapter, which should guide readers to a wealth of available materials. Although
our reference lists are extensive, many important papers that do not fit our
presentation have been omitted. Other omissions and discrepancies are inevitable. We
apologize for their occurrence.
Many colleagues, students, and friends have been of great help to our work on
this book in several ways: by having discussions that improved our understanding of
specific subjects; by doing research with us in a number of collaborative projects;
by providing constructive criticism on earlier versions of the manuscript; and by
supporting us with enthusiasm to finish this project. In particular, we would like to
thank Aaron Ellison, Daniel Griffith, Andrew Rukhin, Wilfredo Palma, Manuel
Galea, Emilio Porcu, Pedro Gajardo, Jonathan Acosta, Silvia Ojeda, Javier Pérez,
Francisco Alfaro, Rogelio Arancibia, Carlos Schwarzenberg, Angelo Gárate, and
Macarena O’Ryan.
[Figure: chapter interdependence diagram with nodes Chapter 1, Chapter 2, Chapter 3, Chapter 4, and Chapter 8]
Contents
1 Introduction
  1.1 Motivating Examples
    1.1.1 The Pinus Radiata Dataset
    1.1.2 The Murray Smelter Site Dataset
    1.1.3 Similarity Between Images
  1.2 Objective of the Book
  1.3 Layout of the Book
  1.4 Computation
  1.5 Preliminaries and Notation
    1.5.1 Spatial Processes
    1.5.2 Intrinsic Stationary Processes and the Variogram
    1.5.3 Estimation of the Variogram
    1.5.4 Kriging
    1.5.5 The Cross-Variogram
    1.5.6 Image Analysis
  1.6 Problems for the Reader
  References
2 The Modified t Test
  2.1 Introduction
  2.2 The Modified t-Test
  2.3 Estimation of the Effective Sample Size
  2.4 Applications and R Computations
    2.4.1 Application 1: Murray Smelter Site Revisited
    2.4.2 Application 2: Modified t-Test Between Images
  2.5 A Permutation Test Under Spatial Correlation
    2.5.1 Application 3: Permutation t-Test Between Images
1 Introduction
1.1 Motivating Examples
The types of spatial data we describe in the following examples have been widely
discussed in Cressie (1993) and Schabenberger and Gotway (2005) in the context
of a single realization of a stochastic sequence. Since the book addresses spatial
association between two stochastic sequences, we adopt some basic assumptions
that hold throughout the book. For example, we denote the two random
sequences as $X(s)$ and $Y(s)$ for $s \in D \subset \mathbb{R}^2$, and the available information is the
observations $X(s_1), \ldots, X(s_n)$ and $Y(s_1), \ldots, Y(s_n)$. That is, both variables have
been measured at the same locations in space.
1.1.1 The Pinus Radiata Dataset
Pinus radiata, which is one of the most widely planted species in Chile, is planted in
a wide array of soil types and regional climates. Two important measures of plantation
development are dominant tree height and basal area. Research shows that
these measures are correlated with the regional climate and local growing conditions
(see Snowdon 2001). The study site is located in the Escuadrón sector, south
of Concepción, in the southern portion of Chile (36°54′ S, 73°54′ W), and has an
area of 1244.43 hectares. In addition to mature stands, there is also interest in areas
that contain young (i.e., four-year-old) stands of Pinus radiata. These areas have an
average density of 1600 trees per hectare. The basal area and dominant tree height in
the year of the plantation’s establishment (1993, 1994, 1995, and 1996) were used to
represent stand attributes. These three variables were obtained from 200 m² circular
sample plots and point-plant sample plots. For the latter, four quadrants were established
around the sample point, and the four closest trees in each quadrant (16 trees
in total) were then selected and measured. The samples were located systematically
using a mean distance of 150 m between samples. The total number of plots available
for this study was 468 (Fig. 1.1). Figure 1.2 shows a simple bilinear interpolation and
the corresponding contours for the two variables. The georeferenced data as plotted do
not allow the sample correlation coefficient to be estimated by eye, because it is
difficult to train the human eye to capture two-dimensional patterns.

Fig. 1.2 a Bilinear interpolation of the three basal areas; b Bilinear interpolation of the three heights (axes: x, y)

The objective of analyzing these data is to construct a suitable measure that takes
into account the spatial association between the two variables. One could be tempted
to compute the Pearson correlation coefficient for the two sequences by considering
these variables to be two simple columns. Then, the construction of a scatterplot could
help to determine whether there is a linear trend between the basal area and height.
Notably, the human eye can usually be trained to estimate
the value of the correlation coefficient from the information provided by a scatterplot
between the variables of interest. However, when the data have been georeferenced
in two-dimensional space, it is difficult to judge the association between
the variables by eye. For the forest variables, a scatterplot of the basal area and height
is displayed in Fig. 1.3, which shows a clear linear relationship between the basal area
and height. The sample correlation coefficient (0.7021) confirms the linear pattern.

[Fig. 1.3: scatterplot with Basal Area on the horizontal axis and Height on the vertical axis]

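To make the "two simple columns" computation concrete, the following is a minimal sketch in Python of the sample Pearson correlation coefficient. The function name `pearson_r` and the six data pairs are hypothetical and chosen only for illustration; they are not the Pinus radiata measurements. The point of the sketch is that the plot coordinates never enter the calculation, which is why this coefficient ignores the spatial arrangement of the data.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered cross-product and sums of squares.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only (NOT the Pinus radiata data).
basal_area = [2.1, 4.3, 5.0, 6.2, 7.8, 9.5]
height = [3.0, 4.1, 4.4, 5.2, 5.9, 7.1]

# The coordinates of the sample plots play no role in this computation.
r = pearson_r(basal_area, height)
print(round(r, 4))
```

Reordering the plots, provided both columns are reordered in the same way, leaves the result unchanged.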
Although the exploratory data analysis provides good initial insight into the real
problem, the issue of how to take into account the possible spatial association for
each variable has not yet been addressed. Thus, a primary objective in analyzing these
data is to develop coefficients for the spatial association between two georeferenced
variables that take into account the existing spatial association within and between
the variables.
1.1.2 The Murray Smelter Site Dataset
The dataset consists of soil samples collected in and around the vacant, industrially
contaminated Murray smelter site (Utah, USA). This area was polluted by airborne
emissions and the disposal of waste slag from the smelting process. A total of 253
locations were included in the study, and soil samples were taken from each location.
Each georeferenced sample point is a pooled composite of four closely adjacent soil
samples in which the concentrations of the heavy metals arsenic (As) and lead (Pb) were
determined. A detailed description of this dataset can be found in Griffith (2003) and
Griffith and Paelinck (2011). For each location, the As and Pb attributes are shown
in Fig. 1.4a, b.

Fig. 1.4 Locations of 253 geocoded aggregated surface soil samples collected in a 0.5 square mile
area in Murray, Utah, and their measured concentrations of As and Pb. Of these, 173 were collected
in a facility Superfund site, and 80 were collected in two of its adjacent residential neighborhoods
located along the western and southern borders of the smelter site. a As measurements; b Pb
measurements

The objective for these data is to assess the spatial association between As and
Pb. Figure 1.4 shows that, in this case, the observations are clearly located on a
nonrectangular grid in two-dimensional space. Again, the goal can be achieved by
quantifying the coefficients of the spatial association or by constructing a suitable
hypothesis test for the Pearson correlation coefficient ρ between As and Pb.
A hypothesis test of the form
$$H_0 : \rho = 0 \quad \text{against} \quad H_1 : \rho \neq 0$$
can be stated under the assumption of normality for both variables (As and Pb). Then,
the test statistic is
$$t = r\,\frac{\sqrt{n-2}}{\sqrt{1-r^{2}}} = 11.5548, \tag{1.1}$$
where $n = 253$ and $r = 0.5892$. The p-value associated with the test is less than $2.2 \times 10^{-16}$;
thus, there is sufficient evidence to reject $H_0$ at any significance level $\alpha > p$.
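The arithmetic behind the test statistic can be checked with a short Python sketch. The function name `correlation_t_statistic` is hypothetical; the inputs are the values of n and r reported in the text. Because r is entered here rounded to four decimals, the result is expected to agree with the reported 11.5548 only to about two decimal places.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing rho = 0, given a sample correlation r and sample size n."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)


# Values reported in the text for the As and Pb measurements.
t = correlation_t_statistic(r=0.5892, n=253)
print(round(t, 4))
```

Under the null hypothesis and the normality assumption, this statistic is compared with a Student t distribution with n − 2 degrees of freedom. Note that the calculation treats the 253 observations as independent, which is the assumption the subsequent discussion calls into question.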
In the previous analysis, we assumed that the correlation between the variables
is constant, i.e., $\operatorname{cor}[X(s), Y(s)] = \rho$ for all $s \in D$. However, as we will see in the
following chapters, this dataset and several others do not support this restriction.
Instead, they exhibit a clear spatial association between the variables of interest.
1.1.3 Similarity Between Images
With the rapid proliferation of digital imaging, image similarity assessment has
become a fundamental issue in many applications in various fields of knowledge
(Martens and Meesters 1998). Many proposals of indices that capture the similarity
or dissimilarity between two digital images have received attention during the past
decade. One important feature to consider is the capability of some coefficients to
provide a better interpretation of human visual perception than is provided by the
widely used mean square error (MSE) (Wang et al. 2004).
Here, we introduce an example that uses real data to illustrate the dependence of
the spatial association on a particular direction in space, noting that the correlation
coefficient (a crude measure of spatial association between two processes) cannot
account for the directional association between two images. To accomplish this goal,
an original image (Lenna) of size 512 × 512 was taken from the USC-SIPI image
database, http://sipi.usc.edu/database/ (see Fig. 1.5a). The image shown in Fig. 1.5a
was processed by Algorithm 4.1 in Vallejos et al. (2015) to transform the original
image into an image with a clear pattern in the direction h = (1, 1). The processed
image is displayed in Fig. 1.5b.
The correlation coefficient between the images shown in Fig. 1.5 is r = 0.6909.
Clearly, the correlation coefficient does not capture the evident pattern observed by
the human eye between the original and transformed images. In fact, the trend in
the off-diagonal of the image in Fig. 1.5b is sufficient to decrease the correlation
coefficient to 0.6909 even though the features of the original image are still present
and detectable by the human eye.

Fig. 1.5 a Original image (Lenna); b Image transformed into the direction h = (1, 1)

The objective in analyzing these data is to construct image similarity coefficients
that can detect patterns in different directions in space and appropriately represent
the human visual system.
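One way to see why the MSE and the correlation coefficient are crude measures of image similarity is that both are pixelwise summaries: they do not change when the pixels of the two images are rearranged in the same way. The Python sketch below illustrates this with two hypothetical 3 × 3 grayscale images; the pixel values and the helper names `flatten`, `mse`, `correlation`, and `shuffled` are assumptions made for illustration and are unrelated to the Lenna image or to Algorithm 4.1.

```python
import math
import random


def flatten(image):
    """Turn a list of rows into a single list of pixel values."""
    return [pixel for row in image for pixel in row]


def mse(img_a, img_b):
    """Mean square error between two images of the same size."""
    a, b = flatten(img_a), flatten(img_b)
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def correlation(img_a, img_b):
    """Pearson correlation between the pixel values of two images."""
    a, b = flatten(img_a), flatten(img_b)
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    saa = sum((p - mean_a) ** 2 for p in a)
    sbb = sum((q - mean_b) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)


# Hypothetical 3 x 3 grayscale images (illustrative values only).
original = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
processed = [[12, 18, 33], [35, 55, 58], [75, 70, 95]]

# One fixed random rearrangement of the nine pixel positions.
order = list(range(9))
random.Random(0).shuffle(order)


def shuffled(image):
    """Apply the same pixel rearrangement to an image."""
    flat = flatten(image)
    return [[flat[order[3 * i + j]] for j in range(3)] for i in range(3)]


# Both summaries ignore where the pixels are located.
print(mse(original, processed), mse(shuffled(original), shuffled(processed)))
print(correlation(original, processed),
      correlation(shuffled(original), shuffled(processed)))
```

Since any spatial or directional structure is destroyed by the rearrangement while both summaries stay the same (up to floating-point rounding), neither can by itself detect patterns in a particular direction.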
1.2 Objective of the Book
The aim of this book is to gather published material that is currently spread throughout
the literature. The book may be of interest to two types of users. The first comprises
researchers from applied areas such as agriculture, soil sciences, forest sciences,
environmental sciences, and engineering. For these and other users who are more interested
in the applications, the book is organized in such a way that the mathematical
foundations in each chapter can be skipped. The second comprises investigators who are
interested in the development of new techniques and methods to assess the significance of
the correlation between two or more spatial processes that are well defined on a
two-dimensional plane. For them, we include an appendix at the end of the book with the
proofs of the results presented in the book and some mathematical details that support the
expressions and equations that are only briefly explained in the main text. Although the
book contains methods that were proposed approximately thirty years
ago and are very well known to readers working in spatial statistics and geostatistics,
other methods in this book have been developed recently and are not yet available in
a publication like this.
1.3 Layout of the Book
This book is divided into three parts. The first part considers the association between
two random fields from a hypothesis-testing perspective (Chaps. 2 and 3). The second
part is devoted to point estimation of coefficients of association
(Chaps. 4–7). The third part considers the spatial association between
two images (Chap. 8). Several applications are presented throughout the book. Each
chapter ends with a set of theoretical and applied exercises. Most of the applied
problems are related to real datasets, and it is expected that the reader will use R
software to solve them.
1.4 Computation
To illustrate the applicability of the methods presented in this book, each chapter
contains a section on R computations with practical applications. In most of the
examples, we show how R software and the contributed packages SpatialPack and
1.5 Preliminaries and Notation
In this section, we provide the necessary material that will be used in subsequent
chapters. Readers interested in practical applications with real datasets can skip the
rest of this chapter and move ahead.
1.5.1 Spatial Processes
In this section, we introduce the basic notion of a stochastic process. Our goal is
to define the mean, variance, and covariance functions of spatial processes. These
concepts will be used in the subsequent chapters.
A stochastic process is a family or collection of random variables defined on a probability
space. Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $D$ be an arbitrary index set. A
stochastic process is a function
$$X : (\Omega, \mathcal{F}, P) \times D \longrightarrow \mathbb{R},$$
such that for all $s \in D$, $X(\omega, s)$ is a random variable. In the sequel, we will denote
a stochastic process as $X(s)$, for $s \in D$, or $\{X(s) : s \in D\}$.
The above definition enables us to define a variety of processes. For example,
if $D = \mathbb{Z}$, $X(s)$ is a discrete time series. Similarly, a spatial process is a collection
of random variables that are indexed by the set $D \subset \mathbb{R}^d$. In the time series case,
the realizations of the process are observations indexed by time, while in the spatial
case, the realizations of the process are regions on the subspace $\mathbb{R}^d$. Additionally,
in the first case, the index set is a totally ordered set; however, in the spatial case, it
is possible to define a partially ordered set. We denote the coordinates of a spatial
process defined on a space of dimension $d$ as $s = (s_1, \ldots, s_d)$.
As an example, consider the process {X (s) : s ∈ Z} defined by the equation