Geostatistics

This page presents the main concepts needed to understand and use the geostatistics module of SPRING.


The definitions and conventions adopted here follow those of geostatistics: random functions and random variables are denoted by capital letters (for example, $Z(x)$ and $Z$); observed values are denoted by lowercase letters (for example, the value of the random variable $Z$ measured at position $x_k$ is $z(x_k)$); and vectors are set in bold (for example, $\{z(x_i),\ i = 1, \ldots, n\}$, where $x_i$ identifies a two-dimensional position given by the coordinate pair $(x_i, y_i)$).


Check how to execute:

SPRING Procedures Sequence

See also:
Spatial Analysis
Digital Terrain Modeling
Digitalization of Maps



REGIONALIZED VARIABLES

The spatial variability of soil characteristics has concerned researchers since the beginning of the twentieth century. Smith (1910) studied the layout of field plots in corn yield experiments, attempting to eliminate the effects of soil variation. Montgomery (1913), concerned with the effect of nitrogen on wheat yield, ran an experiment with 224 plots, measuring the grain yield. Several other authors, such as Waynick and Sharp (1919), also studied variations of nitrogen and carbon in the soil.

These procedures were based on classical statistics and used large amounts of sample data to characterize or describe the spatial distribution of the characteristic under study. Classical statistics here means statistics that uses parameters such as the mean and standard deviation to represent a phenomenon, and that rests on the main hypothesis that variations from one place to another are random.

Krige (1951), working with gold concentration data, concluded that the information given by the variance alone was not enough to explain the phenomenon under study; the distance between observations also had to be taken into account. From this arose the concept of geostatistics, which considers both geographical location and spatial dependence.

Matheron (1963, 1971), building on Krige's observations, developed the theory of regionalized variables, which contains the foundations of geostatistics.

Blais and Carlier (1968), quoted by Olea (1975), consider a regionalized variable to be a numerical function with a spatial distribution that varies from one point to another with apparent continuity, but whose variations cannot be represented by a simple mathematical function.

The theory of regionalized variables assumes that the variation of a variable can be expressed as the sum of three components (Burrough, 1987): a) a structural component, associated with a constant mean value or a constant trend; b) a random, spatially correlated component; and c) random noise or residual error.

Let $x$ represent a position in one, two or three dimensions. Then the value of the variable $Z$ at $x$ is given by (Burrough, 1987):

$$Z(x) = m(x) + \varepsilon'(x) + \varepsilon'' \qquad (1)$$

where:

  • $m(x)$ is a deterministic function that describes the structural component of $Z$ at $x$;
  • $\varepsilon'(x)$ is a stochastic term that varies locally and is spatially dependent (the spatially correlated residual from $m(x)$);
  • $\varepsilon''$ is uncorrelated random noise, Gaussian distributed with zero mean and variance $\sigma^2$.
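The decomposition in Equation (1) can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the linear form of the trend, the moving-average construction of the correlated term, and all names and parameter values (`simulate_profile`, `window`, `noise_sd`) are hypothetical choices, not part of SPRING.

```python
import random


def simulate_profile(n=200, window=15, noise_sd=0.3, seed=42):
    """Simulate Z(x) = m(x) + eps1(x) + eps2 at n equally spaced positions."""
    rng = random.Random(seed)
    # Structural component m(x): a linear trend (assumed form).
    m = [10.0 + 0.05 * x for x in range(n)]
    # Spatially correlated component eps'(x): a moving average of white noise,
    # so neighbouring positions share most of their inputs.
    white = [rng.gauss(0.0, 1.0) for _ in range(n + window - 1)]
    eps1 = [sum(white[x:x + window]) / window for x in range(n)]
    # Uncorrelated Gaussian noise eps'' with zero mean and variance noise_sd**2.
    eps2 = [rng.gauss(0.0, noise_sd) for _ in range(n)]
    z = [m[x] + eps1[x] + eps2[x] for x in range(n)]
    return m, eps1, eps2, z


def lag1_correlation(values):
    """Sample correlation between each value and its right-hand neighbour."""
    mean = sum(values) / len(values)
    num = sum((a - mean) * (b - mean) for a, b in zip(values, values[1:]))
    den = sum((v - mean) ** 2 for v in values)
    return num / den


m, eps1, eps2, z = simulate_profile()
print("lag-1 correlation of eps'(x):", round(lag1_correlation(eps1), 2))
print("lag-1 correlation of eps'':  ", round(lag1_correlation(eps2), 2))
```

Neighbouring values of the correlated term resemble each other strongly, while the noise term shows almost no correlation between neighbours. This is the distinction between components (b) and (c).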

The figure below illustrates the three main components of spatial variation. Part (a) shows a deterministic component that varies abruptly, while the deterministic component in part (b) follows a constant trend.

 



Figs. (a) and (b) - Main components of the spatial variation.
SOURCE: Modified from Burrough (1987), p. 155.


Considered Hypotheses

Unlike conventional estimation methods, kriging is based on the theory of regionalized variables. The first step in kriging is to define a suitable function for the deterministic component $m(x)$. To do so, some hypotheses are necessary (Burrough, 1987; David, 1977):

  • Second-Order Stationarity Hypothesis

Under this hypothesis, the deterministic component $m(x)$ is assumed to be constant (there is no trend in the region). Then $m(x)$ equals the expected value of the random variable $Z$ at position $x$, and the expected difference between the values at $x$ and $x+h$, separated by a distance vector $h$ (modulus and direction), is zero:

$$E[Z(x) - Z(x+h)] = 0 \quad \text{or} \quad E[Z(x)] = E[Z(x+h)] = m(x) = m \qquad (2)$$

where

E represents the mathematical expectation operator.

It is also assumed that the covariance between $Z(x)$ and $Z(x+h)$, separated by a distance vector $h$, exists and depends only on $h$. Then:

$$C(h) = \mathrm{Cov}[Z(x), Z(x+h)] = E[(Z(x) - m)(Z(x+h) - m)] = E[Z(x)\,Z(x+h)] - m^2, \quad \forall x \qquad (3)$$

where

Cov [Z(x), Z(x+h)] is the covariance between Z(x) and Z(x+h).

From Equation (3), with $h = 0$, stationarity of the covariance implies stationarity of the variance:

$$\mathrm{Var}[Z(x)] = E\{[Z(x) - m]^2\} = E[Z^2(x)] - 2\,E[Z(x)]\,m + m^2$$

$$= E[Z(x)\,Z(x+0)] - 2m^2 + m^2$$

$$= E[Z(x)\,Z(x+0)] - m^2 = C(0), \quad \forall x \qquad (4)$$

where

Var is the variance operator.

Stationarity of the covariance also implies stationarity of the variogram, defined by:

$$2\gamma(h) = E\{[Z(x) - Z(x+h)]^2\} \qquad (5)$$

Equation (5) can be written as:

$$2\gamma(h) = E\{Z^2(x) - 2\,Z(x)\,Z(x+h) + Z^2(x+h)\}$$

$$= E[Z^2(x)] - 2\,E[Z(x)\,Z(x+h)] + E[Z^2(x+h)] \qquad (6)$$

From Equation (3) one can get:

$$E[Z(x)\,Z(x+h)] = C(h) + m^2 \qquad (7)$$

In an analogous manner, from Equation (4) we have:

$$E[Z(x)\,Z(x+0)] = E[Z^2(x)] = C(0) + m^2 \qquad (8)$$

Substituting Equations (7) and (8) into Equation (6), and noting that stationarity gives $E[Z^2(x+h)] = E[Z^2(x)]$, one gets:

$$2\gamma(h) = C(0) + m^2 - 2\,(C(h) + m^2) + C(0) + m^2$$

$$= 2\,C(0) - 2\,C(h) \qquad (9)$$

Simplifying Equation (9), we have:

$$\gamma(h) = C(0) - C(h) \qquad (10)$$

where:

$\gamma(h)$ is the function known in the theory of regionalized variables as the semivariogram, which is half of the variogram. See the discussion of the variogram.

The relation in Equation (10) indicates that, under the second-order stationarity hypothesis, the covariance and the semivariogram are two alternative ways of characterizing the autocorrelation of the pairs $Z(x)$ and $Z(x+h)$ separated by the vector $h$.

The second-order stationarity hypothesis assumes the existence of a covariance and, therefore, of a finite variance (Equation 4). Under this condition, the correlogram $\rho(h)$ can be defined. Dividing both sides of Equation (10) by $C(0)$ and rearranging, we have:

$$\rho(h) = \frac{C(h)}{C(0)} = 1 - \frac{\gamma(h)}{C(0)} \qquad (11)$$
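The relations between covariance, semivariogram and correlogram in Equations (10) and (11) can be checked numerically. The sketch below assumes an exponential covariance model with hypothetical parameters `sill` and `a`; the model is chosen only for illustration and is not prescribed by the text above.

```python
import math


def covariance(h, sill=2.0, a=10.0):
    """Hypothetical exponential covariance model C(h) = sill * exp(-|h| / a)."""
    return sill * math.exp(-abs(h) / a)


def semivariogram(h, sill=2.0, a=10.0):
    """Equation (10): gamma(h) = C(0) - C(h)."""
    return covariance(0.0, sill, a) - covariance(h, sill, a)


def correlogram(h, sill=2.0, a=10.0):
    """Equation (11), first form: rho(h) = C(h) / C(0)."""
    return covariance(h, sill, a) / covariance(0.0, sill, a)


c0 = covariance(0.0)
for h in (0.0, 5.0, 10.0, 30.0):
    rho = correlogram(h)
    # Equation (11), second form: rho(h) = 1 - gamma(h) / C(0).
    assert math.isclose(rho, 1.0 - semivariogram(h) / c0)
    print(f"h={h:5.1f}  C(h)={covariance(h):.4f}  "
          f"gamma(h)={semivariogram(h):.4f}  rho(h)={rho:.4f}")
```

As the lag grows, the covariance decays toward zero while the semivariogram rises toward $C(0)$, so either function carries the same information under second-order stationarity.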

 

The requirements of second-order stationarity, namely that $C(h)$ exists, which implies that $\mathrm{Var}[Z(x)] = C(0)$ exists and also that $\gamma(h)$ exists, may not be satisfied by some physical phenomena that have an infinite dispersion capacity (David, 1977). With infinite dispersion capacity, $C(h)$ and $\mathrm{Var}[Z(x)]$ do not exist; however, $\gamma(h)$ may exist. For such situations a less restrictive hypothesis, the intrinsic hypothesis, is applicable.

  • Intrinsic Stationarity Hypothesis

As in the previous hypothesis, it is assumed that $E[Z(x)] = m(x) = m$, $\forall x$. In addition, it is assumed that the variance of the differences depends only on the distance vector $h$, that is:

$$\mathrm{Var}[Z(x) - Z(x+h)] = E\{[Z(x) - Z(x+h)]^2\} = 2\gamma(h) \qquad (12)$$

where

$2\gamma(h)$ is the variogram, as defined in Equation (5).

According to David (1977), this hypothesis is the one most frequently used in geostatistics, mainly because it is less restrictive: it requires only the existence and stationarity of the variogram, with no requirement that a finite variance exist.
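A simple random walk is a standard textbook illustration of a process that satisfies the intrinsic hypothesis but not second-order stationarity: the variance of $Z(x)$ grows with $x$, while the variance of the increments depends only on the lag $h$. The example is not taken from David (1977); the function names, the unit step size and the sample sizes below are assumptions made for this sketch.

```python
import random


def simulate_walks(n_walks=2000, length=50, seed=0):
    """Simulate independent random walks with steps of +1 or -1, starting at 0."""
    rng = random.Random(seed)
    walks = []
    for _ in range(n_walks):
        z, path = 0, [0]
        for _ in range(length):
            z += rng.choice((-1, 1))
            path.append(z)
        walks.append(path)
    return walks


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_at(walks, x):
    """Estimate Var[Z(x)] across the simulated walks."""
    return variance([w[x] for w in walks])


def increment_variance(walks, x, h):
    """Estimate Var[Z(x+h) - Z(x)], which equals 2*gamma(h) in Equation (12)."""
    return variance([w[x + h] - w[x] for w in walks])


walks = simulate_walks()
for x in (10, 40):
    print(f"x={x:2d}  Var[Z(x)] ~ {variance_at(walks, x):6.2f}  "
          f"Var[Z(x+5) - Z(x)] ~ {increment_variance(walks, x, 5):5.2f}")
```

For unit steps, theory gives $\mathrm{Var}[Z(x)] = x$ and $\mathrm{Var}[Z(x+h) - Z(x)] = h$, so the first estimate changes with position while the second stays close to the lag. The variogram is stationary even though no single finite variance characterizes the process.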

An additional consideration, beyond the scope of this work, is the Universal Kriging hypothesis (David, 1977). In this case, $m(x)$ is the drift (main trend), and $C(h)$ and $\gamma(h)$ are assumed to be stationary only within a neighborhood of restricted size. Moreover, $E[Z(x)] = m(x)$ is not stationary but varies in a regular way within that neighborhood. According to David (1977), not only the covariance and the variogram but also the size of the neighborhood in which the hypothesis remains valid are determined from the experimental values. More on this subject can be found in Olea (1975, 1977), and an application example is given in Burgess and Webster (1980c).

This work assumes second-order stationarity (which implies the intrinsic hypothesis); this is sufficient for the simple kriging (SK) and ordinary kriging (OK) estimation methods, discussed later.

Regionalized Variables Characteristics

According to Olea (1975, 1977), the main characteristics of a regionalized variable are:

  • Location: a regionalized variable is defined numerically by a value associated with a sample of specific size, shape and orientation. These geometrical characteristics of the sample are called the geometrical support. The geometrical support is not necessarily a volume; it can also be an area or a line. When the geometrical support tends to zero, one has a point (a sample point) and the geometrical support is immaterial. Example: in a study of the variation of soil water saturation, samples of 10 cm³ are collected. The regionalized variable is soil moisture and the geometrical support is the sample volume (10 cm³). Note that, in this experiment, the amount of water measured depends not only on the sample location but also on its size, shape and orientation. A long cylindrical sample taken vertically holds more water than a sample of the same size and shape taken horizontally with respect to the soil surface. If the sample volume is 10 m³ instead of 10 cm³, the result will also be different. In summary, the theory of regionalized variables takes the sample geometry into account, unlike classical statistics, where shape, size and orientation are not considered. The result of a classical statistical experiment such as tossing a coin does not depend on whether the coin is big or small, heavy or light, or on how it was thrown.
  • Anisotropy: some regionalized variables are anisotropic, that is, they vary gradually in one direction and rapidly or irregularly in another.
  • Continuity: depending on the phenomenon being observed, the spatial variation of a regionalized variable can be large or small. Despite the complexity of the fluctuations, an average continuity is generally present. Olea (1975) illustrates this continuity with a hypothetical case in which soil samples of the same size, shape and orientation are collected at regular intervals along imaginary lines. These samples may give rise to two distinct series for the percentage of H₂O (water), as presented in the table below.

TABLE - PERCENTAGE OF H₂O IN TWO DISTINCT SAMPLES A AND B.

Sample    % H₂O at successive sampling positions
A          5   10   15   20   25   20   15   10    5
B         10   25   15   10   20    5   15    5   20

In this table, the two samples contain exactly the same individual values, only in a different order. Therefore the sample mean, the sample variance and the histogram of the observed variable are identical for samples A and B. Any analysis that considers no statistics other than the mean, variance and histogram cannot distinguish the two series. This example emphasizes the importance of measuring the spatial continuity of a regionalized variable: to differentiate the two samples, the relative spatial position of each observation must be taken into account. The spatial continuity of a regionalized variable can be analyzed with the variogram, as described next.
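The point of the table can be verified directly. The sketch below uses the values of samples A and B and compares their mean, variance and an experimental semivariogram at a lag of one sampling interval. The estimator used, half the average squared difference between pairs of values separated by the lag, is the classical one and is assumed here, since this section does not define an estimator; the function names are hypothetical.

```python
def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)


def variance(values):
    """Population variance."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


def semivariogram(values, lag):
    """Half the mean squared difference between values `lag` positions apart."""
    pairs = list(zip(values, values[lag:]))
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


series_a = [5, 10, 15, 20, 25, 20, 15, 10, 5]
series_b = [10, 25, 15, 10, 20, 5, 15, 5, 20]

for name, series in (("A", series_a), ("B", series_b)):
    print(f"{name}: mean={mean(series):.2f}  variance={variance(series):.2f}  "
          f"gamma(1)={semivariogram(series, 1):.2f}")
```

The mean and variance agree for both series, but the semivariogram at lag 1 is 12.5 for A and 68.75 for B. Series A changes gradually between neighbouring positions, while series B fluctuates erratically, and only a statistic that uses position detects this.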

 



See also:
Variogram
Kriging