Spatial Statistics –
Point Pattern Analysis
Spatial data analysis consists of observing the data available in space and using modeling techniques to describe and explain the behavior of a spatial process and its relationship with other spatial phenomena. In point pattern analysis, the data are points associated with events, for instance cases of a health problem, centers of volcanic craters, or biological cells. Only the location of the points is considered, unlike geostatistics, where the attributes attached to each sample point are what matter. In ecology, for example, one may want to analyze the spatial distribution of plants inside small areas (see the figure below).
Consider epidemiologists who collect data about a disease. By analyzing and visualizing the data, they find that the density of cases is higher close to a well that serves the region. Does this evidence of clustering indicate that the well is contaminated?

The basic purpose of point pattern analysis is to verify whether the events observed in a given study region show systematic behavior, such as clustering, regularity, or randomness.
In this version, SPRING provides two univariate point pattern analysis procedures: the nearest neighbor distance method and the L function method. Both methods analyze data properties known as second-order properties, or spatial dependence. The second-order component accounts for the stochastic deviations from the mean; instead of assuming that these spatial deviations are independent, it is possible to consider a spatial covariance structure, that is, spatial dependence in the process. This second-order component is
modeled as a stationary and isotropic spatial process. A spatial process {Y(s), s ∈ R} is stationary and homogeneous if its statistical properties, mean and variance, are constant over the region R and therefore do not depend on the location s. Stationarity also implies that the covariance between the values at any two locations si and sj depends only on the direction and distance between them, not on their absolute positions. If, in addition, the covariance does not depend on direction, the process is stationary and isotropic. In an isotropic process there is a close relationship between the distribution of distances between events and the second-order properties. The nearest neighbor distance is a measure that takes second-order properties into account. One way to verify the degree of spatial dependence in a point
pattern is to observe the behavior of the cumulative distribution of distances.

Nearest Neighbor – the nearest neighbor method estimates G(w), the cumulative distribution function of the distance between a randomly selected event and its nearest neighboring event. For univariate analysis, the nearest neighbor estimate reduces to:

$$\hat{G}(w) = \frac{\#(w_i \le w)}{n}$$

where # means "number of", w_i is the distance from event i to its nearest neighbor, and n is the number of events.

The empirical plot of G(w) against w can be used to look for evidence of interaction among events. If the curve rises steeply at the beginning, this suggests clustering at the scale considered. If the rise happens at larger distances, towards the end of the curve, this suggests repulsion or regularity among the events.

The nearest neighbor method can also be used as a formal method to compare statistically the observed distribution of events with what would be expected under the Complete Spatial Randomness (CSR) hypothesis. This case corresponds to the Nearest neighbor simulation option. The standard spatial
model for Complete Spatial Randomness is the one in which the events follow a homogeneous Poisson process over the study region. This means that, for any choice of non-overlapping areas Ai and Aj, the counts Y(Ai) and Y(Aj) are independent random variables, and that Y(A) follows a Poisson distribution with mean λA, where A also denotes the area of the region A and λ is the mean number of events per unit area. In addition, given the total number of events in R, the events are independently and uniformly distributed in R. This means that an event is equally likely to occur at any position, and that the position of any event is independent of the position of any other: there is no interaction among events. It is possible to
simulate n events using the uniform distribution inside the region, and to formulate hypotheses to test whether the observed pattern is clustered, regular, or random. The method consists of simulating "envelopes" for the distribution under CSR in order to evaluate the
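significance of the observed results. The upper simulation envelope for the estimate of G(w) under the CSR hypothesis is

$$U(w) = \max_{i=1,\dots,m} \hat{G}_i(w)$$

where $\hat{G}_i(w)$ is the estimate obtained from the i-th of m independent simulations of n events.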
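The nearest neighbor estimate and its CSR envelopes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SPRING's implementation: the study region is taken to be a rectangle, no edge correction is applied, and the function names (`nearest_neighbor_distances`, `g_hat`, `csr_envelopes`) and the default of 99 simulations are hypothetical choices made for this example.

```python
import numpy as np

def nearest_neighbor_distances(points):
    """Distance from each event to its nearest other event."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # an event is not its own neighbor
    return dist.min(axis=1)

def g_hat(points, w_values):
    """Empirical G(w): fraction of events whose nearest neighbor lies within w."""
    nn = nearest_neighbor_distances(points)
    return np.array([(nn <= w).mean() for w in w_values])

def csr_envelopes(n, width, height, w_values, m=99, seed=0):
    """Lower and upper envelopes of G(w) over m CSR simulations of n events.

    Assumes a rectangular region [0, width] x [0, height].
    """
    rng = np.random.default_rng(seed)
    sims = np.empty((m, len(w_values)))
    for i in range(m):
        pts = rng.uniform([0.0, 0.0], [width, height], size=(n, 2))
        sims[i] = g_hat(pts, w_values)
    return sims.min(axis=0), sims.max(axis=0)

# Four events in a unit square forming two tight pairs.
events = np.array([[0.10, 0.10], [0.12, 0.10], [0.80, 0.80], [0.80, 0.83]])
w = np.array([0.01, 0.025, 0.05])
print(g_hat(events, w))  # -> [0.  0.5 1. ]
```

An observed curve that rises above the upper envelope at short distances would point towards clustering, while a curve below the lower envelope would point towards regularity.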
The nearest neighbor method is based on distances to the closest events and therefore only considers the small scales of the pattern. To obtain information about a spatial pattern over a wider range of scales, the better method is the L function, described next.

L function – this gives a more effective description of spatial dependence over a wider range of scales and is related to the second-order properties of an isotropic process. It is built on the K function which, assuming an isotropic process over the whole region, is defined for a univariate process as:

λK(h) = E(# of events within distance h of an arbitrary event)

where # means "number of", E( ) is the expected value operator, and λ is the intensity, that is, the mean number of events per unit area, assumed constant over the region. The K function can be
estimated, ignoring edge effects, by

$$\hat{K}(h) = \frac{A}{n^2} \sum_{i=1}^{n} \sum_{j \ne i} I_h(d_{ij})$$

where A is the area of the region, n is the number of events, d_ij is the distance between events i and j, and I_h(d_ij) is 1 if d_ij ≤ h and 0 otherwise. To understand this estimate, consider that each event is visited and concentric circles of increasing radius are drawn around it. The cumulative number of events inside each circle is counted. Once all events have been visited, the total number of events within distance h of every event is computed and, weighted by A/n², becomes the estimate of the K function.

For a homogeneous process without spatial dependence, K(h) = πh². Thus, under clustering, K(h) ≥ πh² is expected and, under regularity, K(h) ≤ πh². To make the graphical interpretation easier, since K(h) is not as intuitive as the nearest neighbor plot, a simplified transformation, the L function, is used:

$$\hat{L}(h) = \sqrt{\frac{\hat{K}(h)}{\pi}} - h$$

The L function method has
advantages over the nearest neighbor method: it provides information at several scales of the pattern, it uses the precise location of each event, and it includes all event-to-event distances. Another reason to use it is that the theoretical form of K(h) is known for several point process models.

When the L function is used with simulation, as in the previous case, upper and lower envelopes are built from m simulations of n events in the region under the Complete Spatial Randomness (CSR) hypothesis, together with the associated estimates
of Pr(
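The counting procedure described above can be sketched in Python as follows. This is a minimal illustration, not SPRING's code. It assumes a rectangular region, no edge correction, the weighting A/n² for the K estimate, and the commonly used transformation L(h) = sqrt(K(h)/π) − h. The function names (`k_hat`, `l_hat`, `l_envelopes`) and the synthetic clustered pattern are hypothetical.

```python
import numpy as np

def k_hat(points, area, h_values):
    """K(h) estimate without edge correction.

    Counts ordered pairs of distinct events within distance h
    and weights the count by area / n**2.
    """
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each event's distance to itself
    counts = np.array([(dist <= h).sum() for h in h_values])
    return area * counts / n ** 2

def l_hat(points, area, h_values):
    """L(h) = sqrt(K(h) / pi) - h: near 0 under CSR, positive under clustering."""
    h_values = np.asarray(h_values, dtype=float)
    return np.sqrt(k_hat(points, area, h_values) / np.pi) - h_values

def l_envelopes(n, width, height, h_values, m=99, seed=0):
    """Lower and upper envelopes of L(h) over m CSR simulations of n events."""
    rng = np.random.default_rng(seed)
    area = width * height
    sims = np.array([
        l_hat(rng.uniform([0.0, 0.0], [width, height], size=(n, 2)), area, h_values)
        for _ in range(m)
    ])
    return sims.min(axis=0), sims.max(axis=0)

# Synthetic, strongly clustered pattern near the center of a unit square.
rng = np.random.default_rng(1)
clustered = 0.5 + 0.02 * rng.standard_normal((30, 2))
h = np.linspace(0.02, 0.2, 10)
observed = l_hat(clustered, 1.0, h)
lower, upper = l_envelopes(30, 1.0, 1.0, h, m=99)
# Distances at which the observed curve leaves the upper envelope
# are the scales at which clustering is indicated.
print(h[observed > upper])
```

Without edge correction the estimate is biased downward for distances that are large relative to the region, so a production implementation would normally add a correction term to each pair.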
Some of the graphs generated by SPRING that were mentioned in the text are presented and interpreted.
Notes: Ideally, the user should first inspect the distribution of the points in order to define the minimum and maximum distances coherently. The display of coordinates can be activated from the main menu.
The number of intervals defines the number of concentric circles; each circle collects the events whose distance is at most its radius. The application that generates the plot always shows 10 coordinate values on the X and Y axes, even when the number of intervals selected differs from 10. Note that a larger number of intervals gives a more detailed graph.
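To make the role of the three parameters concrete, the sketch below derives circle radii from a minimum distance, a maximum distance, and a number of intervals. The even spacing is an assumption made for illustration; the text does not state how SPRING spaces the circles, and `circle_radii` is a hypothetical helper.

```python
import numpy as np

def circle_radii(min_distance, max_distance, n_intervals):
    """Radii of n_intervals concentric circles between the two distances.

    Assumes evenly spaced radii, with the last radius equal to max_distance.
    """
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    if max_distance <= min_distance:
        raise ValueError("max_distance must exceed min_distance")
    edges = np.linspace(min_distance, max_distance, n_intervals + 1)
    return edges[1:]  # drop the starting edge; keep one radius per interval

print(circle_radii(0.0, 100.0, 5).tolist())  # -> [20.0, 40.0, 60.0, 80.0, 100.0]
```

With 20 intervals over the same range the curve would be evaluated every 5 distance units instead of every 20, which is why more intervals give a more detailed graph.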