Spatial Statistics – Point Pattern Analysis

Spatial data analysis consists of observing the available data in space and, using modeling techniques and methods, trying to describe and explain the behavior of the spatial process and its relationship with other spatial phenomena.

In point pattern analysis, the data are points associated with events, for instance cases of a health problem, volcanic crater centers, or biological cells.

In point pattern analysis only the location of the points is considered, unlike geostatistics, where the attributes associated with each sample point are what matters.

In ecology, for example, one may want to analyze the spatial distribution of plants inside small areas (see the Figure below).


The data give the locations of 62 germinated seeds distributed in a region of 23 m². From an ecological point of view, some evidence of grouping is expected among seeds of the same species.

Suppose that epidemiologists collect data about a disease. When analyzing and visualizing the data, they find that the density of cases is higher close to a well that serves the region. Does this evidence of grouping indicate that the well is contaminated?

The basic purpose of point pattern analysis is to verify whether the events observed in a given study region present a systematic behavior, such as grouping (clustering) or regularity, or whether they are randomly distributed.


    


The methods used in spatial data analysis can be divided into three groups: methods for data visualization, exploratory methods, and methods centered on the specification of a statistical model and the estimation of its parameters.

In this version, SPRING provides two procedures for univariate point pattern analysis: the nearest neighbor distance method and the L function method. Both methods analyze second-order properties of the data, also known as spatial dependence. The second-order component accounts for the stochastic deviations from the mean; instead of assuming that these deviations are spatially independent, one can model them through a spatial covariance structure, that is, a spatial dependence of the process. This second-order component is modeled as a stationary, isotropic spatial process.

A spatial process $\{Y(s),\ s \in R\}$ is stationary (homogeneous) if its statistical properties, mean and variance, are constant over the region R and therefore do not depend on the location s. Stationarity also implies that the covariance between the values at any two locations $s_i$ and $s_j$ depends only on the direction and distance between them, not on their absolute positions. If, in addition, the covariance does not depend on direction, the process is an isotropic stationary process.

In an isotropic process there is a close relationship between the distribution of distances between events and the second-order properties.

The nearest neighbor distance is a measure that takes second-order properties into account. One way to assess the degree of spatial dependence in a point pattern is to examine the behavior of the cumulative distribution of these distances.

Nearest Neighbor – the nearest neighbor method estimates G(w), the cumulative distribution of the distance between a randomly selected event and its nearest neighboring event. For univariate analysis, the nearest neighbor estimate reduces to:

$$\hat{G}(w) = \frac{\#(w_i \le w)}{n}$$

Where:

# = number of
n = number of events in the study region
$w_i$ = distance from event i to its nearest neighbor
w = comparison distance selected by the user
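
The estimate above can be sketched in Python. This is a minimal illustration under stated assumptions, not SPRING's implementation: it finds each $w_i$ by brute force (adequate for a few hundred events), applies no edge correction, and the function names and example coordinates are hypothetical.

```python
import math


def nearest_neighbor_distances(points):
    """Return w_i, the distance from each event to its nearest other event.

    Brute-force O(n^2) search; adequate for small patterns such as the 62 seeds.
    """
    distances = []
    for i, (xi, yi) in enumerate(points):
        nearest = math.inf
        for j, (xj, yj) in enumerate(points):
            if i != j:
                nearest = min(nearest, math.hypot(xi - xj, yi - yj))
        distances.append(nearest)
    return distances


def g_hat(points, w):
    """Empirical G(w) = #(w_i <= w) / n, with no edge correction."""
    w_i = nearest_neighbor_distances(points)
    return sum(1 for d in w_i if d <= w) / len(w_i)


# Hypothetical pattern: four events on the corners of a unit square plus one isolated event.
events = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
for w in (0.5, 1.0, 6.0):
    print(w, g_hat(events, w))
```

Four of the five events have their nearest neighbor at distance 1, so $\hat{G}(1) = 4/5$, while the isolated event only enters the count once w reaches about 5.66.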

The empirical plot of $\hat{G}(w)$ against w can be used as an informal way to look for evidence of interaction among events. A curve that rises abruptly at short distances suggests grouping at that scale. If the rise happens only at larger distances, toward the end of the curve, it suggests repulsion or regularity among the events.

The nearest neighbor function can also be used as a formal method to compare the observed distribution of events statistically with the distribution expected under the hypothesis of Complete Spatial Randomness (CSR). This case corresponds to the Nearest Neighbor with Simulation option.

The standard spatial model for Complete Spatial Randomness is the one in which the events follow a homogeneous Poisson process over the study region. In this model, $Y(A_i)$ and $Y(A_j)$, the numbers of events in the regions $A_i$ and $A_j$, are independent random variables for any choice of non-overlapping $A_i$ and $A_j$, and $Y(A)$ follows a Poisson distribution with mean $\lambda |A|$, where $|A|$ is the area of A and $\lambda$ is the mean number of events per unit area. In addition, conditional on the total number of events in R, the events are independent and uniformly distributed in R. This means that an event is equally likely to occur at any position, and the position of each event is independent of the positions of all the others: there is no interaction among events.

It is therefore possible to simulate n events uniformly distributed inside the region and use them to test whether the observed pattern is grouped, regular, or random.
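
A minimal Python sketch of both ways of generating CSR patterns described above: a fixed number n of uniform events, and a homogeneous Poisson process whose count is itself random. Assumptions: the study region is a rectangle (real regions need not be), the Poisson count uses Knuth's multiplication method, which suits only moderate expected counts, and all function names and the 4.8 m by 4.8 m plot are hypothetical.

```python
import math
import random


def poisson_count(mean, rng):
    """Draw a Poisson(mean) count with Knuth's multiplication method.

    Only suitable for moderate means, since exp(-mean) must not underflow.
    """
    limit = math.exp(-mean)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return k
        k += 1


def uniform_events(n, bbox, rng):
    """n independent events uniformly distributed in bbox = (xmin, xmax, ymin, ymax)."""
    xmin, xmax, ymin, ymax = bbox
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]


def homogeneous_poisson(lam, bbox, rng):
    """Homogeneous Poisson process: Y(R) ~ Poisson(lam * |R|), then uniform positions."""
    xmin, xmax, ymin, ymax = bbox
    area = (xmax - xmin) * (ymax - ymin)
    return uniform_events(poisson_count(lam * area, rng), bbox, rng)


rng = random.Random(42)
plot = (0.0, 4.8, 0.0, 4.8)             # hypothetical square plot of about 23 m^2
csr_62 = uniform_events(62, plot, rng)  # CSR conditional on the observed n = 62
poisson_pattern = homogeneous_poisson(62 / 23.04, plot, rng)  # random count, mean about 62
```

For the envelope tests that follow, the fixed-n version is the relevant one, because each simulated pattern should have the same number of events as the observed pattern.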

The method consists of simulating "envelopes" under the CSR hypothesis to assess the significance of the results. The simulated estimate of G(w) under CSR is

$$\bar{G}(w) = \frac{1}{m}\sum_{i=1}^{m}\hat{G}_i(w),$$

where $\hat{G}_i(w)$, $i = 1, \dots, m$, are the empirical distribution functions estimated from m independent simulations, each generating n independent events uniformly distributed in the region. The upper and lower envelopes are defined as:

$$U(w) = \max_{i=1,\dots,m}\{\hat{G}_i(w)\}, \qquad L(w) = \min_{i=1,\dots,m}\{\hat{G}_i(w)\}$$
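
The envelope construction can be sketched as follows, again as an illustration under assumptions rather than SPRING's code: a rectangular region, no edge correction, and hypothetical function names. Each of the m simulations places n uniform events, the same number as in the observed pattern, and $\bar{G}$, U and L are taken pointwise at each comparison distance w.

```python
import math
import random


def nn_distances(points):
    """Distance from each event to its nearest other event (brute force)."""
    return [
        min(math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i)
        for i, (xi, yi) in enumerate(points)
    ]


def g_curve(points, ws):
    """Empirical G evaluated at every comparison distance in ws."""
    d = nn_distances(points)
    return [sum(1 for di in d if di <= w) / len(d) for w in ws]


def g_envelopes(n, m, ws, bbox, seed=0):
    """Return (G_bar, U, L) from m CSR simulations of n events each."""
    xmin, xmax, ymin, ymax = bbox
    rng = random.Random(seed)
    sims = []
    for _ in range(m):
        pts = [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]
        sims.append(g_curve(pts, ws))
    columns = list(zip(*sims))  # one tuple of m simulated values per distance w
    g_bar = [sum(col) / m for col in columns]
    upper = [max(col) for col in columns]
    lower = [min(col) for col in columns]
    return g_bar, upper, lower


# Hypothetical observed pattern: 30 events packed into a small grid in the unit square.
observed = [(0.5 + 0.01 * (i % 6), 0.5 + 0.01 * (i // 6)) for i in range(30)]
ws = [0.02 * k for k in range(11)]  # comparison distances 0, 0.02, ..., 0.2
g_obs = g_curve(observed, ws)
g_bar, upper, lower = g_envelopes(len(observed), 19, ws, (0.0, 1.0, 0.0, 1.0))
for w, gb, go in zip(ws, g_bar, g_obs):
    print(f"w={w:.2f}  G_bar={gb:.3f}  G_hat={go:.3f}")
```

Plotting `g_obs` against `g_bar` gives the comparison graph; for this tightly packed pattern the observed values sit far above the 45 degree line at short distances.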


If the data are compatible with CSR, the plot of the observed cumulative function $\hat{G}(w)$ against the simulated function $\bar{G}(w)$ should lie close to the 45° line. If there is grouping, the observed function $\hat{G}(w)$ lies above the 45° line and, if there is regularity, $\hat{G}(w)$ lies below it.

Because the nearest neighbor method uses only distances to the closest events, it considers only the smallest scales of the pattern. To extract more information from a spatial pattern over a wide range of scales, the L function, described next, is a better choice.

L function – gives a more effective description of spatial dependence over a wider range of scales and is related to the second-order properties of an isotropic process. Assuming the process is isotropic over the whole region, the K function, on which the L function is based, is defined for a univariate process as:

$$\lambda K(h) = E(\#\text{ of events within distance } h \text{ of an arbitrary event})$$

where:

# = number of

E( ) = the expected value operator

$\lambda$ = the intensity, that is, the mean number of events per unit area, assumed constant over the region.

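K(h) can be estimated by

$$\hat{K}(h) = \frac{|R|}{n^2}\sum_{i=1}^{n}\sum_{j \ne i}\frac{I_h(d_{ij})}{w_{ij}},$$

where $|R|$ is the area of the study region, n is the number of events, $d_{ij}$ is the distance between events i and j, $I_h(d_{ij})$ equals 1 if $d_{ij} \le h$ and 0 otherwise, and $w_{ij}$ is an edge-correction weight.

To understand this function, imagine visiting each event and drawing concentric circles around it, counting the cumulative number of events inside each circle. Once every event has been visited, the total number of events within distance h of all events, weighted by $|R|/n^2$, gives the estimate $\hat{K}(h)$ when the border effect is ignored ($w_{ij} = 1$).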
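
The counting procedure just described can be written directly as a short Python sketch. It assumes the simplest case, with the border effect ignored ($w_{ij} = 1$); the function name and the example points are hypothetical, and this is not SPRING's implementation.

```python
import math


def k_hat(points, h, area):
    """Estimate K(h) = |R| / n^2 * sum_i sum_{j != i} I(d_ij <= h), taking w_ij = 1.

    Ignoring edge effects makes the estimate tend to fall below the true K(h),
    increasingly so as h grows relative to the size of the region.
    """
    n = len(points)
    pairs_within_h = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= h:
                pairs_within_h += 1  # ordered pairs, so each pair is counted twice
    return area * pairs_within_h / n ** 2


# Hypothetical pattern: the four corners of a unit square, region area 1.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(k_hat(corners, 1.0, 1.0))  # each event has 2 others within h = 1: 8 / 16 = 0.5
print(k_hat(corners, 1.5, 1.0))  # the diagonals (about 1.414) now count too: 12 / 16 = 0.75
```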

For a homogeneous process with no spatial dependence, $K(h) = \pi h^2$. Thus, under grouping one expects $K(h) > \pi h^2$ and, in the case of regularity, $K(h) < \pi h^2$.
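
The value $\pi h^2$ follows directly from the CSR model: because events do not interact, the other events around an arbitrary event still occur with constant intensity $\lambda$, so the expected number within distance h is $\lambda$ times the area of a disc of radius h.

```latex
\lambda K(h) = E(\#\text{ of events within distance } h \text{ of an arbitrary event})
             = \lambda \cdot \pi h^2
\quad\Longrightarrow\quad K(h) = \pi h^2 .
% Example: for h = 0.5 m, K(h) = \pi (0.5)^2 \approx 0.785 m^2 under CSR.
```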

The K function is not as intuitive to interpret graphically as the nearest neighbor function, so a transformed version, the L function, is used:

$$\hat{L}(h) = \sqrt{\frac{\hat{K}(h)}{\pi}} - h$$

In the plot of $\hat{L}(h)$ against h, positive peaks indicate spatial attraction or grouping and negative peaks indicate repulsion or regularity.
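
A hedged Python sketch of this transformation, reusing the edge-effect-free estimate of K(h) ($w_{ij} = 1$) so that it stays self-contained. The names and the clustered example pattern are hypothetical.

```python
import math


def k_hat(points, h, area):
    """K(h) estimate with w_ij = 1 (edge effects ignored)."""
    n = len(points)
    within = sum(
        1
        for i, (xi, yi) in enumerate(points)
        for j, (xj, yj) in enumerate(points)
        if i != j and math.hypot(xi - xj, yi - yj) <= h
    )
    return area * within / n ** 2


def l_hat(points, h, area):
    """Transformed estimate L(h) = sqrt(K(h) / pi) - h; about 0 under CSR."""
    return math.sqrt(k_hat(points, h, area) / math.pi) - h


# Hypothetical clustered pattern: 10 events within 0.09 m of each other in a 10 m x 10 m region.
cluster = [(5.0 + 0.01 * i, 5.0) for i in range(10)]
print(l_hat(cluster, 0.5, 100.0))  # sqrt(90 / pi) - 0.5, a large positive value
```

All 90 ordered pairs fall within h = 0.5, so $\hat{K}(0.5) = 100 \cdot 90 / 10^2 = 90$, far above $\pi (0.5)^2$, and $\hat{L}$ is strongly positive as expected for grouping.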

The L function method has advantages over the nearest neighbor method: it provides information at several scales of the pattern, it uses the precise location of each event, and it includes all event-to-event distances. Another reason to use it is that the theoretical form of K(h) is known for several point process models. Thus $\hat{K}(h)$ can be used not only to explore spatial dependence but also to choose a model capable of representing that dependence and to estimate the model's parameters.

When the L function is used with simulation, upper and lower envelopes are built, as in the previous case, from m simulations of n events in the region under the Complete Spatial Randomness (CSR) hypothesis, with the associated estimates $\hat{L}_i(h)$, $i = 1, \dots, m$. The simulated upper and lower envelopes are calculated by the formulas

$$U(h) = \max_{i=1,\dots,m}\{\hat{L}_i(h)\}, \qquad L(h) = \min_{i=1,\dots,m}\{\hat{L}_i(h)\}$$

The envelopes are included in the plot of $\hat{L}(h)$ against h. Under CSR, the significance of peaks and depressions can be assessed from

$$\Pr\big(\hat{L}(h) > U(h)\big) = \Pr\big(\hat{L}(h) < L(h)\big) = \frac{1}{m+1},$$

which gives the value of m to use, that is, how many simulations must be run to detect a lack of randomness at a given significance level. In the example below, the grouping hypothesis tested with the L function is supported by this standard test, since the curve $\hat{L}(h)$ lies above the upper envelope.
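
A hedged Python sketch of the L function envelopes and of choosing m from the significance level. It assumes a rectangular region, ignores edge effects ($w_{ij} = 1$) in both the observed and the simulated estimates, and uses hypothetical names. Since $1/(m+1)$ is the probability for one tail, m = 19 corresponds to a 5% level for a test of grouping alone.

```python
import math
import random


def l_curve(points, hs, area):
    """L(h) = sqrt(K(h) / pi) - h at each h, with K estimated ignoring edge effects."""
    n = len(points)
    dists = [
        math.hypot(xi - xj, yi - yj)
        for i, (xi, yi) in enumerate(points)
        for j, (xj, yj) in enumerate(points)
        if i != j
    ]
    return [math.sqrt(area * sum(1 for d in dists if d <= h) / n ** 2 / math.pi) - h
            for h in hs]


def simulations_needed(alpha):
    """Smallest m with 1 / (m + 1) <= alpha (small tolerance absorbs float rounding)."""
    return math.ceil(1.0 / alpha - 1.0 - 1e-9)


def l_envelopes(n, m, hs, bbox, seed=0):
    """Upper and lower envelopes U(h), L(h) from m CSR simulations of n events."""
    xmin, xmax, ymin, ymax = bbox
    area = (xmax - xmin) * (ymax - ymin)
    rng = random.Random(seed)
    sims = [
        l_curve([(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)],
                hs, area)
        for _ in range(m)
    ]
    columns = list(zip(*sims))  # one tuple of m simulated values per distance h
    return [max(c) for c in columns], [min(c) for c in columns]


m = simulations_needed(0.05)  # 19 simulations for a one-sided 5% level
# Hypothetical observed pattern: 20 events packed into a small grid in the unit square.
observed = [(0.5 + 0.01 * (i % 5), 0.5 + 0.01 * (i // 5)) for i in range(20)]
hs = [0.05, 0.1, 0.2]
l_obs = l_curve(observed, hs, 1.0)
upper, lower = l_envelopes(len(observed), m, hs, (0.0, 1.0, 0.0, 1.0))
for h, low, high, lv in zip(hs, lower, upper, l_obs):
    verdict = "grouping" if lv > high else "regularity" if lv < low else "consistent with CSR"
    print(f"h={h:.2f}  L_hat={lv:.3f}  envelope=[{low:.3f}, {high:.3f}]  {verdict}")
```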

Some of the graphs generated by SPRING for the methods described above are presented and interpreted below.

  1. Nearest Neighbor - the cumulative distribution by Distance graph: if the curve rises quickly at short distances and then levels off at some value, this indicates interaction between events, that is, clustering. If the curve rises quickly only at larger distances, this indicates repulsion among the events, characterizing a regular distribution over the analyzed distances.



  2. Nearest Neighbor with Simulation – the estimated function $\hat{G}$ by mean simulated function $\bar{G}$ graph: curves (the $\hat{G}$ estimate and the maximum and minimum envelopes) above the 45° line indicate aggregation; curves below the 45° line indicate regularity.



  3. L function – the 'L' function estimate by Distance graph: positive values indicate aggregation and negative values indicate regularity. Positive extremes correspond to the distances at which aggregation is most pronounced, while negative extremes correspond to the distances at which repulsion among events is strongest.



  4. L function with Simulation – the estimated 'L' function and envelopes by Distance graph: positive extremes of the L function above the upper envelope indicate aggregation, and negative extremes below the lower envelope indicate regularity at the respective distances.




Executing a Point Pattern Analysis:

  • in the "Control Panel", select an IL (information layer) from the Thematic category
  • click on Analysis, then on Spatial Statistics, and then select Univariate Analysis of Points. The "Point Patterns" window is displayed.
  • select one of the methods: Nearest Neighbor, Nearest Neighbor with Simulation, L function, or L function with Simulation.
  • in the text boxes, enter the Minimum Distance (the smallest distance between events to be considered), the Maximum Distance (the largest distance between events to be considered), and the Number of Intervals (which defines the width of each distance interval).

Notes: Ideally, the user should first inspect the point distribution in order to choose consistent minimum and maximum distances; the coordinates can be displayed from the main menu. The number of intervals defines the number of concentric circles; the radius of each circle is the maximum event distance counted in that interval. The plotting application always shows 10 values on the X and Y axes, even when the number of intervals selected differs from 10. A larger number of intervals gives a more detailed graph.
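
To make the role of these parameters concrete, the sketch below turns Minimum Distance, Maximum Distance and Number of Intervals into the list of comparison distances (circle radii) at which G(w) or L(h) would be evaluated. It assumes evenly spaced intervals, which is what "the width of each distance interval" suggests but is not confirmed here for SPRING; the function name is hypothetical.

```python
def distance_breaks(d_min, d_max, n_intervals):
    """Evenly spaced comparison distances from d_min to d_max (assumed spacing scheme)."""
    if d_max <= d_min or n_intervals < 1:
        raise ValueError("need d_max > d_min and at least one interval")
    width = (d_max - d_min) / n_intervals  # width of each distance interval
    return [d_min + k * width for k in range(n_intervals + 1)]


print(distance_breaks(0.0, 2.0, 4))  # [0.0, 0.5, 1.0, 1.5, 2.0]
```

Raising the number of intervals only adds evaluation points between the same two limits, which is why it yields a more detailed graph without changing the range of scales analyzed.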

  • for the methods with simulation, an additional text box is shown: number of simulations (the number of simulated event patterns used to build the "envelopes").
  • click on Apply. The result is displayed automatically on the screen as one or more graphs.


See also:
SPRING - Geographical Analysis