Spatial Autocorrelation

A potential problem with
data obtained for many wildlife studies is that they may have a spatial
component. For example, the altitudes in neighbouring sampling units are likely
to be similar. This can result in spatial autocorrelation which causes problems
for statistical methods that make assumptions about the independence of
residuals *(a residual is the difference between an observed and a predicted
value)*. Cliff and Ord (1973) define spatial autocorrelation as follows: *‘If the presence of some quantity in a county (sampling unit)
makes its presence in neighbouring counties (sampling units) more or less
likely, we say that the phenomenon exhibits spatial autocorrelation’.*
If data are spatially autocorrelated, the residuals of a model fitted to them
will tend to be spatially correlated too; for example, positive residuals will
tend to occur together. This violates the assumption that residuals are
independent and calls into question the validity of hypothesis tests. The main
effect of such violations is that the error sum of squares (Error SS) is
underestimated (Davis, 1986), which inflates the test statistic. An inflated
test statistic increases the chance of a Type I error (incorrect rejection of a
true null hypothesis). Most GIS provide tools to measure the level of
spatial autocorrelation (e.g. Moran's I).

Shaw and Wheeler (1985) give
an example of the use of regression analysis to predict rainfall from altitude
in the states of the USA. When the residuals were examined they were clearly
not randomly distributed in space, indicating spatial autocorrelation. They suggest that this
should not be viewed as a wholly negative conclusion. Indeed the inevitable
regional groupings that are seen in the residuals can suggest ways in which the
model may be improved, in their case by a consideration of the influence of the
moist Caribbean air streams. Viewed in this way the residuals express variation
of the dependent variable that remains after the removal of the effect of the
primary predictor(s). While it is true that spatial autocorrelation makes
standard significance tests unreliable, researchers are often more interested in
the spatial autocorrelation shown by the residuals themselves, because the
residuals are deviations from a trend and their spatial pattern may identify
discontinuities. Techniques are available that can 'partial out' the effects of
space.
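The idea of inspecting regression residuals for spatial pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the altitude and rainfall figures are invented, the eight sites are assumed to lie along a single transect so that adjacent sites are neighbours, and the helper names `fit_line`, `morans_i` and `chain_weights` are hypothetical. The rainfall values include a regional effect that altitude cannot explain, standing in for an omitted predictor such as a moist air stream.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def morans_i(values, w):
    """Moran's I for a list of values and an N x N weights matrix w."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


def chain_weights(n):
    """Binary weights for n sites along a transect (adjacent = neighbours)."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


# Invented data: rainfall depends on altitude plus an unmodelled regional
# effect (+50 at the first four sites, -50 at the last four).
altitude = [100, 300, 200, 400, 150, 350, 250, 450]
regional_effect = [50, 50, 50, 50, -50, -50, -50, -50]
rainfall = [500 + a + e for a, e in zip(altitude, regional_effect)]

intercept, slope = fit_line(altitude, rainfall)
residuals = [r - (intercept + slope * a) for a, r in zip(altitude, rainfall)]
residual_i = morans_i(residuals, chain_weights(len(residuals)))

print([round(r, 1) for r in residuals])
print(round(residual_i, 2))
```

In this invented example the residuals are positive at the first four sites and negative at the last four, and Moran's I of the residuals is clearly positive. That regional grouping is the kind of pattern that hints at a missing predictor.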

Spatial autocorrelation
analysis tests whether the observed value of a variable at one locality is
independent of the values of the variable at neighbouring localities. If a
dependence exists, the variable is said to exhibit spatial autocorrelation.
Spatial autocorrelation statistics measure the nature and strength of the
interdependence between values of the variable at different localities.
Autocorrelation may be classified as either positive or negative. With positive
autocorrelation similar values tend to occur together, while with negative
autocorrelation dissimilar values occur in close association. Because organisms
are distributed over the Earth's surface, most ecological problems have a
spatial dimension. Biological
variables are spatially autocorrelated for two reasons:

- inherent forces such as
limited dispersal, gene flow or clonal growth tend to make neighbours resemble
each other;
- organisms may be
restricted by, or may actively respond to environmental factors such as
temperature or habitat type, which themselves are spatially autocorrelated
(Sokal & Thomson 1987).

The autocorrelation
coefficients for interval and ordinal data are Moran’s statistic *I* and
Geary’s coefficient *c*.

Moran’s *I* is based on
cross-products to measure value association, and is calculated for *N*
observations on a variable *x* at locations *i, j* as:

$$I = \frac{N}{S_{0}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(x_{i}-\mu)(x_{j}-\mu)}{\sum_{i}(x_{i}-\mu)^{2}}$$

where µ is the mean of the
*x* variable, w_{ij} are the elements of the spatial weights
matrix, and S_{0} is the sum of the elements of the weights matrix:

$$S_{0} = \sum_{i}\sum_{j} w_{ij}$$

Moran’s *I* generally varies from -1 to +1. In the absence of
autocorrelation its expected value is -1/(*N* - 1), which approaches zero for
a large sample size.
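As an illustration, the calculation of Moran's *I* can be sketched in plain Python. The function name `morans_i`, the helper `chain_weights` and the two six-value data sets are assumptions made for this example. The weights are binary, with sampling units arranged in a line so that only adjacent units are neighbours; real studies would normally build the weights matrix from the actual spatial layout.

```python
def morans_i(x, w):
    """Moran's I for values x and an N x N spatial weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    # Cross-products of deviations, weighted by w_ij
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den


def chain_weights(n):
    """Binary weights for n units in a line: adjacent units are neighbours."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
clustered = [1, 1, 1, 5, 5, 5]    # similar values side by side
alternating = [1, 5, 1, 5, 1, 5]  # dissimilar values side by side

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)
print(round(i_clustered, 3), round(i_alternating, 3))
```

The clustered series gives a positive value (about 0.6) and the alternating series gives -1, matching the interpretation of positive and negative autocorrelation.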

Geary’s *c* statistic
is expressed in the same notation:

$$c = \frac{(N-1)\sum_{i}\sum_{j} w_{ij}\,(x_{i}-x_{j})^{2}}{2\,S_{0}\sum_{i}(x_{i}-\mu)^{2}}$$

Geary’s *c* ranges from
0 (maximal positive autocorrelation) to values greater than 1 for negative
autocorrelation. Its expectation in the absence of autocorrelation is 1 (Sokal
& Oden 1978).
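Geary's *c* can be sketched in the same way. Again, the function name `gearys_c`, the line-of-units weights and the data are assumptions for illustration only. Unlike Moran's *I*, the numerator uses squared differences between neighbouring values rather than cross-products of deviations from the mean.

```python
def gearys_c(x, w):
    """Geary's c for values x and an N x N spatial weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    s0 = sum(sum(row) for row in w)  # sum of all weights
    # Squared differences between every weighted pair of units
    num = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    den = sum((xi - mean) ** 2 for xi in x)
    return (n - 1) * num / (2 * s0 * den)


# Binary weights for six units in a line: adjacent units are neighbours.
w = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]

c_clustered = gearys_c([1, 1, 1, 5, 5, 5], w)
c_alternating = gearys_c([1, 5, 1, 5, 1, 5], w)
print(round(c_clustered, 3), round(c_alternating, 3))
```

The clustered series gives a value below 1 (about 0.33) and the alternating series a value above 1 (about 1.67), so the scale runs in the opposite direction to Moran's *I*.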

An understanding of the
spatial correlation structure in an independent data set can be used to design
a sampling regime that minimizes its effect. For example, calculating the
spatial autocorrelation of a set of habitat variables at a series of distances
(‘lags’) identifies the distance at which the autocorrelation becomes
negligible, and this distance can be used to determine the sampling area for
the species under investigation. In practice this can be done in a GIS, for
example with the AUTOCORR procedure of the IDRISI GIS package. This calculates
Moran’s *I* for a raster image (such as landcover data) and, when combined with
a pixel thinning procedure, allows the user to calculate the spatial
autocorrelation at a series of distances (lags).
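The combination of a raster Moran's *I* with pixel thinning can be sketched as follows. This is not the IDRISI AUTOCORR implementation, only a minimal illustration of the idea under stated assumptions: neighbours are pixels sharing an edge (rook contiguity), thinning keeps every *k*-th row and column so that adjacent pixels in the thinned image are *k* pixels apart in the original, and the 12 x 12 raster of 4 x 4 patches is invented. The names `morans_i_grid`, `thin` and `morans_i_by_lag` are hypothetical.

```python
def morans_i_grid(grid):
    """Moran's I for a 2-D grid using rook (edge-sharing) neighbours."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    den = sum((v - mean) ** 2 for v in values)
    num = 0.0
    s0 = 0
    for r in range(rows):
        for c in range(cols):
            # Visit the right and lower neighbour of each pixel once
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    # Each pair contributes twice (w_ij and w_ji are both 1)
                    num += 2 * (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    s0 += 2
    return (n / s0) * num / den


def thin(grid, step):
    """Keep every step-th row and column of the raster."""
    return [row[::step] for row in grid[::step]]


def morans_i_by_lag(grid, lags):
    """Moran's I of the thinned raster for each lag (in pixels)."""
    return {lag: morans_i_grid(thin(grid, lag)) for lag in lags}


# Invented raster: 12 x 12 pixels made of alternating 4 x 4 patches.
raster = [[1 if (r // 4 + c // 4) % 2 == 0 else 0 for c in range(12)]
          for r in range(12)]

by_lag = morans_i_by_lag(raster, [1, 2, 4])
for lag, value in by_lag.items():
    print(lag, round(value, 3))
```

In this invented raster the autocorrelation is strongly positive at a lag of one pixel, weaker at two pixels, and negative at four pixels, where neighbouring pixels in the thinned image always fall in different patches. With real landcover data the lag at which the statistic approaches zero would be the quantity of interest.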

**References**

Sokal, R.R. & Oden, N.L.
(1978) Spatial autocorrelation in biology. 1. Methodology. *Biological Journal
of the Linnean Society*, **10:** 199-228.

Sokal, R.R. & Thomson,
J.D. (1987) Applications of spatial autocorrelation in ecology. In: Legendre, P.
& Legendre, L. (eds.) *Developments in Numerical Ecology*, NATO ASI
Series, Vol. G14, Springer-Verlag, Berlin.