Spatial Autocorrelation

A potential problem with data obtained for many wildlife studies is that they may have a spatial component. For example, the altitudes in neighbouring sampling units are likely to be similar. This can result in spatial autocorrelation, which causes problems for statistical methods that assume the residuals are independent (a residual is the difference between an observed and a predicted value). Cliff and Ord (1973) define spatial autocorrelation as follows: ‘If the presence of some quantity in a county (sampling unit) makes its presence in neighbouring counties (sampling units) more or less likely, we say that the phenomenon exhibits spatial autocorrelation’. Spatial autocorrelation in the data leads to spatially correlated residuals; for example, positive residuals will tend to occur together. This violates the assumption that the residuals are independent and calls into question the validity of hypothesis tests. The main effect of such violations is that the Error SS (Sum of Squares) is underestimated (Davis, 1986), thus inflating the value of the test statistic. An inflated test statistic increases the chance of a Type I error (incorrect rejection of a null hypothesis). Most GIS provide tools to measure the level of spatial autocorrelation (e.g. Moran's I).

Shaw and Wheeler (1985) give an example of the use of regression analysis to predict rainfall from altitude in the states of the USA. When the residuals were examined they were clearly not randomly distributed in space, and hence were spatially autocorrelated. They suggest that this should not be viewed as a wholly negative conclusion. Indeed, the regional groupings seen in the residuals can suggest ways in which the model may be improved, in their case by considering the influence of the moist Caribbean air streams. Viewed in this way, the residuals express the variation in the dependent variable that remains after the effect of the primary predictor(s) has been removed. While it is true that spatial autocorrelation makes standard significance tests unreliable, researchers are often more interested in the spatial autocorrelation shown by the residuals themselves, because the residuals indicate deviations from a trend and such deviations presumably identify discontinuities. Techniques are available that can 'partial out' the effects of space.
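The idea of checking regression residuals for spatial pattern can be illustrated with a small sketch. The data below are invented for illustration and are not Shaw and Wheeler's: eight hypothetical sites along a transect, with rainfall depending on altitude plus a regional effect that the regression does not know about. The helper names (`fit_line`, `morans_i`) and the simple "adjacent sites are neighbours" weighting are assumptions made for this example.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def morans_i(values, weights):
    """Moran's I for a list of values and an N x N spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


# Hypothetical transect: sites 0-3 lie in a wetter region, sites 4-7 in a drier one.
altitude = [100, 300, 200, 400, 150, 350, 250, 450]
regional = [80, 80, 80, 80, -80, -80, -80, -80]
rainfall = [500 + alt + reg for alt, reg in zip(altitude, regional)]

# Regress rainfall on altitude only, then look at what is left over.
a, b = fit_line(altitude, rainfall)
residuals = [rain - (a + b * alt) for alt, rain in zip(altitude, rainfall)]

# Assumed neighbourhood: consecutive sites along the transect are neighbours.
n_sites = len(altitude)
weights = [[1 if abs(i - j) == 1 else 0 for j in range(n_sites)]
           for i in range(n_sites)]

residual_i = morans_i(residuals, weights)
print("Moran's I of residuals:", round(residual_i, 2))
```

In this constructed case the residuals are positive in the first region and negative in the second, so Moran's I of the residuals is clearly positive. That is the kind of regional grouping that hints at a missing predictor.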

Spatial autocorrelation analysis tests whether the observed value of a variable at one locality is independent of the values of the variable at neighbouring localities. If a dependence exists, the variable is said to exhibit spatial autocorrelation. Measures of spatial autocorrelation quantify the level of interdependence between values at neighbouring localities, and the nature and strength of that interdependence. Spatial autocorrelation may be classified as either positive or negative. With positive spatial autocorrelation similar values tend to occur together, while with negative spatial autocorrelation dissimilar values occur in close association. The distribution of organisms over the Earth's surface means that most ecological problems have a spatial dimension. Biological variables are spatially autocorrelated for two reasons:

  • inherent forces such as limited dispersal, gene flow or clonal growth tend to make neighbours resemble each other;
  • organisms may be restricted by, or may actively respond to environmental factors such as temperature or habitat type, which themselves are spatially autocorrelated (Sokal & Thomson 1987).

The autocorrelation coefficients for interval and ordinal data are Moran’s statistic I and Geary’s coefficient c.

Moran’s I is based on cross-products to measure value association, and is calculated for N observations on a variable x at locations i, j as:

$$I = \frac{N}{S_0} \cdot \frac{\sum_i \sum_j w_{ij} (x_i - \mu)(x_j - \mu)}{\sum_i (x_i - \mu)^2}$$

where $\mu$ is the mean of the x variable, $w_{ij}$ are the elements of the spatial weights matrix, and $S_0$ is the sum of the elements of the weights matrix:

$$S_0 = \sum_i \sum_j w_{ij}$$

Moran’s I generally varies from -1 to +1. In the absence of autocorrelation its expected value is $-1/(N - 1)$, which approaches zero for a large sample size.
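As a minimal sketch, Moran's I in its standard form (N divided by S0, multiplied by the weighted sum of cross-products of deviations from the mean, divided by the sum of squared deviations) can be computed directly. The function names and the binary "adjacent units in a line" weights matrix are assumptions chosen for illustration; real analyses would build the weights from actual neighbourhood relationships.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and an N x N spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)              # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]        # weighted cross-products
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def chain_weights(n):
    """Hypothetical weights: w_ij = 1 if units i and j are adjacent in a line, else 0."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
clustered = [1, 1, 1, 5, 5, 5]      # similar values side by side
alternating = [1, 5, 1, 5, 1, 5]    # dissimilar values side by side

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)
print(round(i_clustered, 3), round(i_alternating, 3))  # 0.6 -1.0
```

The clustered series gives a positive value and the alternating series gives the most negative value, matching the positive and negative cases described earlier.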

Geary’s c statistic is expressed in the same notation:

$$c = \frac{(N - 1) \sum_i \sum_j w_{ij} (x_i - x_j)^2}{2 S_0 \sum_i (x_i - \mu)^2}$$

Geary’s c ranges from 0 (maximal positive autocorrelation) to values greater than 1 for strong negative autocorrelation. Its expectation in the absence of autocorrelation is 1 (Sokal & Oden 1978).
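Geary's c can be sketched in the same way. It is built from squared differences between pairs of values rather than cross-products of deviations: (N - 1) times the weighted sum of squared pairwise differences, divided by 2 S0 times the sum of squared deviations from the mean. As before, the function names and the simple chain weights matrix are illustrative assumptions.

```python
def gearys_c(values, weights):
    """Geary's c for a list of values and an N x N spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in weights)              # sum of all weights
    sq_diff = sum(weights[i][j] * (values[i] - values[j]) ** 2
                  for i in range(n) for j in range(n))
    ss = sum((v - mean) ** 2 for v in values)          # squared deviations
    return (n - 1) * sq_diff / (2 * s0 * ss)


def chain_weights(n):
    """Hypothetical weights: w_ij = 1 if units i and j are adjacent in a line, else 0."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
c_clustered = gearys_c([1, 1, 1, 5, 5, 5], w)      # similar neighbours
c_alternating = gearys_c([1, 5, 1, 5, 1, 5], w)    # dissimilar neighbours
print(round(c_clustered, 3), round(c_alternating, 3))  # 0.333 1.667
```

Values below 1 indicate positive autocorrelation and values above 1 indicate negative autocorrelation, so the scale runs in the opposite direction to Moran's I.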

An understanding of the spatial correlation structure in an independent data set can be used to design a sampling regime that minimizes its effect. For example, calculating the spatial autocorrelation of a set of habitat variables at a series of distances (‘lags’) identifies the distance at which the effect of spatial correlation is minimized, and this distance can then be used to determine the sampling area for the species under investigation. In practice this can be done in a GIS, for example with the AUTOCORR procedure of the IDRISI GIS package. AUTOCORR calculates Moran’s I for a raster image (such as landcover data); combined with a pixel thinning procedure, it allows the user to calculate the spatial autocorrelation at a series of distances (lags).
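The lag-by-lag approach can be sketched in a few lines. This does not reproduce IDRISI's AUTOCORR procedure. It assumes that "thinning" means keeping every k-th row and column, that neighbours are the four edge-sharing (rook) cells, and it uses an invented raster made of 4 x 4 pixel patches. All function names are hypothetical.

```python
def thin(grid, step):
    """Keep every step-th row and column, so adjacent cells are `step` pixels apart."""
    return [row[::step] for row in grid[::step]]


def morans_i_rook(grid):
    """Moran's I for a raster using binary rook (edge-sharing) neighbour weights."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    ss = sum((v - mean) ** 2 for row in grid for v in row)
    cross, s0 = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            # Visit each neighbour pair once (right and down), count it in both directions.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols:
                    cross += 2 * (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    s0 += 2
    return (n / s0) * cross / ss


# Invented 16 x 16 "landcover" raster: alternating patches, each 4 x 4 pixels.
raster = [[((r // 4) + (c // 4)) % 2 for c in range(16)] for r in range(16)]

# Moran's I at lags of 1, 2 and 4 pixels.
i_by_lag = {lag: morans_i_rook(thin(raster, lag)) for lag in (1, 2, 4)}
for lag, value in i_by_lag.items():
    print(lag, round(value, 3))
```

For this raster the autocorrelation falls from 0.6 at a lag of 1 pixel to about 0.143 at 2 pixels and -1 at 4 pixels, which is the patch width. A real landcover image would decay less tidily, but the same table of I against lag is what would be used to choose a sampling distance.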


Sokal, R.R. & Oden, N.L. (1978) Spatial autocorrelation in biology. 1. Methodology. Biological Journal of the Linnean Society, 10: 199-228.

Sokal, R.R. & Thomson, J.D. (1987) Applications of spatial autocorrelation in ecology. In: Legendre, P. & Legendre, L. (eds.) Developments in Numerical Ecology, NATO ASI Series, Vol. G14, Springer-Verlag, Berlin.