## General Info

**Instructor**: Brian Klinkenberg

**Office**: Geog Room 209

**Office Hours**: Tuesday 12:30-1:30; Wednesday 12:30-1:30

**T.A.**: Karolina Kukulski

**Class room**: The Leon and Thea Koerner University Centre, Room 103

**Lab room**: Geog 115

## Spatial Autocorrelation

A potential problem with data obtained for many wildlife studies is that they may have a spatial component. For example, the altitudes in neighbouring sampling units are likely to be similar. This can result in spatial autocorrelation, which causes problems for statistical methods that assume the residuals are independent *(a residual is the difference between an observed and a predicted value)*. Cliff and Ord (1973) define spatial autocorrelation as follows: *‘If the presence of some quantity in a county (sampling unit) makes its presence in neighbouring counties (sampling units) more or less likely, we say that the phenomenon exhibits spatial autocorrelation’.* Spatial autocorrelation in the data leads to spatially correlated residuals; for example, positive residuals tend to occur together. This violates the assumption that residuals are independent and calls into question the validity of hypothesis testing. The main effect of such violations is that the Error SS (Sum of Squares) is underestimated (Davis, 1986), which inflates the value of the test statistic. An inflated test statistic increases the chance of a Type I error (incorrect rejection of a null hypothesis). Most GIS provide tools to measure the level of spatial autocorrelation (e.g. Moran's I).

Shaw and Wheeler (1985) give an example of the use of regression analysis to predict rainfall from altitude in the states of the USA. When the residuals were examined, they were obviously not randomly distributed in space, and hence were spatially autocorrelated. They suggest that this should not be viewed as a wholly negative conclusion. Indeed, the regional groupings seen in the residuals can suggest ways in which the model may be improved, in their case by considering the influence of the moist Caribbean air streams. Viewed in this way, the residuals express the variation in the dependent variable that remains after the effect of the primary predictor(s) has been removed. While it is true that spatial autocorrelation makes standard significance tests unreliable, researchers are often more interested in the spatial autocorrelation shown by the residuals themselves, because the residuals indicate deviations from a trend, and these deviations presumably identify discontinuities. Techniques are available that can 'partial out' the effects of space.
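The idea of examining regression residuals for spatial pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the altitude and rainfall numbers are invented (they are not Shaw and Wheeler's data), the eight sites are assumed to lie along a single transect, neighbours are assumed to be the adjacent sites on that transect, and the function names `fit_line` and `morans_i` are hypothetical helpers written for this example. Moran's *I* is the autocorrelation coefficient mentioned above.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def morans_i(values, weights):
    """Moran's I: (N / S0) * sum_ij w_ij * d_i * d_j / sum_i d_i^2, d = deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * (cross / sum(d * d for d in dev))


# Invented data: eight sites along a transect (assumption, for illustration only).
altitude = [3, 1, 4, 2, 3, 1, 4, 2]
# A regional effect the model does not know about: wetter in the first four sites.
regional = [2, 2, 2, 2, -2, -2, -2, -2]
rainfall = [10 + a + e for a, e in zip(altitude, regional)]

intercept, slope = fit_line(altitude, rainfall)
residuals = [r - (intercept + slope * a) for a, r in zip(altitude, rainfall)]

# Binary weights: sites i and j are neighbours when adjacent on the transect.
n = len(residuals)
weights = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

i_resid = morans_i(residuals, weights)
print(residuals)
print(i_resid)  # clearly positive: positive residuals occur together, as do negative ones
```

In this made-up case the residuals recover the omitted regional effect, and their grouping into a block of positive values followed by a block of negative values is what would prompt a search for a missing explanatory variable.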

Spatial autocorrelation analysis tests whether the observed value of a variable at one locality is independent of the values of the variable at neighbouring localities. If a dependence exists, the variable is said to exhibit spatial autocorrelation. Spatial autocorrelation measures the level of interdependence between the values of a variable at different localities, and the nature and strength of that interdependence. It may be classified as either positive or negative. With positive spatial autocorrelation, similar values tend to appear together, while with negative spatial autocorrelation, dissimilar values appear in close association. The distribution of organisms over the Earth's surface means that most ecological problems have a spatial dimension. Biological variables are spatially autocorrelated for two reasons:

- inherent forces such as limited dispersal, gene flow or clonal growth tend to make neighbours resemble each other;
- organisms may be restricted by, or may actively respond to environmental factors such as temperature or habitat type, which themselves are spatially autocorrelated (Sokal & Thomson 1987).

The autocorrelation
coefficients for interval and ordinal data are Moran’s statistic *I* and
Geary’s coefficient *c*.

Moran’s *I* is based on cross-products to measure value association, and is calculated for *N* observations on a variable *x* at locations *i*, *j* as:

$$I = \frac{N}{S_{0}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(x_{i}-\mu)(x_{j}-\mu)}{\sum_{i}(x_{i}-\mu)^{2}}$$

where $\mu$ is the mean of the *x* variable, $w_{ij}$ are the elements of the spatial weights matrix, and $S_{0}$ is the sum of the elements of the weights matrix: $S_{0} = \sum_{i}\sum_{j} w_{ij}$. Moran’s *I* varies from roughly -1 to +1. In the absence of autocorrelation its expected value is $-1/(N-1)$, which approaches zero for a large sample size.
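As a minimal sketch, the definition of Moran’s *I* can be computed directly from a list of values and a weights matrix. The function names `morans_i` and `chain_weights`, the six-site transect, and the binary adjacency weights are all assumptions made for this example; real analyses typically use weights derived from the actual geometry of the sampling units.

```python
def morans_i(values, weights):
    """Moran's I for N values and an N x N spatial weights matrix (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]              # deviations from the mean
    s0 = sum(sum(row) for row in weights)         # S0: sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]   # weighted cross-products
                for i in range(n) for j in range(n))
    ss = sum(d * d for d in dev)                  # sum of squared deviations
    return (n / s0) * (cross / ss)


def chain_weights(n):
    """Binary weights for n sites on a line: w_ij = 1 when i and j are adjacent."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
clustered = [1, 1, 1, 5, 5, 5]      # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5]    # dissimilar values sit next to each other

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)
print(i_clustered)    # about 0.6: positive autocorrelation
print(i_alternating)  # about -1.0: negative autocorrelation
```

The two toy sequences contain exactly the same values; only their arrangement differs, which is what the statistic responds to.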

Geary’s *c* statistic is expressed in the same notation:

$$c = \frac{(N-1)\,\sum_{i}\sum_{j} w_{ij}\,(x_{i}-x_{j})^{2}}{2\,S_{0}\,\sum_{i}(x_{i}-\mu)^{2}}$$

Geary’s *c* ranges from 0 (maximal positive autocorrelation) to values greater than 1 for high negative autocorrelation. Its expectation in the absence of autocorrelation is 1 (Sokal & Oden 1978).
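Geary’s *c* can be sketched in the same way. Where Moran’s *I* uses cross-products of deviations from the mean, Geary’s *c* uses squared differences between neighbouring values. As before, the function name `gearys_c`, the six-site transect, and the binary adjacency weights are illustrative assumptions only.

```python
def gearys_c(values, weights):
    """Geary's c for N values and an N x N spatial weights matrix (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in weights)                  # S0: sum of all weights
    sq_diff = sum(weights[i][j] * (values[i] - values[j]) ** 2
                  for i in range(n) for j in range(n))     # weighted squared differences
    ss = sum((v - mean) ** 2 for v in values)              # sum of squared deviations
    return ((n - 1) * sq_diff) / (2 * s0 * ss)


# Binary weights for six sites on a line: neighbours are adjacent sites.
n_sites = 6
w = [[1 if abs(i - j) == 1 else 0 for j in range(n_sites)] for i in range(n_sites)]

c_clustered = gearys_c([1, 1, 1, 5, 5, 5], w)
c_alternating = gearys_c([1, 5, 1, 5, 1, 5], w)
print(c_clustered)    # below 1: positive autocorrelation
print(c_alternating)  # above 1: negative autocorrelation
```

Note that the direction of the scale is reversed relative to Moran’s *I*: small values of *c* indicate that neighbours are alike.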

An understanding of the spatial correlation structure in an independent data set can be used to set the sampling regime so that the effect of spatial autocorrelation is minimized. For example, calculating the distance at which the effect of spatial correlation is minimized in a data set of habitat variables (i.e. the spatial autocorrelation at a series of ‘lags’ from the species to be investigated) can be used to determine the sampling area for the species under investigation. In practice this can be done with the functionality available in a GIS, such as the AUTOCORR procedure of the IDRISI GIS package. This calculates the Moran’s *I* of a raster image (such as landcover data), which, when combined with a pixel thinning procedure, allows the user to calculate the spatial autocorrelation at a series of distances (lags).
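The combination of Moran’s *I* with pixel thinning can be sketched as follows. This is not the IDRISI AUTOCORR implementation, whose internals are not described here; it is a minimal illustration that assumes rook (edge-sharing) neighbours, thins the raster by keeping every *k*-th row and column, and uses an invented 16 x 16 raster of 4 x 4 patches. The names `morans_i_rook` and `thin` are hypothetical.

```python
def morans_i_rook(grid):
    """Moran's I for a raster (list of rows) using binary rook-neighbour weights."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    cross, s0 = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    s0 += 1  # each neighbouring pair has weight 1
    ss = sum((v - mean) ** 2 for row in grid for v in row)
    return (n / s0) * (cross / ss)


def thin(grid, step):
    """Keep every step-th row and column, so neighbours are `step` pixels apart."""
    return [row[::step] for row in grid[::step]]


# Invented raster: 16 x 16 pixels made of alternating 4 x 4 patches of 0s and 1s.
raster = [[((r // 4) + (c // 4)) % 2 for c in range(16)] for r in range(16)]

# Moran's I at lags of 1, 2 and 4 pixels.
lag_profile = {step: morans_i_rook(thin(raster, step)) for step in (1, 2, 4)}
for step, value in lag_profile.items():
    print(step, round(value, 3))
```

In this artificial raster the autocorrelation falls as the lag increases and becomes negative once the lag equals the patch size, an artefact of the perfectly regular pattern. With real landcover data one would look for the lag at which Moran’s *I* approaches zero and space the sampling units at least that far apart.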

**References**

Sokal, R.R. & Oden, N.L. (1978) Spatial autocorrelation in biology. 1. Methodology. *Biological Journal of the Linnean Society*, **10:** 199-228.

Sokal, R.R. & Thomson, J.D. (1987) Applications of spatial autocorrelation in ecology. In: Legendre, P. & Legendre, L. (eds.) *Developments in Numerical Ecology*, NATO ASI Series, Vol. G14, Springer-Verlag, Berlin.