Rules of thumb for spatial data -- Brian Klinkenberg


This page collects some of the rules of thumb that have been identified by practitioners in the fields of cartography, remote sensing, GPS, GIS and spatial analysis. If you have developed your own rules of thumb, or know of any rules of thumb not described below, I would be delighted to hear from you. Please contact me.

One view "recognizes the use of rules of thumb as an immediate and semiconscious kind of knowledge that could be called tacit knowledge. Using rules of thumb might explain why practice remains unchanged although educational activities result in more elaborate knowledge" (http://fampract.oupjournals.org/cgi/content/abstract/19/6/617).

Rules of thumb in cartography

# of shades of gray distinguishable (16 max).
# of legend categories consistently recognized (5, 7 max).
Symbol sizes should not vary when mapping qualitative data.
Raw totals should not be mapped using a choropleth map.

Rules of thumb in remote sensing

The # of contiguous pixels required for object identification -- 10 to 16 (source TBD).
McCoy (see below) recommends that, considering all of the potential variables that can affect the results, at a minimum sample units should be no smaller than a 3 x 3 cluster of pixels for training sites or accuracy assessment sites.
What should the size of the sample site be? A useful formula for determining the size of a sample site is: A = P(1 + 2L), where A is the minimum sample site dimension, P is the image pixel dimension, and L is the estimated locational accuracy in number of pixels. For example, say your GPS has a locational accuracy of 15 m, and you are working with TM data (30 m pixels). L is therefore equal to 0.5 (15 m GPS accuracy / 30 m pixel = half a pixel), and A = 30 m x (1 + 2 x 0.5) = 60 m, that is, a site of 60 m x 60 m. This should be considered a theoretical minimum value, since it assumes that the georegistration of the TM image is perfect. Larger sample sites should typically be used in order to allow for greater GPS uncertainty, image georegistration uncertainty, and heterogeneity on the ground. (Source: R. M. McCoy. 2005. Field methods in remote sensing. New York: The Guilford Press, p. 23)
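McCoy's formula can be checked with a few lines of Python. This is only a sketch: the function and parameter names are hypothetical, and it assumes the GPS accuracy and the pixel size are given in the same units.

```python
def min_sample_site_dimension(pixel_size, locational_accuracy):
    """Minimum sample site dimension A = P(1 + 2L).

    pixel_size          -- image pixel dimension P (e.g. metres)
    locational_accuracy -- GPS accuracy in the same units as pixel_size
    """
    # L is the locational accuracy expressed in number of pixels.
    l_pixels = locational_accuracy / pixel_size
    return pixel_size * (1 + 2 * l_pixels)


# The TM example from the text: 30 m pixels, 15 m GPS accuracy.
print(min_sample_site_dimension(30.0, 15.0))  # 60.0
```

The result is the side length of a square site, and, as the text notes, it is a theoretical minimum.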
M. Mather (1987), in his "Computer Processing of Remotely-Sensed Images - An Introduction", states that the minimum number of pixels in a training sample must be 30*p per class, where p = number of features. [Please refer to the third paragraph of section 8.4.1 on Page 290 of the above mentioned book] This may not take into account the effects of spatial autocorrelation, however.
The number of pixels in each training set (i.e., all of the training sites for a single land cover class) should be at least ten times the number of bands. Thus, when you are using six bands in a classification, you should aim to have no fewer than 60 pixels per training set. (Source: Dr. Michael Govorov, Malaspina University College, with attribution to IDRISI).
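The two training-sample rules above can be compared side by side. The sketch below is illustrative only: the function name is hypothetical, and it assumes that Mather's p (number of features) is simply the number of bands used in the classification, which the sources do not state explicitly.

```python
def min_training_pixels(n_bands):
    """Return the minimum pixels per class under both rules of thumb.

    Assumption: the number of features p in Mather's rule equals n_bands.
    """
    return {
        "ten_times_bands": 10 * n_bands,  # rule attributed to IDRISI
        "mather_30p": 30 * n_bands,       # Mather (1987): 30 * p per class
    }


# Six bands, as in the example in the text.
print(min_training_pixels(6))  # {'ten_times_bands': 60, 'mather_30p': 180}
```

Neither figure accounts for spatial autocorrelation among neighbouring pixels, so both are best read as lower bounds.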
For a given level of technology, there is a fixed level of 'total resolution' that can be obtained:
image detail is a function of {spatial resolution, spectral resolution, radiometric resolution}. Typically, as one increases another decreases; thus, we often find that spatial and spectral resolutions are inversely related (e.g., SPOT Panchromatic has a spatial resolution of 10 m, whereas SPOT Multispectral has a spatial resolution of 20 m).
Ross Nelson, from the NASA Goddard Space Flight Center, has developed a rule of thumb based on careful examination of the accuracy of several studies published in the remote sensing literature. Of course exceptions can be found, but these are very useful guidelines. The underlying concept is that the more precise the class definitions are, the lower the accuracy will be for the individual classes. Classification accuracy:
- Forest/non-forest, water/no water, soil/vegetated: accuracies in the high 90%'s
- Conifer/hardwood: 80-90%
- Genus: 60-70%
- Species: 40-60%
Note: If including a Digital Elevation Model (DEM) in the classification, add 10% (Source: Biodiversity Informatics Facility of the American Museum of Natural History's Center for Biodiversity and Conservation).
Ortho-corrected images can have higher absolute accuracy, but when relative accuracy is needed it may be better to use only systematically corrected images rather than mix systematically-corrected images with ortho-corrected imagery (Source: Biodiversity Informatics Facility).
50% or more of the signal recorded for a given pixel is energy from the surface area surrounding that individual pixel (Source: Biodiversity Informatics Facility).
Clouds will always obscure that part of the imagery that is most important to your study.

Rules of thumb for GPS

The more accurate a location needs to be recorded, the more likely it falls beneath a dense canopy.
The observations should be relatively homogeneous (mixing results from different eras, different field procedures, or from different software, may lead to greater uncertainty).

Rules of thumb in GIS

To determine the approximate resolution of vector data, if the "scale" of the data is known -- take the scale denominator in thousands and divide by 2 to get the minimum mapping unit in meters (e.g., 1:20,000: 20/2 = 10 m).
The resolution of raster data is considered equal to the square root of the cell area.
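Both resolution rules above reduce to one-line calculations. The following is a minimal sketch with hypothetical function names; it assumes the scale is supplied as its denominator (20000 for 1:20,000) and the cell area in square meters.

```python
import math


def vector_resolution_m(scale_denominator):
    """Approximate minimum mapping unit in meters: thousands of the scale / 2."""
    return (scale_denominator / 1000) / 2


def raster_resolution_m(cell_area_m2):
    """Raster resolution taken as the square root of the cell area."""
    return math.sqrt(cell_area_m2)


print(vector_resolution_m(20000))  # 10.0, the 1:20,000 example in the text
print(raster_resolution_m(900))    # 30.0, e.g. a 30 m x 30 m cell
```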
If topology has not been built, the measurement scale of a GIS data set can be estimated by: A) Measuring the gap between undershoots and the line to which they are supposed to connect, or B) Measuring the length of dangling lines. (Courtesy of Dave Cake, Malaspina University College.)
The accuracy of an overlay product is equal to that of the worst layer, if the layers are "AND'ed" together.
The positional accuracy at one level {point, arc, polygon} may not imply similar accuracy at other levels: that is, positional accuracy about a point says little about the positional accuracy of an arc, etc.
If n1 polygons are overlaid on n2 polygons, how many polygons result? Assuming that the maps are different: the minimum is n1+n2 (polygons on the two maps do not intersect at all); the maximum is infinity (lines have infinite wiggliness); the typical result is 3 or 4 times (n1+n2), discounting spurious polygons.
The relation between the (average) spacing of spot elevations (mass points) and the finest resolution of the DEM that should be produced using those points is roughly estimated to be on the order of 1/3 to 1/2. Thus, if the ground spacing for the mass points is about 70 m, the DEM should not be created with a cell size finer than about 25 m.
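The mass-point rule can be expressed as a range of cell sizes. This is a rough sketch under one assumption: that the 1/3 to 1/2 ratio is applied directly to the average point spacing to give the finest advisable cell size. The function name is hypothetical.

```python
def finest_dem_cell_size(point_spacing_m):
    """Return (lower, upper) bounds on the finest advisable DEM cell size.

    The bounds are 1/3 and 1/2 of the average mass-point spacing.
    """
    return point_spacing_m / 3, point_spacing_m / 2


lower, upper = finest_dem_cell_size(70.0)
print(round(lower, 1), round(upper, 1))  # 23.3 35.0
```

For 70 m spacing the lower bound is about 23 m, which is consistent with the figure of roughly 25 m quoted in the text.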
An heuristic is a rule of thumb, strategy, trick, simplification, or any other kind of device which drastically limits the search for solutions in large problem spaces.
A common rule of thumb in the industry is one digitized boundary per minute (e.g. it would take 99/60 = 1.65 hours to digitize the boundaries of the 99 counties of Iowa).
A study area is more likely to occur at the intersection of four map sheets than not.
80% of a municipality's data is spatial (numerous sources for this figure now exist on the internet, and a higher figure is now often quoted).
There are four types of scale (taken from Nina Lam 2004. Fractals and scale in environmental assessment and monitoring. Pages 23-40 in Scale and Geographic Inquiry (eds. E. Sheppard and R. B. McMaster)):
- cartographic or map scale: the ratio between the measurements on the map and the actual measurements on the ground (aka the Representative Fraction or RF; e.g., 1:20,000);
- observational or geographic scale: the spatial extent of the study area (aka the 'extent' in landscape ecology);
- measurement scale, or resolution: the smallest distinguishable parts of an object (aka the 'grain' in landscape ecology);
- operational scale: the spatial extent at which certain processes operate in the environment.
The measurement scale (grain) and the observational scale (extent) are inversely correlated, a result of logistical constraints in measurement. Nature itself, of course, has fine grain and large extent. In sampling we sacrifice fine grain for large extent or, reciprocally, narrow the extent of our data when we require fine grain.
A measure of the scope of a data set is the L/S ratio: the ratio of the geographic scale (L: the maximum width of the project area, the square root of area for an oddly-shaped area, the extent) to the measurement scale (S: the resolution or grain). (M.F. Goodchild 2004. Scales of cybergeography. Pages 154-169 in Scale and Geographic Inquiry (eds. E. Sheppard and R. B. McMaster)).
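The L/S ratio is simple to compute. The sketch below uses hypothetical function names and assumes that the extent and the resolution are given in the same linear units.

```python
import math


def extent_from_area(area):
    """Extent L for an oddly-shaped study area: the square root of its area."""
    return math.sqrt(area)


def ls_ratio(extent, resolution):
    """Ratio of geographic scale (L, extent) to measurement scale (S, grain)."""
    return extent / resolution


# A study area 30 km wide, sampled with 30 m pixels.
print(ls_ratio(30000, 30))                         # 1000.0
# An irregular 4 km^2 study area, sampled with 10 m pixels.
print(ls_ratio(extent_from_area(4_000_000), 10))   # 200.0
```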
What is the relation between scale and spatial resolution? Peter A. Shary uses the following empirical rule:
pixel size = SQRT(F * (map area) / (number of points)), where the empirical coefficient F = 2.5 and SQRT() stands for square root. Here (map area) / (number of points) is the average map area per point; I assume that scale is essentially represented in most spatial data by the square root of this area. I apply this rule mostly to digitized contour lines to transform them into gridded DEMs. Nevertheless, it seems to be applicable to any spatial data (represented by both irregularly and regularly spaced points), possibly with a slightly diminished F (say, F = 2.1) for regularly spaced points.
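Shary's empirical rule translates directly into code. This is a minimal sketch: the function name is hypothetical, and it assumes the map area is given in squared units of the desired pixel size (for example, square meters for a pixel size in meters).

```python
import math


def shary_pixel_size(map_area, n_points, f=2.5):
    """pixel size = sqrt(F * map_area / n_points).

    f defaults to the empirical coefficient 2.5; the text suggests a
    slightly smaller value (about 2.1) for regularly spaced points.
    """
    return math.sqrt(f * map_area / n_points)


# 250 points digitized over 1 km^2 (1,000,000 m^2).
print(shary_pixel_size(1_000_000, 250))  # 100.0
```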

Rules of Thumb for Mapping Component Customers, By Jeff Cole

Rule #1 - Never bet against Bill Gates!
Rule #2 - Buy into simple and complete products!
Rule #3 - Refuse to pay per-seat runtime royalties!
Rule #4 - Refuse to translate your map files!
Rule #5 - Don't lock yourself into a single GIS component vendor!
Rule #6 - Paying for vendor consulting services or training as a default choice is unacceptable!
Rule #7 - "Always make things as simple as possible, but not simpler." - Albert Einstein

Spatial analysis

Variograms: There are two rules of thumb for selecting a lag size:

Have at least 30-50 pairs for any one variogram point. A smaller bin or lag size means fewer pairs per point and probably better structure, but too small a bin or lag size typically introduces more noise into the variogram.
The lag size multiplied by the number of lags should be about half the largest distance among all points. (Source)
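The two variogram rules can be applied together: pick the lag size from the half-maximum-distance rule, then check the pair count in each bin. The sketch below is one possible way to do this in plain Python (3.8 or later, for math.dist); the function name and the default of 10 lags are assumptions, and it expects at least two distinct 2-D points.

```python
import math
from itertools import combinations


def suggest_lags(points, n_lags=10, min_pairs=30):
    """Return (lag_size, pair_counts, sparse_bins) for a set of (x, y) points.

    lag_size is chosen so that lag_size * n_lags equals half the largest
    distance among all points; sparse_bins lists the bin indices that
    hold fewer than min_pairs point pairs.
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    lag_size = (max(dists) / 2) / n_lags

    counts = [0] * n_lags
    for d in dists:
        k = int(d // lag_size)
        if k < n_lags:  # pairs beyond half the maximum distance are ignored
            counts[k] += 1

    sparse = [k for k, c in enumerate(counts) if c < min_pairs]
    return lag_size, counts, sparse


# A regular 10 x 10 grid of sample points, binned into 5 lags.
grid = [(x, y) for x in range(10) for y in range(10)]
lag_size, counts, sparse = suggest_lags(grid, n_lags=5)
print(lag_size, counts, sparse)
```

If any bins are reported as sparse, reducing the number of lags (which widens each bin) is one way to bring the pair counts back above the 30-50 guideline.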

Crime analysis

Sequential patterns that occur reliably can be used to formulate heuristics (rules of thumb): “we should pitch crime C to persons who are involved in A and B.” (From Spatial and collateral data mining for crime detection and analysis)

Project management

Dan Widner, GIS Program Manager, Virginia DOT, states: Web enabling of the environmental review process is a top priority for VDOT’s environmental and data management organizations. A “rule of thumb” applied at VDOT states that for each month of delay to each $50 million worth of construction projects, $166,000 is added to the cost of the project. VDOT is working with other natural resource agencies in the state to improve the ease of access to data and information using spatial data, tools and technologies. (Source)
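The VDOT figure can be turned into a quick estimate. The sketch below assumes that the added cost scales linearly with both project value and months of delay, which the source does not state; the function and parameter names are hypothetical.

```python
def delay_cost_usd(project_value_usd, months_delayed,
                   cost_per_month_per_50m=166_000):
    """Estimated added cost: $166,000 per month per $50 million of projects.

    Assumption: linear scaling in both project value and delay.
    """
    units_of_50m = project_value_usd / 50_000_000
    return months_delayed * units_of_50m * cost_per_month_per_50m


# A $100 million programme delayed by three months.
print(delay_cost_usd(100_000_000, 3))  # 996000.0
```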

Other rules of thumb that are of interest to those working with GIS, remote sensing, etc.

What are some rules of thumb for converting GIF images to JPEG? (For example: A good rule of thumb is not to bother converting any GIF that's much under 100 Kbytes; the potential savings isn't worth the hassle.)

Rules of thumb for metapopulation conservation management.

Rules of thumb about color that you may find helpful.