Rules of thumb for spatial data
This page collects some of the rules of thumb that have
been identified by practitioners in the fields of cartography,
remote sensing, GPS, GIS and
spatial analysis.
If you have developed your own rules of thumb, or know of any rules of thumb
not described below, I would be delighted to hear from you. Please contact
me.
One view "recognizes the use of rules of thumb as an immediate
and semiconscious kind of knowledge that could be called tacit knowledge. Using
rules of thumb might explain why practice remains unchanged although educational
activities result in more elaborate knowledge" (http://fampract.oupjournals.org/cgi/content/abstract/19/6/617).
Rules of thumb in cartography
- Number of shades of gray distinguishable (16 max).
- Number of legend categories consistently recognized (5, 7 max).
- Symbol sizes should not vary when mapping qualitative data.
- Raw totals should not be mapped using a choropleth map.
Rules of thumb in remote sensing
- The number of contiguous pixels required for object identification: 10 to
16 (source TBD).
- McCoy
(see below) recommends that, considering all of the potential variables
that can affect the results, at a minimum sample units should be no smaller
than a 3 x 3 cluster of pixels for training sites or accuracy assessment
sites.
- What should the size of the sample site be? A useful formula for determining
the size of a sample site is: A = P(1 + 2L), where A is the minimum sample
site dimension, P is the image pixel dimension, and L is the estimated locational
accuracy in number of pixels. For example, say your GPS has a locational
accuracy of 15 m, and you are working with TM data (30 m pixels). L is then
equal to 0.5 (15 m GPS accuracy / 30 m pixel = half a pixel), so
A = 30 m x (1 + 2 x 0.5) = 60 m, i.e., a 60 m x 60 m sample site. This should be
considered a theoretical minimum value, since it assumes that the georegistration of the
TM image is perfect. Larger sample sites should typically be used in order
to allow for greater GPS uncertainty, image georegistration uncertainties,
and heterogeneity on the ground. (Source: R. M. McCoy. 2005. Field
methods in remote sensing. New York: The Guilford Press, p. 23)
- M. Mather (1987), in his "Computer Processing of Remotely-Sensed
Images - An Introduction", states that the minimum number of pixels
in a training sample must be 30*p per class, where p = number of features.
[Please refer to the third paragraph of section 8.4.1 on Page 290 of the
above mentioned book] This may not take into account the effects of spatial
autocorrelation, however.
- The number of pixels in each training set (i.e., all of the training sites
for a single land cover class) should not be less than ten times the number
of bands. Thus, when you are using six bands in a classification,
you should aim to have at least 60 pixels per training set. (Source:
Dr. Michael Govorov, Malaspina University College, with attribution to IDRISI).
- For a given level of technology, there is a fixed level of 'total resolution'
that can be obtained:
image detail is a function of {spatial resolution, spectral resolution,
radiometric resolution}. Typically, as one increases another decreases
(thus, we often find that spatial and spectral resolutions are inversely
related; e.g., SPOT Panchromatic has a spatial resolution of 10 m, whereas SPOT
Multispectral has a spatial resolution of 20 m).
- Ross Nelson, from the NASA Goddard Space Flight Center, has developed a rule of thumb based on careful examination of the accuracy of several studies published in the remote sensing literature. Exceptions can of course be found, but these are very useful guidelines. The underlying concept is that the more precise the class definitions are, the lower the accuracy will be for the individual classes. Classification accuracy:
- Forest/non-forest, water/no water, soil/vegetated: accuracies in the high 90s (%)
- Conifer/hardwood: 80-90%
- Genus: 60-70%
- Species: 40-60%
Note: If including a Digital Elevation Model (DEM) in the classification, add 10% (Source: Biodiversity Informatics Facility of the American Museum of Natural History's Center for Biodiversity and Conservation).
- Ortho-corrected images can have higher absolute accuracy, but when relative accuracy is needed it may be better to use only systematically corrected images rather than mix systematically-corrected images with ortho-corrected imagery (Source: Biodiversity Informatics Facility).
- 50% or more of the energy recorded for a given pixel comes from the surface area surrounding that individual pixel (Source: Biodiversity Informatics Facility).
- Clouds will always obscure that part of the imagery that is most important
to your study.
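The sample-site formula and the two training-set minimums above are simple enough to sketch in a few lines of Python. This is only an illustration: the function names are hypothetical, and the locational accuracy is assumed to be supplied in the same ground units as the pixel size so that L can be derived from it.

```python
def min_sample_site_dimension(pixel_size, locational_accuracy):
    """McCoy's A = P(1 + 2L); L is the locational accuracy in pixels."""
    accuracy_in_pixels = locational_accuracy / pixel_size  # L
    return pixel_size * (1 + 2 * accuracy_in_pixels)


def min_training_pixels_mather(n_features):
    """Mather's minimum of 30 * p pixels per class (p = number of features)."""
    return 30 * n_features


def min_training_pixels_per_band(n_bands):
    """Ten pixels per band for each training set (the IDRISI-attributed rule)."""
    return 10 * n_bands


# TM data (30 m pixels) with a 15 m GPS: a 60 m x 60 m theoretical minimum.
print(min_sample_site_dimension(30, 15))
# Six bands or features under each training-set rule.
print(min_training_pixels_mather(6))
print(min_training_pixels_per_band(6))
```

Note that the two training-set rules give quite different minimums for the same number of bands; neither accounts for spatial autocorrelation.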
Rules of thumb for GPS
- The more accurately a location needs to be recorded, the more likely it is to fall
beneath a dense canopy.
- The observations should be relatively homogeneous (mixing results from
different eras, different field procedures, or from different software, may
lead to greater uncertainty).
Rules of thumb in GIS
- To determine the approximate resolution of vector data when the "scale" of
the data is known, take the scale denominator in thousands and divide by 2 to get
the minimum mapping unit in meters (e.g., 1:20,000: 20/2 = 10 m).
- The resolution of raster data is considered equal to the square root of
the cell area.
- If topology has not been built, the measurement scale of a GIS data set
can be estimated by: A) Measuring the gap between undershoots and the line
to which they are supposed to connect, or B) Measuring the length of dangling
lines. (Courtesy of
Dave Cake, Malaspina University College.)
- The accuracy of an overlay product is equal to that of the worst layer,
if the layers are "AND'ed" together.
- The positional accuracy at one level {point, arc, polygon} may not imply
similar accuracy at other levels: that is, positional accuracy about a point
says little about the positional accuracy of an arc, etc.
- If n1 polygons are overlaid on n2 polygons, how many polygons result?
Assuming that the maps are different: the minimum is n1+n2 (polygons on the
two maps do not intersect at all); the maximum is infinity (lines have infinite
wiggliness); the typical result is 3 or 4 times (n1+n2), discounting spurious
polygons.
- The relation between the (average) spacing of spot elevations (mass points)
and the finest resolution of the DEM that should be produced using those
points is roughly estimated to be in the order of 1/3 to 1/2. Thus, if the
ground spacing for the mass points is about 70 m, the DEM should not be created
with a cell size finer than about 25 m.
- A heuristic is a rule of thumb, strategy, trick, simplification, or any
other kind of device which drastically limits the search for solutions in
large problem spaces.
- A common rule of thumb in the industry is one digitized boundary per minute
(e.g. it would take 99/60 = 1.65 hours to digitize the boundaries of the
99 counties of Iowa).
- A study area is more likely to occur at the intersection of four map sheets
than not.
- 80% of a municipality's data is spatial (numerous sources now exist on
the internet for this figure; a higher figure is now often quoted).
- There are four types of scale (taken from Nina Lam 2004. Fractals
and scale in environmental assessment and monitoring. Pages 23-40
in Scale and Geographic Inquiry (eds. E. Sheppard and R. B.
McMaster)):
- cartographic or map scale: the ratio between the measurements
on the map and the actual measurements on the ground (aka the Representative
Fraction or RF; e.g., 1:20,000);
- observational or geographic scale: the spatial extent
of the study area (aka the 'extent' in landscape ecology);
- measurement scale, or resolution: the smallest distinguishable
parts of an object (aka the 'grain' in landscape ecology);
- operational scale: the spatial extent at which certain processes
operate in the environment.
- The measurement scale (grain) and the observational scale (extent) are
inversely correlated, a result of logistical constraints in measurement.
Nature itself, of course, has fine grain and large extent. In sampling we
sacrifice fine grain for large extent or, reciprocally, narrow the extent
of our data when we require fine grain.
- A measure of the scope of a data set is the L/S ratio: the ratio of the
geographic scale (L: the maximum width of the project area, the square root
of area for an oddly-shaped area, the extent) to the measurement scale (S:
the resolution or grain). (M.F. Goodchild 2004. Scales of cybergeography.
Pages 154-169 in Scale and Geographic Inquiry (eds. E. Sheppard
and R. B. McMaster)).
- What is the relation between scale and spatial resolution?
Peter A. Shary uses the following empirical rule:
pixel size = SQRT( F * (map area) / (number of points) ), where the empirical coefficient
F = 2.5 and
SQRT() stands for square root. Here (map area) / (number of points) is the average
map area per point; I assume that scale is essentially represented in
most spatial data by the square root of this area. I apply this rule mostly to
digitized contour lines to transform them into gridded DEMs. Nevertheless, it
seems to be applicable to any spatial data (represented by both irregularly and
regularly spaced points), possibly with a slightly diminished F (say, F = 2.1) for
regularly spaced points.
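Several of the GIS rules above are plain arithmetic, so they can be collected into small helper functions. The sketch below is illustrative only: all function names are hypothetical, units are assumed to be consistent (meters and square meters), and the outputs are rough estimates in the spirit of the rules, not precise values.

```python
import math


def vector_resolution_m(scale_denominator):
    """Scale denominator in thousands, divided by 2 (1:20,000 -> 10 m)."""
    return (scale_denominator / 1000) / 2


def raster_resolution(cell_area):
    """Raster resolution taken as the square root of the cell area."""
    return math.sqrt(cell_area)


def dem_cell_size_range(point_spacing):
    """DEM cell size of roughly 1/3 to 1/2 of the mass-point spacing."""
    return point_spacing / 3, point_spacing / 2


def typical_overlay_polygons(n1, n2):
    """Typical overlay result: 3 to 4 times (n1 + n2) polygons."""
    return 3 * (n1 + n2), 4 * (n1 + n2)


def digitizing_hours(n_boundaries):
    """One digitized boundary per minute, expressed in hours."""
    return n_boundaries / 60


def ls_ratio(extent, grain):
    """L/S ratio: geographic scale (extent) over measurement scale (grain)."""
    return extent / grain


def shary_pixel_size(map_area, n_points, f=2.5):
    """Shary's empirical rule: sqrt(F * map area / number of points)."""
    return math.sqrt(f * map_area / n_points)


print(vector_resolution_m(20000))        # the 1:20,000 example: 10.0
print(dem_cell_size_range(70))           # mass points about 70 m apart
print(digitizing_hours(99))              # the 99 counties of Iowa: 1.65
print(shary_pixel_size(1_000_000, 2500)) # 1 km^2 covered by 2500 points
```

For regularly spaced points, Shary's suggestion of a slightly smaller coefficient can be tried by passing `f=2.1`.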
Rules of Thumb for Mapping Component Customers, by Jeff Cole
- Rule #1 - Never bet against Bill Gates!
- Rule #2 - Buy into simple and complete products!
- Rule #3 - Refuse to pay per-seat runtime royalties!
- Rule #4 - Refuse to translate your map files!
- Rule #5 - Don't lock yourself into a single GIS component vendor!
- Rule #6 - Paying for vendor consulting services or training as a default
choice is unacceptable!
- Rule #7 - "Always make things as simple as possible, but not simpler." -
Albert Einstein
Spatial analysis
Variograms: There are 2 rules of thumb for selecting a lag size:
- Have at least 30-50 pairs for any one variogram point. A smaller
bin or lag size means fewer pairs and probably better structure, but too
small a bin or lag size typically introduces more noise into the variogram.
- The lag size multiplied by the number of lags should be about half
the largest distance among all points. (Source)
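The two lag-size rules can be checked together on a set of sample locations. The following is a minimal sketch using only the Python standard library; the function name and the regular-grid test data are assumptions made for illustration, and the points are assumed to include at least two distinct locations.

```python
import itertools
import math


def suggest_lag_size(points, n_lags):
    """Return a lag size and the number of point pairs falling in each lag bin.

    The lag size is chosen so that lag size * n_lags equals half the
    largest distance among all points.
    """
    distances = [math.dist(a, b) for a, b in itertools.combinations(points, 2)]
    lag_size = (max(distances) / 2) / n_lags
    pair_counts = [0] * n_lags
    for d in distances:
        bin_index = int(d // lag_size)
        if bin_index < n_lags:  # pairs beyond half the maximum distance are ignored
            pair_counts[bin_index] += 1
    return lag_size, pair_counts


# Hypothetical sample: a 10 x 10 grid of points with unit spacing.
grid = [(x, y) for x in range(10) for y in range(10)]
lag_size, pair_counts = suggest_lag_size(grid, n_lags=5)
sparse_bins = [i for i, count in enumerate(pair_counts) if count < 30]
print(lag_size, pair_counts, sparse_bins)
```

If `sparse_bins` is not empty, the first rule suggests using fewer, wider lags.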
Crime analysis
Sequential patterns that occur reliably can be used to formulate heuristics
(rules of thumb): “we should pitch crime C to persons who are involved
in A and B.” (From Spatial and collateral data mining for crime detection and analysis)
Project management
Dan Widner, GIS Program Manager, Virginia DOT states:
Web enabling of environmental review process is a top priority for VDOT’s environmental
and data management organizations. A “rule of thumb” applied at VDOT states that
for each month of delay to each $50 million worth of construction projects, $166,000
is added to the cost of the project. VDOT is working with other natural resource
agencies in the state to improve the ease of access to data and information using
spatial data, tools and technologies.
(Source)
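The VDOT figure can be turned into a quick estimate. The sketch below assumes the cost scales linearly with both the value of the construction projects and the number of months of delay, which the quoted rule implies but does not state; the function name is hypothetical.

```python
def delay_cost(project_value, months_delayed):
    """Added cost: $166,000 per month of delay per $50 million of construction."""
    return months_delayed * (project_value / 50_000_000) * 166_000


# A hypothetical $100 million program delayed by three months.
print(delay_cost(100_000_000, 3))
```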
Other rules of thumb that are of interest to those working with
GIS, remote sensing, etc.
What are some rules of thumb for converting GIF images to JPEG? (For example:
A good rule of thumb is not to bother converting any GIF that's much under 100
Kbytes; the potential savings isn't worth the hassle.)
Rules of thumb for metapopulation conservation management.
Rules of thumb about color that you may find helpful.
The concept, and these pages, were initially
produced by Brian Klinkenberg© January, 2004.