# UNIT 4 - THE RASTER GIS

Compiled with assistance from Dana Tomlin, The Ohio State University

### For Information that Supplements the Contents of this Unit:

IDRISI Tutorial (Lorup/Idrisi Project)
Native American Research Information System (NARIS) (AII/U of Oklahoma)
Raster View of the World (Foote and Huebner/Geographer's Craft) -- Both illustrated and described.
Representation and Data Quality (Chrisman/U of Washington)
Scale, Accuracy and Resolution in GIS (B.C. Environment) -- Map and display scale; data accuracy, density, detail, resolution and uncertainty; raster data resolution; GIS analysis; separation of data and annotation; etc.

• A. THE DATA MODEL
• B. CREATING A RASTER
• C. CELL VALUES
• D. MAP LAYERS
• E. EXAMPLE ANALYSIS USING A RASTER GIS
• REFERENCES
• EXAM AND DISCUSSION QUESTIONS
• NOTES
• Although most of the material in this Curriculum is designed to be as independent as possible from specific data models, it is necessary to deal with this basic concept early so that students can start hands-on exercises with a GIS program. Following Unit 5, we return to the more fundamental concepts and do not address specific vector GIS issues until Units 13 and 14. There are several other places where these topics could be placed in a course sequence. We have tried to make Units 4 and 5 as independent as possible so that you can move them within the Curriculum relatively easily.

### A. THE DATA MODEL

• geographical variation in the real world is infinitely complex
• the closer you look, the more detail you see, almost without limit
• it would take an infinitely large database to capture the real world precisely
• data must somehow be reduced to a finite and manageable quantity by a process of generalization or abstraction
• geographical variation must be represented in terms of discrete elements or objects
• the rules used to convert real geographical variation into discrete objects are the data model
• Tsichritzis and Lochovsky (1977) define a data model as "a set of guidelines for the representation of the logical organization of the data in a database... (consisting) of named logical units of data and the relationships between them." [1]
• [1] Tsichritzis, T.C., and F.H. Lochovsky, 1977. Data Base Management Systems, Academic Press, New York.
• current GISs differ according to the way in which they organize reality through the data model
• each model tends to fit certain types of data and applications better than others
• the data model chosen for a particular project or application is also influenced by:
• the software available
• the training of the key individuals
• historical precedent
• there are two major choices of data model - raster and vector
• the raster model divides the entire study area into a regular grid of cells in a specific sequence
• the conventional sequence is row by row from the top left corner
• each cell contains a single value
• the raster model is space-filling since every location in the study area corresponds to a cell in the raster
• one set of cells and associated values is a layer
• there may be many layers in a database, e.g. soil type, elevation, land use, land cover
• the vector model uses discrete line segments or points to identify locations
• discrete objects (boundaries, streams, cities) are formed by connecting line segments
• vector objects do not necessarily fill space; not all locations in space need to be referenced in the model
• a raster model tells what occurs everywhere - at each place in the area
• a vector model tells where everything occurs - gives a location to every object
• conceptually, the raster models are the simplest of the available data models
• therefore, we begin our examination of GIS data and operations with the raster model and will consider vector models after the fundamental concepts have been introduced.
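
The raster model described above can be sketched in a few lines of Python. This is only an illustration, not how any particular GIS stores its data: the `RasterLayer` class, its field names and the soil codes are all made up for the example. It shows the row-by-row sequence from the top left corner and the fact that each cell holds a single value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RasterLayer:
    """One layer: a regular grid of cells stored row by row from the top left."""
    n_rows: int
    n_cols: int
    values: List[int]  # flat list of length n_rows * n_cols

    def cell(self, row: int, col: int) -> int:
        # Row-by-row order: skip `row` complete rows, then move `col` cells along.
        return self.values[row * self.n_cols + col]


# Hypothetical 3 x 4 soil-type layer (the codes are invented for illustration).
soil = RasterLayer(3, 4, [1, 1, 2, 2,
                          1, 2, 2, 3,
                          3, 3, 3, 3])

print(soil.cell(0, 0))  # top left cell
print(soil.cell(1, 2))  # second row, third column
```

A database with several layers (soil type, elevation, land use) would simply hold several such objects sharing the same `n_rows` and `n_cols`.
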

### B. CREATING A RASTER

• consider laying a grid over a geologic map
• create a raster by coding each cell with a value that represents the rock type which appears in the majority of that cell's area
• when finished, every cell will have a coded value
• in most cases the values that are to be assigned to each cell in the raster are written into a file, often coded in ASCII
• this file can be created manually by using a word processor, database or spreadsheet program or it can be created automatically
• then it is normally imported into the GIS so that the program can reformat the data for its specific processing needs
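
As a rough sketch of the import step, the following Python reads a plain ASCII grid into rows of integer codes. The file layout assumed here (one row of cells per line, values separated by spaces) and the function name `parse_ascii_grid` are hypothetical; each GIS program defines its own import format.

```python
from typing import List


def parse_ascii_grid(text: str) -> List[List[int]]:
    """Parse one row of cells per line, values separated by whitespace."""
    rows = [[int(token) for token in line.split()]
            for line in text.splitlines() if line.strip()]
    # A raster is a regular grid, so every row must have the same length.
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all rows must contain the same number of cells")
    return rows


# Hypothetical rock-type codes typed in by hand.
geology_txt = """\
1 1 2 2
1 2 2 3
3 3 3 3
"""

grid = parse_ascii_grid(geology_txt)
print(len(grid), "rows x", len(grid[0]), "columns")
```
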
• there are several methods for creating raster databases
• direct entry of each layer cell by cell is simplest
• entry may be done within the GIS or into an ASCII file for importing
• each program will have specific requirements
• the process is normally tedious and time-consuming
• a layer can contain millions of cells
• average Landsat image is around 7.4 x 10^6 pixels, average TM scene is about 34.9 x 10^6 pixels
• run length encoding can be more efficient
• values often occur in runs across several cells
• this is a form of spatial autocorrelation - tendency for nearby things to be more similar than distant things
• data entered as pairs, first run length, then value
• e.g. the array 0 0 0 1 1 0 0 1 1 1 0 0 1 1 1 0 1 1 1 1 would be entered as 3 0 2 1 2 0 3 1 2 0 3 1 1 0 4 1
• this is 16 items to enter, instead of 20
• in this case the saving is 20%, but much higher savings occur in practice
• imagine a database of 10,000,000 cells and a layer which records the county containing each pixel
• suppose there are only two counties in the area covered by the database
• each cell can have one of only two values so the runs will be very long
• only some GISs have the capability to use run length encoded files
• note: Units 35 and 36 cover run length encoding and other aspects of raster storage in more detail
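
The run length example above can be checked with a short Python sketch. It treats the 20 values as a single sequence, as the example does; real systems may encode each row separately, and the function names here are illustrative only.

```python
from itertools import groupby


def rle_encode(cells):
    """Return (run_length, value) pairs for a sequence of cell values."""
    return [(len(list(run)), value) for value, run in groupby(cells)]


def rle_decode(pairs):
    """Expand (run_length, value) pairs back into the original sequence."""
    cells = []
    for run_length, value in pairs:
        cells.extend([value] * run_length)
    return cells


cells = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]
pairs = rle_encode(cells)

# Flatten the pairs to count the items that would actually be typed in.
items = [item for pair in pairs for item in pair]
print(items)
print(len(items), "items instead of", len(cells))
```

Decoding the pairs with `rle_decode` recovers the original array, so no information is lost.
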
• much raster data is already in digital form, as images, etc.
• however, resampling will likely be needed in order that pixels coincide in each layer
• because remote sensing generates images, it is easier to interface with a raster GIS than any other type
• elevation data is commonly available in digital raster form from agencies such as the US Geological Survey

### C. CELL VALUES

• the type of values contained in cells in a raster depends upon both the reality being coded and the GIS
• different systems allow different classes of values, including:
• whole numbers (integers)
• real (decimal) values
• alphabetic values
• many systems allow only integers; others allow different types but restrict each separate raster layer to a single kind of value
• if systems allow several types of values, e.g. some layers numeric, some non-numeric, they should warn the user against doing unreasonable operations
• e.g. it is unreasonable to try to multiply the values in a numeric layer by the values in a non-numeric layer
• integer values often act as code numbers, which "point" to names in an associated table or legend
• e.g. the first example might have the following legend identifying the name of each soil class:
• 0 = "no class"
• 1 = "fine sandy loam"
• 2 = "coarse sand"
• 3 = "gravel"
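
The idea of integer codes "pointing" to names can be sketched as a simple lookup. The legend below is the one given above; the small layer of codes is invented for illustration.

```python
# Legend: integer code -> soil class name (from the legend above).
legend = {
    0: "no class",
    1: "fine sandy loam",
    2: "coarse sand",
    3: "gravel",
}

# Hypothetical 2 x 3 layer of soil codes.
soil_codes = [[1, 1, 2],
              [0, 3, 2]]

# Replace each code by the name it points to.
soil_names = [[legend[code] for code in row] for row in soil_codes]
print(soil_names[1][1])
```
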

• each pixel or cell is assumed to have only one value
• this is often inaccurate - the boundary of two soil types may run across the middle of a pixel
• in such cases the pixel is given the value that covers the largest fraction of the cell, or the value found at the middle point of the cell
• note, however, a few systems allow a pixel to have multiple values
• the NARIS system developed at the University of Illinois in the 1970s allowed each pixel to have any number of values and associated percentages
• e.g. 30% a, 30% b, 40% c
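
A minimal sketch of the "largest fraction" rule, using the percentages from the example above. The dictionary representation and the function name are assumptions made for illustration; they do not describe how NARIS stored its data.

```python
def dominant_value(fractions):
    """Return the value that covers the largest share of the cell.

    `fractions` maps each value to its percentage of the cell area.
    If two values tie, the one listed first is returned.
    """
    return max(fractions, key=fractions.get)


cell = {"a": 30, "b": 30, "c": 40}
print(dominant_value(cell))
```

Reducing the cell to a single value in this way discards the information that 60% of the cell is not "c", which is the inaccuracy noted above.
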

### D. MAP LAYERS

• the data for an area can be visualized as a set of map layers
• a map layer is a set of data describing a single characteristic for each location within a bounded geographic area
• only one item of information is available for each location within a single layer - multiple items of information require multiple layers
• on the other hand, a topographic map can show multiple items of information for each location, within limits
• e.g. elevation (contours), counties (boundaries), roads, railroads, urbanized areas (grey tint)
• these would be 5 layers in a raster GIS
• typical raster databases contain up to a hundred layers
• each layer (matrix, lattice, raster, array) typically contains hundreds or thousands of cells
• important characteristics of a layer are its resolution, orientation and zone(s)
• in general, resolution can be defined as the minimum linear dimension of the smallest unit of geographic space for which data are recorded
• in the raster model the smallest units are generally rectangular (occasionally systems have used hexagons or triangles)
• these smallest units are known as cells or pixels
• note: high resolution refers to rasters with small cell dimensions
• high resolution means lots of detail, lots of cells, large rasters, small cells
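
The link between cell size and raster size can be made concrete with a small calculation. The study area dimensions and cell sizes below are hypothetical; the point is that halving the cell dimension quadruples the number of cells.

```python
import math


def cell_count(width_m, height_m, cell_size_m):
    """Number of square cells needed to cover a rectangular study area."""
    n_cols = math.ceil(width_m / cell_size_m)
    n_rows = math.ceil(height_m / cell_size_m)
    return n_rows * n_cols


# Hypothetical 10 km x 10 km study area at two resolutions.
coarse = cell_count(10_000, 10_000, 100)  # 100 m cells
fine = cell_count(10_000, 10_000, 50)     # 50 m cells
print(coarse, fine, fine // coarse)
```
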
• orientation is the angle between true north and the direction defined by the columns of the raster
• each zone of a map layer is a set of contiguous locations that exhibit the same value
• these might be:
• ownership parcels
• political units such as counties or nations
• lakes or islands
• individual patches of the same soil or vegetation type
• there is considerable confusion over terms here
• other terms commonly used for this concept are patch, region, polygon
• each of these terms, however, has different meanings to individual users and different definitions in specific GIS packages
• in addition, there is a need for a second term which refers to all individual zones that have the same characteristics
• class is often used for this concept
• note that not all map layers will have zones; cell contents may vary continuously over the region, making every cell's value unique
• e.g. satellite sensors record a separate value for reflection from each cell
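
The difference between a zone (one contiguous patch) and a class (all zones sharing a value) can be illustrated by labelling the zones of a small layer. This sketch assumes that "contiguous" means sharing a cell edge; some systems also count diagonal neighbours, which would give different zones. The function name and the sample grid are made up for the example.

```python
from collections import deque


def label_zones(grid):
    """Give every contiguous group of equal-valued cells its own zone number."""
    n_rows, n_cols = len(grid), len(grid[0])
    labels = [[None] * n_cols for _ in range(n_rows)]
    n_zones = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if labels[r][c] is not None:
                continue  # cell already belongs to a zone
            # Start a new zone here and flood outwards through edge neighbours.
            labels[r][c] = n_zones
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < n_rows and 0 <= nc < n_cols
                            and labels[nr][nc] is None
                            and grid[nr][nc] == grid[r][c]):
                        labels[nr][nc] = n_zones
                        queue.append((nr, nc))
            n_zones += 1
    return labels, n_zones


grid = [[1, 1, 2],
        [2, 1, 2],
        [2, 2, 1]]
labels, n_zones = label_zones(grid)
n_classes = len({value for row in grid for value in row})
print(n_zones, "zones,", n_classes, "classes")
```

In this grid each of the two classes is split into two separate zones, which is why a second term is needed.
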
• major components of a zone are its value and location(s)
• value is the item of information stored in a layer for each pixel or cell
• cells in the same zone have the same value
• generally location is identified by an ordered pair of coordinates (row and column numbers) that unambiguously identify the location of each unit of geographic space in the raster (cell, pixel, grid cell)
• usually the true geographic location of one or more of the corners of the raster is also known
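
When the geographic location of the top left corner is known, row and column numbers can be converted to map coordinates. The sketch below assumes square cells and a raster whose columns point to true north (orientation of zero), with row 0 at the top. The corner coordinates and cell size are hypothetical.

```python
def cell_center(row, col, x_left, y_top, cell_size):
    """Map coordinates of the centre of cell (row, col).

    Assumes a north-up raster with square cells; x increases with the
    column number and y decreases with the row number.
    """
    x = x_left + (col + 0.5) * cell_size
    y = y_top - (row + 0.5) * cell_size
    return x, y


# Hypothetical raster: top left corner at (500000, 4200000), 500 m cells.
print(cell_center(0, 0, 500000.0, 4200000.0, 500.0))
print(cell_center(2, 3, 500000.0, 4200000.0, 500.0))
```

A raster with a non-zero orientation would also need a rotation, which is not shown here.
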

### E. EXAMPLE ANALYSIS USING A RASTER GIS

• objective: identify areas suitable for logging
• an area is suitable if it satisfies the following criteria:
• is Jackpine (Black Spruce are not valuable)
• is well drained (poorly drained and waterlogged terrain cannot support equipment, logging causes unacceptable environmental damage)
• is not within 500 m of a lake or watercourse (erosion may cause deterioration of water quality)
• recode layer 2 as follows, creating layer 4
• y if value 2 (Jackpine)
• n if other value
• recode layer 3 as follows, creating layer 5
• y if value 2 (good)
• n if other value
• spread the lake on layer 1 by one cell (500 m), creating layer 6
• recode the spread lake on layer 6 as follows, creating layer 7
• n if in spread lake
• y if not
• overlay layers 4 and 5 to obtain layer 8, coding as follows
• y if both 4 and 5 are y
• n otherwise
• overlay layers 7 and 8 to obtain layer 9, coding as follows
• y if both 7 and 8 are y
• n otherwise
• the loggable cells are y on layer 9
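
The procedure above can be traced with a small Python sketch. Everything about the data is an assumption made for illustration: the 4 x 4 layers are invented, layer 1 is taken to mark lake cells with 1, layer 2 uses 2 for Jackpine, layer 3 uses 2 for good drainage, and "spread by one cell" is taken to mean the four edge neighbours. The function names follow the operation names used above but do not correspond to any particular GIS.

```python
def recode(layer, rule):
    """Apply a rule to every cell of one layer."""
    return [[rule(value) for value in row] for row in layer]


def overlay(a, b, rule):
    """Combine two layers cell by cell."""
    return [[rule(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def spread(layer, target):
    """True for cells equal to `target` and for their edge neighbours."""
    n_rows, n_cols = len(layer), len(layer[0])
    out = [[value == target for value in row] for row in layer]
    for r in range(n_rows):
        for c in range(n_cols):
            if layer[r][c] == target:
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < n_rows and 0 <= nc < n_cols:
                        out[nr][nc] = True
    return out


def both(x, y):
    return "y" if x == "y" and y == "y" else "n"


# Hypothetical input layers (500 m cells).
layer1 = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # 1 = lake
layer2 = [[2, 2, 2, 1], [2, 2, 2, 1], [1, 2, 2, 2], [1, 1, 2, 2]]  # 2 = Jackpine
layer3 = [[2, 2, 2, 2], [2, 2, 1, 2], [2, 2, 2, 2], [1, 1, 2, 2]]  # 2 = good drainage

layer4 = recode(layer2, lambda v: "y" if v == 2 else "n")
layer5 = recode(layer3, lambda v: "y" if v == 2 else "n")
layer6 = spread(layer1, 1)
layer7 = recode(layer6, lambda near_lake: "n" if near_lake else "y")
layer8 = overlay(layer4, layer5, both)
layer9 = overlay(layer7, layer8, both)

for row in layer9:
    print("".join(row))
```
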
• the operations used in this analysis were recode, spread and overlay
• we could have achieved the same result using the operations in other sequences, or by combining recode and overlay operations
• e.g. overlay layers 2 and 3, coding as follows
• y if layer 2 is 2 and layer 3 is 2, n otherwise
• this would replace two recodes and an overlay
• e.g. some systems allow layers to be overlaid 3 or more at a time
• the names given to operations vary from system to system, but most of the operations themselves are common across systems
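
The claim that one combined overlay can replace two recodes and an overlay is easy to check on a toy example. The two small layers are invented, and the `overlay` helper is an illustrative function, not a call from any GIS package.

```python
def overlay(a, b, rule):
    """Combine two layers cell by cell using a two-argument rule."""
    return [[rule(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical layers: 2 = Jackpine on layer 2, 2 = good drainage on layer 3.
layer2 = [[2, 1, 2], [2, 2, 1]]
layer3 = [[2, 2, 1], [1, 2, 2]]

# Route 1: two recodes followed by an overlay.
layer4 = [["y" if v == 2 else "n" for v in row] for row in layer2]
layer5 = [["y" if v == 2 else "n" for v in row] for row in layer3]
layer8 = overlay(layer4, layer5, lambda x, y: "y" if x == "y" and y == "y" else "n")

# Route 2: a single overlay working directly on the original values.
combined = overlay(layer2, layer3, lambda s, d: "y" if s == 2 and d == 2 else "n")

print(combined == layer8)
```
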

### REFERENCES

Star, J.L. and J.E. Estes, 1990. Geographic Information Systems: An Introduction, Prentice Hall, Englewood Cliffs, NJ. An introduction to GIS with a strong raster orientation.

Further references can be found following Unit 5.

### EXAM AND DISCUSSION QUESTIONS

1. What types of geographical data fit the raster GIS data model best? What types fit worst?

2. Review the issues involved in selecting a resolution for a raster GIS project.

3. What resolutions would be appropriate for the following problems: (a) determining logging areas in a National Forest, (b) finding suitable locations for backcountry campsites, (c) planning subdivisions to take account of noise from an airport?

4. Review the methods of planning described in Ian McHarg's classic book Design with Nature (1969, Doubleday, New York). In what ways would they (a) benefit and (b) suffer from implementation using raster GIS?

5. Using the documentation for the raster GIS program you have, determine how that program uses (a) the concept of "zone" as a contiguous group of cells of the same value, and (b) the concept of several groups of cells that all have the same value. Is there any ambiguity in the way your program deals with these two concepts?