
Introductory Users Manual
6 June 2003. This manual is somewhat dated. In particular, a number of new databases are now packaged with ZIPFIP!
SUMMARY

ZIPFIP is both a set of databases containing census and locational information at the ZIP (ZIP code) and FIPS (county) level, and a program by which these data sets may be accessed and manipulated. It has several useful features, including: editing and displaying data; defining spatial boundaries known as market areas; determining distances between any two sites in the lower 48 States; and aggregating observations. ZIPFIP can extract commonly used census information by county or ZIP code, and correct for missing values.

ACKNOWLEDGMENTS

ZIPFIP is the result of a long process of acquiring data and massaging the user interface. We thank the Resources and Technology Division of the U.S. Department of Agriculture's Economic Research Service (ERS), and the Rocky Mountain Forest and Range Experiment Station of the U.S. Forest Service, for their support.

We also thank a variety of individuals who have provided valuable data and insights, including: Karen Mizer, Noel Gollehon and Ralph Heimlich of ERS, Glenn Brink and George Peterson of the U.S. Forest Service, Robert Mendelsohn of Yale University, and George Muehlbach of the National Center for Resource Innovations.

DISCLAIMER AND INVITATION TO CORRESPOND

The ZIPFIP package was created by Daniel Hellerstein of the Economic Research Service (ERS), Danette Woo (ERS), and Dennis Donnelly and Daniel McCollum of the U.S. Forest Service.

Although we have attempted to find and correct all problems, we assume no responsibility or liability for errors in the program, or misunderstood features. Nevertheless, if problems arise, please let us know -- whether they be bugs, operational difficulties, or incomprehensible documentation. We also stress that the data in this package is not guaranteed to be accurate for every ZIP or FIPS code; ZIPFIP is designed to facilitate bulk processing of data and not as a precise reference tool. If more accurate information on particular counties (or ZIP codes) is required, we recommend obtaining material directly from the Census bureau or from commercial sources.

To install and operate the ZIPFIP package, the user will need a machine that can support a version of DOS. ZIPFIP has been tested under DOS, OS/2, WIN 98, WIN NT, and WIN XP.

If questions arise, either of a technical or practical nature, contact Daniel Hellerstein at:

    Economic Research Service
    Resources and Environment Division
    1800 M. ST. NW, Room s4006
    Washington, DC 20034
    (202) 694-5613

ZIPFIP: A ZIP and FIPS database
    

INTRODUCTION

ZIPFIP is a set of databases containing census and locational information at the ZIP (ZIP code) and FIPS (county) level. ZIPFIP is also a program to access and manipulate these data sets. Features supported by ZIPFIP include: data editing/display, market area creation, determining distances between any two spots in the lower 48 States, and aggregation of observations.

While it does not match the user friendliness of the most sophisticated commercial products, nor the breadth of CD-ROM-based databases, this menu-driven program is easy to follow, even by the casual user, and comes with plenty of on-line help for guidance and reminders.

This manual describes the hows and whys of ZIPFIP. The first chapter, the Command Documentation, is all that the casual user needs to examine. The second chapter contains a tutorial to help users understand the capabilities of ZIPFIP, and the third chapter discusses problems to which a user of ZIP code census data should be alerted.

Data.

A variety of databases are available for ZIPFIP. These include:

ZIPFIP-1980    Contains county (FIPS) and ZIP-code information from 1980. Variables include latitude, longitude, town and county names, and several census measures, including population and per capita income from 1969 to 1988.

ZIPFIP-1990    An update to ZIPFIP-1980, incorporating several new counties and containing several new measures from the 1990 census. ZIPFIP-1990 contains limited ZIP-code information, including latitude, longitude, place name, and number of deliverable addresses.

AGSTATS    A collection of ZIP-code and county-level agricultural statistics from the 1978, 1982, and 1987 Census of Agriculture, including measures of the value of land and capital, and productivity of several major crops.

NRI        A set of variables from the 1982 National Resources Inventory, including soil type and land cover (aggregated at the county level).

FIPSREGN    A set of "identifiers," by county. FIPSREGN can be used to cross-reference counties to other geographical units, such as SMSA and Land Resource Areas.

CLIMSOIL    A set of measures of climate (temperature and precipitation) and soil characteristics (aggregated at the county level).

Notes:        The databases containing information at the ZIP-code level are ZIPFIP-1990, ZIPFIP-1980, and AGSTATS.
            The ZIPFIP package is currently shipped with the ZIPFIP-1990 database. To obtain the other databases, see ordering instructions at the back of this manual.
            For a description of the variables, and sources of information, see the ZIPDATAS.DOC and SOURCES.DOC files (in the \ZIPFIP directory), and ZIPFIP90.DOC (in the \ZIPFIP\DATA directory).

Commands.
ZIPFIP provides a number of commands, divisible into several categories:

    i.    Database editing: EDITSTAT and EDTZFIP.
    ii.    Data extraction: FINDZFIP, FINDNAME, and PRINTSTATS.
    iii.    Market area creation: MARKET and ZIP2FIP.
    iv.    Distance computation: COMPDIST and TRIPDIST.
    v.    Aggregation and assignation: AGGREGATE.

Market areas are defined as a set of zones. Typically, these zones are ZIP codes or FIPS codes. A typical market area consists of one or a number of ZIP (or FIPS) codes within a user-defined distance of a central location.

Aggregation is the act of combining observations. An aggregation may combine observations on individuals into ZIP aggregates, or combine ZIP aggregates into a FIPS aggregate. For example, given a list of visitors to a park and the ZIP-code of each visitor, aggregation can be used to derive the total number of visitors from each ZIP-code in the "market area" surrounding the park. Assignation is the act of assigning an observation to a zone, where a zone may be a ZIP-code, a county, or a geographically defined location.

Data extraction refers to finding and displaying data. ZIPFIP can also apply a number of corrections to the data. For example, missing values occur frequently in ZIP data, due to small samples and due to "nonexistent" ZIP codes. Corrective actions for ZIP data include:

    (a)    the use of FIPS measures for missing values in the ZIP data; and
    (b)    the scaling of the ZIP population variable to account for changes in ZIP code boundaries since the 1980 census.
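
Correction (a) can be pictured with a short Python sketch. It is only an illustration: the variable names, the use of None to mark a missing value, and the sample numbers are all hypothetical, and ZIPFIP performs this substitution itself through its menus.

```python
# Sketch of correction (a): fall back to the county (FIPS) value when a
# ZIP-level value is missing. All names and numbers here are hypothetical
# and do not reflect ZIPFIP's internal file format.

def fill_missing_from_fips(zip_record, fips_record):
    """Return a copy of zip_record in which each missing (None) value is
    replaced by the same variable from fips_record, when one is available."""
    filled = dict(zip_record)
    for name, value in zip_record.items():
        if value is None and fips_record.get(name) is not None:
            filled[name] = fips_record[name]
    return filled


zip_rec = {"PERCAP": None, "POPULATION": 1200}     # ZIP-level values
fips_rec = {"PERCAP": 9500, "POPULATION": 48000}   # county-level values
print(fill_missing_from_fips(zip_rec, fips_rec))
```

Only the missing variable is replaced; ZIP values that are present are left alone.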

Database editing refers to changing values in the ZIPFIP database. Distance computation refers to computing great circle or road distances between a set of user-selected points.

Helpful Hints.

.     Case sensitivity: ZIPFIP is case-insensitive.

.     On-line help: To access on-line help, enter ALT-H (hit the ALT key and the H key simultaneously), or the F1 function key.

.     Temporary exit to DOS: Whenever input is requested, the user may request a temporary exit to DOS by typing Ctrl-E (hit the Ctrl and E keys simultaneously). ZIPFIP can be re-invoked by typing EXIT (after which we recommend hitting ALT-R to refresh the ZIPFIP display).

.     ENTER and ESC keys: The word ENTER refers to the ENTER key (the carriage return key). When equated to a definition, such as ENTER=Default, a strike of the ENTER key selects that command (in this case, the default). The ESC key refers to the Esc (Escape) key. ESC will usually cancel the current command, menu item, or input request.

.     Input files: ZIPFIP often asks the user to select an input file. The input file (which must already exist) contains information needed by ZIPFIP; such as a list of ZIP codes for which ZIPFIP will produce further information (please see Appendix D for further discussion of input files).

.     Market area files: ZIPFIP is both a creator and user of market area files. Market area files contain a list of zones, where a zone can be a ZIP code, a FIPS code, or a location by latitude and longitude.

.     Observation files: Observation files are files that contain "raw" data, and are typically not produced by ZIPFIP. ZIPFIP often requests observation files, which it then processes; for example, ZIPFIP can aggregate entries in an observation file into larger zones.

.     Output files: ZIPFIP often creates output files, such as market area files. If one does not otherwise specify a path and file name, the output file will be written to the \OUTPUT subdirectory (typically, \ZIPFIP\OUTPUT).

.     Comment lines: Comment lines and error messages can be written to the output file, to a "log" file, or to both files. When written to the output file, comments/error messages are always preceded and trailed by "comment" characters (the @ character is used by default).

.     Header: When an output file is created, the user may write a multi-line descriptive "header" to it.

.     Power-user tips: ZIPFIP has a number of "power-user" features that facilitate the speedy and efficient use of ZIPFIP. Please see Appendix G, "Power Users Tips," for a description of these features.

ZIPFIP Installation Notes


Installation of the ZIPFIP package is done automatically by the INSTALL batch file, which is located on the INSTALLATION diskette. The READ.ME file on that diskette contains detailed instructions. Installation takes about 5 minutes.

To run the ZIPFIP programs, you need at least an IBM-Compatible computer with an 80286 or better CPU, running DOS version 3.3 (or above). You will also need at least 475K of available memory (memory available for programs after DOS, drivers, and TSR programs have been accounted for). Ideally, 530K of available memory should be free, since this permits one to temporarily exit to DOS. The package takes up approximately 6 Megabytes on a hard disk. If this is excessive, a number of database files and less important programs need not be retained.

To run ZIPFIP (after installation), you should set your default directory to be \ZIPFIP (or whatever directory you selected), and then enter ZIPFIP from the DOS prompt. A menu will then appear, from which you select the desired ZIPFIP command (or invoke help using the F1 function key). For example, if you have installed ZIPFIP on the \ZIPFIP directory on the D: drive, after booting up ...

C:\>D:
D:\>CD ZIPFIP
D:\ZIPFIP>ZIPFIP

will get you going!

Note:    Unless you specify otherwise, all output files are written to the \OUTPUT subdirectory (for example, \ZIPFIP\OUTPUT if you installed to the \ZIPFIP directory).

Notes on CONFIG.SYS

For ZIPFIP to operate correctly, your CONFIG.SYS file should contain the following lines:

     DEVICE = ANSI.SYS     (or DEVICE = C:\DOS\ANSI.SYS)
     FILES = 25
     BREAK = ON

If CONFIG.SYS does not contain the DEVICE = ANSI.SYS line,
        ...        a number of extraneous (and very distracting) characters will appear on your screen,
        ...     boldface, color text, and other highlights will probably not work.

        Alternative:    If you cannot include the DEVICE=ANSI.SYS line in your CONFIG.SYS file, you can run the ANSI program (ANSI.COM, obtained from COMPUSERVE) included on the INSTALLATION diskette before running ZIPFIP. It performs the same function as including ANSI.SYS in your CONFIG.SYS file.

        Note:    ANSI.SYS is a file that is supplied with DOS, and is often located in the \DOS directory.

If CONFIG.SYS does not contain the FILES=25 line,
        ...        DOS limits the number of open files to three, not nearly enough.

If CONFIG.SYS does not contain the BREAK = ON line,
        ...        the control-C interrupt may not work correctly.
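
As a convenience, the three settings can be checked mechanically. The following Python sketch scans the text of a CONFIG.SYS file and reports which settings are absent. The function name is hypothetical, and treating any FILES value of 25 or more as acceptable is an assumption; the manual only specifies FILES = 25.

```python
# Sketch: report which of the three settings named above are absent from
# the text of a CONFIG.SYS file. The helper name is hypothetical.

def missing_config_settings(config_text):
    """Return the settings (ANSI.SYS, FILES, BREAK) not found in config_text."""
    found = {"ANSI.SYS": False, "FILES": False, "BREAK": False}
    for line in config_text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY = VALUE line
        key = key.strip().upper()
        tokens = value.strip().upper().split()
        first = tokens[0] if tokens else ""
        if key == "DEVICE" and first.endswith("ANSI.SYS"):
            found["ANSI.SYS"] = True
        elif key == "FILES" and first.isdigit() and int(first) >= 25:
            # Assumption: a larger FILES value is also acceptable.
            found["FILES"] = True
        elif key == "BREAK" and first == "ON":
            found["BREAK"] = True
    return [name for name, ok in found.items() if not ok]


sample = r"""DEVICE = C:\DOS\ANSI.SYS
FILES = 25"""
print(missing_config_settings(sample))  # ['BREAK']
```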

What to do if 6M is not Available:

As currently structured, you will need over 6M to install the full ZIPFIP package. Once installed, however, several "less important" files can be deleted without substantially affecting the capabilities of ZIPFIP. More or less in order of increasing importance, the following files can be deleted (depending on which database you installed, some of these files may not exist):


1)    Sample output files, in the \OUTPUT directory.
2)    ADDSCALE.EXE and ZFCREATE.EXE (about 600K), located in the \DATA directory. These are used for modifying ZIPFIP databases.
3)    ZIP5NAME.xxx (about 1M). A list of ZIP codes and the name of each post office. Without it, FINDNAME will not work with ZIP codes.
    Note: The .xxx refers to a database-specific extension, such as ".90."
4)    FIPSNAME.xxx (about 120K). A list of FIPS codes and associated names. Without it, FINDZFIP and FINDNAME will not work properly with FIPS codes.
5)    ZIP5INDX.xxx (about 200K). Deleting it will slow down execution, but should have no other effects.
6)    ZIP5STAT.xxx (about 500K). Without it, PRINTSTATS and EDITSTAT will not be able to extract ZIP census data, but FIPS census data (and ZIP geographic information) will still be available.
7)    USFIPS.PLG (about 600K). Without it, AGGREGATE will not be able to assign observations to FIPS codes based on county boundaries (the location to polygon option).
8)    VUPOLYS.EXE (about 400K). VUPOLYS displays/creates ZIPFIP polygon (.PLG) files.

Running ZIPFIP under OS/2 2.0 and WINDOWS 3.1

ZIPFIP will run under OS/2 2.0 with no modifications, although you may find that the mouse does not work quite right; in which case we can only suggest using cursor keys instead of the mouse. An OS/2 icon for ZIPFIP (ZIPFIP.ICO) is included on the installation disk, should you desire to install ZIPFIP directly onto your desktop.
Although not formally supported, ZIPFIP can be run as a DOS application under WINDOWS 3.1 (you can use the FILE-NEW option to set up a generic icon on the WINDOWS desktop). However, you may have problems with file access, especially when using ALT-F to view directories.


If major problems arise, contact Daniel Hellerstein or Daniel McCollum at the addresses listed above.

ZIPFIP Commands


At the beginning of ZIPFIP the user will be presented with a Main Menu of commands. The menu will look like:

Using 1990 FIPS and ZIP codes and
Base Year = 1990,

Select a ZIPFIP Option (F1 for help):
    EXIT    Return to DOS
    AGGREGATE*     Aggregate observations by FIPS or ZIP
    MARKET*     Create market area of FIPS or ZIP codes
    COMPDIST*     Compute distances between points
    EDITSTAT     Edit ZIP & FIPS census data
    EDTZFIP     Edit ZIP & FIPS location data
    FINDNAME     Search ZIP & FIPS name-databases for a match
    FINDZFIP    Display ZIP & FIPS location & name information
    PRINTSTATS*     Display ZIP & FIPS census information
    TRIPDIST     Compute distances for trip-itineraries
    ZIP2FIP     Find ZIPs inside of a FIPS, FIPS inside of State
    INITIALIZE     Change default database(s), display options, misc.
    LIST     Display (output) file on screen
    DOS     Temporary exit to DOS
     Select option:?
_____________________________________________________________________
*These are the most important commands for the casual user.

After the main menu appears, the user may select a command by either typing in the command name, or using the cursor keys (or mouse) to highlight a command.

AGGREGATE


This program aggregates all "observations" into "zones." Several types of "observations" and "zones" are recognized, including ZIP codes, FIPS codes, and location (a latitude and longitude pair). AGGREGATE supports 10 different types of aggregation:

Type of Observations    Type of Zones

1)    FIPS codes    FIPS codes
2)    FIPS codes    Locations
3)    ZIP codes    FIPS codes
4)    5-digit ZIP codes    3-digit ZIP codes
5)    5-digit ZIP codes    5-digit ZIP codes
6)    5-digit ZIP codes    Locations
7)    Locations    FIPS codes
8)    Locations    5-digit ZIP codes
9)    Locations    Locations
10)    Locations    Polygons

The "type of observation" refers to the manner in which an observation's location is identified. For example, each observation in the "observation file" may include a ZIP-code variable. The "type of zone" refers to the level of aggregation (or assignation) desired.

Aggregation Types 7 and 8 are approximate aggregations based on proximity. Similarly, Types 2 and 6 are also approximate, being based on distance from the center of the FIPS (or ZIP) to the latitude/longitude. For exact assignments of "locations", you should use type 10 (location to polygon), provided you have the appropriate polygon file (see notes below).

AGGREGATE provides three methods of aggregation:

1)    ASSIGN    Assign each observation to a zone.
2)    COUNT    Tally observations by zone assignation (the number of observations occurring in each zone is counted up).
3)    SUM     Sum the values of a variable (extracted from each observation) by zone assignation.


AGGREGATE requires two input files:

    (a)    An observations file, in which each entry is an observation containing an identifier. The identifier may be a ZIP code, a FIPS code, or a latitude/longitude pair.

    (b)    A market area file, consisting of a list of "zones," where each zone is either a ZIP code, a three-digit ZIP code (the zone consisting of all ZIP codes sharing the first three digits, such as ZIP codes 20900 to 20999), a FIPS code, or a latitude/longitude pair. Alternatively, the market area file can consist of a special "polygon file" (for example, a file of state boundaries).

    Note:    For ASSIGN, a market area file is not required.
        For COUNT and SUM, observations that do not fall within the market area are discarded.

An output file, containing the observation-to-zone assignations, and the requested counts and sums for each zone in the market area file, will be produced. For example, a data file of a sample taken from randomly drawn households contains observations of ZIP codes and purchased quantities of a selected consumer product:

(1)    If you select SUM, ZIPFIP can tell you how much of the selected product was purchased in each ZIP code area.

(2)    If you select COUNT, ZIPFIP can tell you how many households in each ZIP code area purchased the selected product.

(3)    Finally, if you select ASSIGN, ZIPFIP can assign each observation to a FIPS.


Example: input file (FIPS is included for reference):

ZIP    Groups    Days    Visitors    (FIPS)

52544    1    15    10    19007
52549    1    3    2    19007
52556    2    14    5    19101
52558    1    4    6    19101
52591    1    6    6    19107

Assuming that the user chooses to aggregate from ZIP code to FIPS code (Type 3), selecting the fourth variable to SUM will yield the sum of number of visitors, for each FIPS (in the market area). Selecting COUNT, and SUM with variable 2, will yield the total number of observations, and the sum of number of groups, for each FIPS (in the market area), respectively. Note that within a FIPS, the summation occurs across all observations that fall inside of that FIPS. Thus, if variable 4 (Visitors) is chosen, then FIPS 19007 will have a value of 12, FIPS 19101 will have a value of 11, and FIPS 19107 will have a value of 6.
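
The arithmetic in this example can be reproduced with a short Python sketch. It is an illustration only, not ZIPFIP's implementation: the FIPS column of the example file stands in for ZIPFIP's own ZIP-to-FIPS lookup, the market area filter is omitted, and the function name is hypothetical.

```python
from collections import defaultdict

# Rows from the example above: (ZIP, Groups, Days, Visitors, FIPS).
# The FIPS column stands in for ZIPFIP's own ZIP-to-FIPS lookup.
rows = [
    ("52544", 1, 15, 10, "19007"),
    ("52549", 1, 3, 2, "19007"),
    ("52556", 2, 14, 5, "19101"),
    ("52558", 1, 4, 6, "19101"),
    ("52591", 1, 6, 6, "19107"),
]


def aggregate(rows, variable_number):
    """COUNT observations and SUM one variable, by FIPS code.

    variable_number is 1-based, as in the text (ZIP is variable 1,
    Groups is 2, Days is 3, Visitors is 4)."""
    counts = defaultdict(int)
    sums = defaultdict(int)
    for row in rows:
        fips = row[4]
        counts[fips] += 1
        sums[fips] += row[variable_number - 1]
    return dict(counts), dict(sums)


counts, visitors = aggregate(rows, 4)
print(visitors)  # {'19007': 12, '19101': 11, '19107': 6}
print(counts)    # {'19007': 2, '19101': 2, '19107': 1}
```

Calling aggregate(rows, 2) instead sums the Groups variable, giving 2, 3, and 1 for the three counties.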

NOTES:

(1)    "Polygon" market areas consist of a set of non-overlapping polygons, such as State and county boundaries. Polygon market area files must be specially created (you cannot use text files). ZIPFIP comes with two polygon files: one for the State boundaries of the lower 48 States (plus the District of Columbia), and one for all U.S. counties. These two polygon files are located in the \DATA subdirectory of ZIPFIP, and are named US48STAT.PLG and USFIPS.PLG respectively.

(2)    Users interested in creating their own "polygon" files should see the VUPOLYS.EXE program (in your \ZIPFIP\DATA directory).

(3)    Use the closest ZIP option to match observations whose ZIP code has no exact match in the ZIP location database. Use the maximum distance option to limit the range within which latitude/longitude matches occur (matches farther than this maximum are considered to be outside the market area).


(4)    Comments about errors and other difficulties encountered are written to the output file, since it is expected that the user will edit the output file. Note that, in most cases, the output file may be readily matched to the market area file, since all zones read from the market area file will have a line in the output file, even if there were no matches to it (where the line will contain zeros).

(5)    In each line of the output file, the zone identifier (for example, a FIPS code) is always written first.

(6)    The output from MARKET and ZIP2FIP can be used as a market area file.

(7)    AGGREGATE may read up to 40 variables, per observation, from the observations and market area files.

(8)    Each observation (identifier, plus latitude and longitude) in the market area file must be on a single line. When latitude/longitude locations are read from market area files, you can provide the "variable number" containing a numeric identifier of this location. For a discussion of input files, see the appendix, "Input and Output Files."

(9)    Each observation (identifier plus other variables) in the observations file must be on a single line. When latitude/longitude locations are read from an observation file, you can provide the "variable number" containing a numeric identifier of this location.

(10)    When selecting a market area file, ALT-F will display the directory of the current "input file directory".

MARKET and ZIP2FIP

MARKET will create a market area data file. The idea is to select all zones within a distance (road or great circle) of some user-supplied center -- that is, all zones that pass a user-specified "proximity test." These zones may be either ZIP codes or FIPS codes. Hence, the output of MARKET is a file containing the ZIP or FIPS codes that pass the test of proximity to the center. Optionally, one may direct MARKET to also produce, for each zone selected, a list of distances to a set of user-selected sites.

The user may enter up to 10 sites, for which distances (either road or great circle) to each zone code that passes the proximity test will be computed. Of course, the user-supplied center may be one of these sites: easily specified by selecting the default (hit the ENTER key) when sites are asked for.

For a zone to be selected, two tests must be passed:

1)    The center of the zone (of the ZIP code or the FIPS code) must lie within a band, where the band is specified using a minimum and a maximum distance (setting a minimum distance of zero converts the band into a circle).

2)    The center of the zone must lie within a "quadrant" (an arc) of the band. The quadrant is specified using two angles, and only zones within the arc bounded by these two angles are accepted (note that the arc can be larger than 180 degrees). Selecting angles of 0 and 360 degrees converts the quadrant into the entire band.
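
The two tests can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ZIPFIP's own code: distance is the great circle (haversine) distance in miles, coordinates are in degrees with east longitudes positive, and angles are assumed to be compass bearings measured clockwise from north, with the arc swept clockwise from the first angle to the second. The manual does not state ZIPFIP's angle convention, and all function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine great circle distance (degrees in, miles out)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def bearing_degrees(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360


def in_market_area(center, zone, min_dist, max_dist, start_angle, end_angle):
    """Apply test 1 (the band) and test 2 (the arc) to one zone center.

    center and zone are (latitude, longitude) pairs."""
    d = great_circle_miles(*center, *zone)
    if not (min_dist <= d <= max_dist):
        return False                      # fails test 1
    span = end_angle - start_angle
    if span >= 360:
        return True                       # 0 and 360: the entire band
    span %= 360                           # arc width, handles wrap past north
    offset = (bearing_degrees(*center, *zone) - start_angle) % 360
    return offset <= span                 # test 2


center = (40.0, -100.0)
zone = (41.0, -100.0)                     # about 69 miles due north
print(in_market_area(center, zone, 0, 100, 0, 360))    # True
print(in_market_area(center, zone, 0, 100, 90, 270))   # False
```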

The user may ask for road or great circle distance, either in computation of distance to alternate sites or in the proximity measure. Great circle distance is specified by entering only the latitude and longitude when a site is requested -- in other words, do not provide a State identifier.


ZIP2FIP will find (and write to an output file) either a list of all FIPS inside of a State (or a list of States), or a list of all ZIP codes inside of a FIPS (or a list of FIPS codes). An input file containing a list of FIPS codes, or two-letter State abbreviations, is expected. Output consists of either a list of FIPS codes, or a list of ZIP codes (and FIPS codes). Note that input MUST be from a file.

NOTES:

(1)    For instructions on entering sites, see Appendix A, "Entering Sites." For a discussion of distance computation, see Appendix B, "Computing Distance."

(2)    The user may have header lines included at the top of the output file. Since the user is expected to edit the output file, comments about any errors encountered are written to the output file.

COMPDIST

COMPDIST computes distances through two basic functions: computing a multi-stop distance given a list of locations entered by the user; and producing an N x K set of distances, given N starting locations and K end locations.

For the first function, the user enters a list of locations from the keyboard. COMPDIST merely computes distances (road or great circle) between consecutive locations, and adds them up.

For the second function, the user specifies two lists of locations: the first contains start-locations, the second end-locations. Distances are computed for each start-location/end-location pair. If there are fewer than 10 end-locations, then the distances for each start-location will be written on one line, in the order in which the end-locations appear in the end-location file. Otherwise, each line will contain one distance, with both start and end-location identified.
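
Both functions can be sketched in a few lines of Python. The sketch covers great circle distance only (road distance is not attempted), assumes miles as the unit and (latitude, longitude) pairs in degrees, and uses hypothetical function names; it is not ZIPFIP's implementation.

```python
import math


def great_circle_miles(a, b, radius=3958.8):
    """Haversine distance between two (latitude, longitude) pairs in degrees.
    The radius (mean Earth radius in miles) is an assumed constant."""
    p1, p2 = math.radians(a[0]), math.radians(b[0])
    dlat = p2 - p1
    dlon = math.radians(b[1] - a[1])
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


def multi_stop_distance(stops):
    """First function: add up the distances between consecutive stops."""
    return sum(great_circle_miles(p, q) for p, q in zip(stops, stops[1:]))


def distance_table(starts, ends):
    """Second function: an N x K table, one row per start-location and
    one column per end-location."""
    return [[great_circle_miles(s, e) for e in ends] for s in starts]


stops = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
print(round(multi_stop_distance(stops), 1))
table = distance_table(stops[:2], stops)
print(len(table), len(table[0]))  # 2 3
```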

Locations may take one of three forms:
    (i)    A FIPS code.
    (ii)    A ZIP code.
    (iii)    A latitude, longitude, and State.

For (i) and (ii), the closest match may be requested.

Output is written to the monitor, or to a user-specified file. Each line of the output file will contain an identifier, such as the start-location FIPS or ZIP. For (iii), the "line number" in the file is used as an identifier. Alternatively, the user may instruct COMPDIST to use an identifier pulled from the input file.

The input files should have the following format:

    (i) and (ii)    FIPS/ZIP optional_id:
             95616 DAVIS
    (iii)        Latitude Longitude State optional_id:
             38.5 121.7 CA DAVIS

The optional_id may be up to 10 characters long. In the above examples, the optional_id is DAVIS. Note that the State name (CA in the above example) typically follows the longitude. However, the user can specify the "variable number" of the State name.
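
For readers preparing such files with their own tools, the following Python sketch parses one line of form (iii). It assumes whitespace-separated fields with latitude and longitude in the first two positions, and it truncates the optional_id to 10 characters; the function and parameter names are hypothetical, and COMPDIST's own parsing rules may differ.

```python
def parse_location_line(line, state_position=3):
    """Parse a line of the form 'Latitude Longitude State optional_id'.

    state_position is the 1-based "variable number" of the State name
    (3 when the State follows the longitude, as in the example above)."""
    fields = line.split()
    lat, lon = float(fields[0]), float(fields[1])
    state_index = state_position - 1
    state = fields[state_index].upper()
    # Any remaining field after latitude and longitude is taken as the id.
    rest = [f for i, f in enumerate(fields) if i > 1 and i != state_index]
    optional_id = rest[0][:10] if rest else None
    return lat, lon, state, optional_id


print(parse_location_line("38.5 121.7 CA DAVIS"))
# (38.5, 121.7, 'CA', 'DAVIS')
print(parse_location_line("38.5 121.7 DAVIS CA", state_position=4))
# (38.5, 121.7, 'CA', 'DAVIS')
```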

Either Road Distance or Great Circle Distance may be requested. However, if the State name is not available the road distance will not be computed (instead, the Great Circle Distance will be computed). See Appendix A, "Entering Sites," for further discussion on entering geographical locations. See Appendix B, "Computing Distances," for a discussion of the methodology used to compute great circle and road distance.

EDTZFIP

EDTZFIP is used to correct errors, or to update information, in either the ZIP or FIPS location database, or in the ZIP town name database. See EDITSTAT for editing census databases.

The ambitious user may find ample opportunities to use this command. For example, in the ZIPFIP-1980 database, the ZIP location raw data have a limitation in that all ZIP codes associated with a central post office are given the same town name and the same location (same latitude and longitude). Thus ZIP code 02164, in Newton, MA, is given the name and location of BOSTON, since 02164 is a substation of the central Boston, MA post office.

EDTZFIP allows one to edit either the location or (for ZIP) the name database. It expects a record number, not a ZIP (or FIPS) code (FINDZFIP can be used to find these record numbers). The user supplies this information either from the keyboard, or from an input file.

For further details on the use of EDTZFIP, including instructions on the use of input files, see the on-line documentation.

EDITSTAT

EDITSTAT is used to display and change fields in the three census databases:
    (a)    The ZIP code census database.
    (b)    The FIPS code census database.
    (c)    The FIPS code scale (timeseries) database.
    
(To change ZIP code or FIPS code latitude or longitude, or ZIP town name, see EDTZFIP.)

The user first enters a ZIP or FIPS code (or a record number, as in FINDZFIP). EDITSTAT then displays a table of all the variables selected and their values, from which the user selects a variable to change. This selection is easy: just enter any unique portion of the variable name, or use the cursor keys (see Appendix C, "Variable Names").

Example:    If the user has selected FIPS 25017, s/he may change %POVERTY by entering POV at the "variable to change" question. Similarly, s/he may enter UNEMPLOY to select the % UNEMPLOY variable.

NOTES:

(1)    All changes are permanent. Therefore, striking Ctrl-C will not bring back inadvertently changed values.
(2)    When missing values are encountered, an M will be written. Overflow is treated as a missing value.
(3)    For most percent variables, only values between 0 and 100% may be specified. The actual bounds for each variable depend on how the database was created.

FINDZFIP

Use FINDZFIP to search the ZIP (or FIPS) location and name databases, and report basic information:

(a)    For FIPS, report the longitude and latitude of the requested FIPS, or the name of the county.
(b)    For ZIP, report the longitude and latitude, FIPS, or town name of a requested ZIP.

With either choice, the record number of the ZIP (or FIPS) is also displayed.

After selecting the database, one then provides a ZIP (or FIPS) for which to search. If found, the information in the database is returned. If there is no such ZIP (or FIPS), then FINDZFIP will search for a ZIP (or FIPS) with a reasonably close number, and return the information associated with this "nearby" ZIP (or FIPS).
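
The fallback to a "nearby" code can be pictured with the Python sketch below. The manual does not say how FINDZFIP decides that a number is "reasonably close"; the sketch simply assumes the numerically closest code in a sorted, non-empty list, and the function name is hypothetical.

```python
import bisect


def find_code(sorted_codes, wanted):
    """Return (code, exact). If wanted is in sorted_codes, return it with
    exact=True; otherwise return the numerically closest code (an assumed
    rule) with exact=False. sorted_codes must be sorted and non-empty."""
    i = bisect.bisect_left(sorted_codes, wanted)
    if i < len(sorted_codes) and sorted_codes[i] == wanted:
        return wanted, True
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = sorted_codes[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda code: abs(code - wanted))
    return nearest, False


codes = [52544, 52549, 52556, 52558, 52591]
print(find_code(codes, 52556))  # (52556, True)
print(find_code(codes, 52550))  # (52549, False)
```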

If necessary, one may select an absolute record number (in the selected database) to display. Do this by entering the negative of the desired record number.


FINDNAME

FINDNAME will search either the ZIP or FIPS "name" databases. The ZIP name database is used when looking for a particular town, while the FIPS name database is used when looking for a particular county.

After selecting ZIP or FIPS, the user provides a name. FINDNAME will then search the appropriate database for all ZIP (or FIPS) codes that match this name. This search may be over the entire United States, or may be limited to a single State. For ZIP codes, the name, ZIP code, FIPS code, and State of every match are displayed. For FIPS codes, the name, FIPS code and State are displayed.
    
There are two display options:
(a)    If any matches are found, the number of matches and the location of the first match are displayed; or
(b)    All ZIP codes (or FIPS codes) that match the name provided by the user are displayed.

There are also two search options:
(a)    Exact matches only; or
(b)    Substring match. In this case, the requested name matches any substring of the ZIP code (or FIPS) name; it need not be an exact match. For example, ORI would match PEORIA, ORINVILLE, and so forth.
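
The difference between the two search options can be shown with a small Python sketch. The function name and the sample name list are hypothetical; comparisons ignore case, in keeping with ZIPFIP's case insensitivity.

```python
def match_names(names, query, exact=True):
    """Return the names that match query.

    exact=True  keeps only names equal to the query.
    exact=False keeps any name that contains the query as a substring."""
    q = query.strip().upper()
    if exact:
        return [n for n in names if n.strip().upper() == q]
    return [n for n in names if q in n.upper()]


names = ["PEORIA", "ORINVILLE", "YORK"]
print(match_names(names, "ORI", exact=False))  # ['PEORIA', 'ORINVILLE']
print(match_names(names, "ORI", exact=True))   # []
```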

Input may be from either the keyboard or an input file. If you use an input file, each line should contain both the ZIP (or FIPS) name and the State name, separated by a comma. For example:

            FRONTIER    ,NE
            SHERIDON    ,NE
            HALL        ,NE
            BEDFORD    ,PA
            YORK        ,PA
            LANCASTER    ,PA

PRINTSTATS

This is probably the most useful command for the general user. It will produce an output file containing census information (such as per-capita income, average temperature, or bushels of corn produced) for each FIPS (or ZIP) code in a user-supplied list of FIPS (or ZIP) codes.

A variety of options are available in PRINTSTATS, including:

GO        Generate output (display data)
DATABASE    Examine ZIP or FIPS, or FIPS using ZIP
VARIABLES    Select variables to extract
YEAR        Select current year
SCALE        Select scale factors
MISS OBS    Select method of dealing with missing observations
MISS VALS    Use FIPS value if ZIP value is "missing"
DISTANCE    Compute and display a distance variable
OUTPUT    Select output file
INPUT        Select input file
HEADER    Add header to output file.
EXIT        Exit PRINTSTATS

Each of these commands is explained in greater detail below.

DATABASE: To Select ZIP or FIPS Data

There are three options:

(a)    Produce a list of census variables for selected FIPS codes.
(b)    Produce a list of census variables for selected ZIP codes.
(c)    Produce a list of FIPS census variables using ZIP codes as input. In other words, display data for the FIPS code in which each ZIP code is located.

Whenever DATABASE is selected, any previously selected variables or scales are dropped. The user must reenter the desired variables and scales. The default DATABASE is FIPS codes.

VARIABLES: To Select Variables to Extract

The user may select any subset of the variables in the database of interest. A menu-like mechanism is used to select variables, with variables selected either by entering their names (or a unique substring), or by using the cursor keys (or the mouse).

SCALE and YEAR: To access time-series information

The YEAR and SCALE options are used to access time-series information. Furthermore, SCALE may be used to modify ZIP code or FIPS data, either to account for missing ZIP codes or to generate census values for years other than the base year (e.g., 1990) for variables lacking non-base-year information.

     SCALE

        The user may selectively modify variables using "scales". Two kinds of scales exist:

    (i)    USER -- County invariant, with values entered at run time. As a convenience, the Consumer Price Index (CPI) is hardwired into ZIPFIP (with 1980 as a base year).

    (ii)    COUNTY SPECIFIC -- These scales have a separate value for each county, with values stored in the "SCALE" database. Each of these scales has two identifiers: a NAME and a YEAR. The NAME identifies the target variable, and the YEAR identifies the year to which values are adjusted. For example, the 82, PERCAP scale is designed to adjust the PERCAP variable, causing 1982 values to be reported. Alternatively, the 82, PERCAP scale can be used as an approximate adjustment for related variables (say, HOUSEHOLD INCOME) for which explicit time-series data are not available.

        Each variable may have up to four scales applied to it, consisting of any mix of USER or COUNTY SPECIFIC scales; where the COUNTY SPECIFIC variables are selected both by year and by variable.

    YEAR

        The user may instruct PRINTSTATS to display values, if possible, from a user-selected "current year". For example, if the user wants her/his variables to reflect 1982 information (using whatever 1982 information may be available), s/he may select 82 as the current year. Most variables have limited time-series information available, while a few (such as population, per capita income) have many years of time-series information.
        Implementation note: the YEAR option is actually an automated subset of the SCALE option.

Example 1:    (Using the ZIPFIP80 database.) To generate 1984 ZIP code populations that are corrected for missing ZIP codes, select 1984 as the "current YEAR", and assign the scale 80,POPFIX to the ZIP code population variable.

Example 2:    To generate a value for 1984 per capita income expressed in 1980 dollars, select 1984 as the current year, and apply to the per capita income variable the USER scale whose value is the 1984 CPI deflator (such as the CPI84 "hard-wired" USER scale).

Notes:

(1)    SCALE and YEAR must be selected after variables have been chosen (the current YEAR, and all scales, are cancelled when variables are selected).
(2)    When YEAR is selected, all previously selected scales (using SCALE) are cancelled.
(3)    As with variable names, the user may use substrings when selecting a scale name.
(4)    When scaling ZIP code data, or when YEAR is selected with ZIP-code data, the scale values used are drawn from the FIPS that contains the ZIP. In addition, when using YEAR with ZIP code data, the reported values will be based on FIPS level information.

    Example:

    Suppose scale 83,POPULATION for FIPS 25017 is 1.05, and ZIP 02165 is part of 25017; and that the POPULATION of ZIP code 02165 is desired. If 83,POPULATION is a requested scale (or if 1983 is the requested "current year") then the scale used for 02165 will be 1.05.

(5)    For a further description of how scales are used, see the appendix on scaling.
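
The example under note (4) can be sketched in Python as follows. The table layout and function names are hypothetical; only the 1.05 scale and the link between ZIP 02165 and FIPS 25017 come from the example, and the base-year population of 10000 is made up.

```python
# Hypothetical lookup tables; only the 25017 scale value (1.05) and the
# ZIP-to-FIPS link come from the example in note (4).
ZIP_TO_FIPS = {"02165": "25017"}
COUNTY_SCALES = {("83", "POPULATION"): {"25017": 1.05}}


def scale_for_zip(zip_code, year, name):
    """Return the county-specific scale of the FIPS that contains zip_code."""
    fips = ZIP_TO_FIPS[zip_code]
    return COUNTY_SCALES[(year, name)][fips]


def scaled_zip_value(zip_code, base_value, year, name):
    """Multiply a ZIP-level base-year value by its county's scale."""
    return base_value * scale_for_zip(zip_code, year, name)


# 10000 is a made-up base-year population for ZIP 02165.
print(scaled_zip_value("02165", 10000, "83", "POPULATION"))
```

The point of the sketch is that the ZIP code never has a scale of its own; it borrows the scale of the county that contains it.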

MISS OBS: Options for Missing Observations

ZIP codes are not permanent entities. They are created, removed, and changed frequently (note that FIPS codes are rarely changed). Given limited data resources, this may complicate matters. The MISS OBS option allows the user to choose between several techniques for dealing with the fact that a desired ZIP code may not exist, or may be missing either "location" or "census" information. Four options are provided:

     X    Finds exact match. The ZIP must have entries in both the location and census database. Otherwise, skip this ZIP (or FIPS), and write a "No Match" line to the output file.

     D    If there is no exact match, looks for a ZIP (or FIPS) code with a reasonably close numeric value (that has entries in both census and location databases).

     S    Applies to ZIP data only. Separate matches of census and location databases are performed.

        (i)    A first ZIP code is found in the location database that exactly matches or is numerically close to the desired ZIP code.

        (ii)    A second ZIP code is found in the census database that exactly matches or is numerically close to the desired ZIP code. Note that the resulting two ZIP codes need not be the same.

     B    Applies to ZIP data only. Same as S, but the ZIP code found in step (ii) MUST have a location. In other words, the location and census ZIP codes can still be different, but location data must be available for the ZIP code found in step (ii).


For example, in the ZIPFIP-1980 database, there are about 1,000 ZIP codes that have census data, but no location data. These can be accessed ONLY when S is selected.
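
The matching options can be sketched in Python. This is only an illustration under stated assumptions: the manual does not define "reasonably close", so the max_gap tolerance is hypothetical, as are the function names and the sample ZIP codes.

```python
def match_code(wanted, available, mode="X", max_gap=10):
    """Return a matching code from available, or None for "No Match".

    mode "X": exact match only.
    mode "D": exact match if possible, otherwise the numerically nearest
              code within max_gap (max_gap is a hypothetical tolerance).
    """
    if wanted in available:
        return wanted
    if mode == "X" or not available:
        return None
    # sorted() makes the choice deterministic when two codes are equally close
    nearest = min(sorted(available),
                  key=lambda code: abs(int(code) - int(wanted)))
    if abs(int(nearest) - int(wanted)) <= max_gap:
        return nearest
    return None


def match_separately(wanted, location_codes, census_codes, max_gap=10):
    """Option S: match the location and census databases independently."""
    return (match_code(wanted, location_codes, "D", max_gap),
            match_code(wanted, census_codes, "D", max_gap))


# Made-up ZIP codes: 20851 has census data but no location data.
location = {"20850", "20852"}
census = {"20850", "20851"}
both = location & census          # options X and D need both databases

print(match_code("20851", both, "X"))
print(match_code("20851", both, "D"))
print(match_separately("20851", location, census))
```

Option B could be sketched the same way by matching the census code against codes that are also present in the location database.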

Notes:

(1)    Option D was designed to be used with the "ZIP code population correction" scale (the scale POPFIX ).
(2)    The default is X.
(3)    For a complete discussion of the problem of missing ZIP codes, see the third chapter of this documentation, "Limits of ZIP codes as Units of Observation."

MISS VAL: Options for Missing Values

For a variety of reasons, such as confidentiality, the value of some variables will be missing. The MISS VAL option offers a partial solution to this problem when ZIP data is desired. (Note: ZIP data is much more likely to contain missing values than FIPS data). Since ZIP codes are disaggregated FIPS codes, it is often logical to use an appropriate value from the FIPS when a ZIP value is missing. For a large class of variables (for example, percents and per capita averages), this substitution of FIPS values for missing ZIP values is defensible as a second best solution. This, however, is not always the case; one would not want to use raw counts (such as population).

PRINTSTATS supports this feature through the use of variable names. Specifically, if a ZIP code has a missing value, and one has requested that a FIPS value be used in its place, then the variable from the FIPS database with EXACTLY the same name will be used (drawn, of course, from the FIPS that contains the ZIP). If a ZIP variable does not have an identically named counterpart in the FIPS database, then this missing value replacement will not be available. For a further discussion of variable names, see Appendix C, "Variable Names."
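
A minimal Python sketch of this name-based replacement follows. The function name, the use of None for a missing value, and the sample records are all assumptions made for illustration; the variable names are taken from examples elsewhere in this manual.

```python
def fill_missing(zip_values, fips_values, replace=True):
    """Return a copy of zip_values with missing entries filled from the FIPS.

    A value is replaced only when the FIPS record has a variable with
    EXACTLY the same name. None stands in for a missing value.
    """
    filled = dict(zip_values)
    if not replace:              # no replacement requested
        return filled
    for name, value in zip_values.items():
        if value is None and fips_values.get(name) is not None:
            filled[name] = fips_values[name]
    return filled


# Made-up records for one ZIP code and the FIPS that contains it.
zip_rec = {"%MALE": None, "SQ MI": None, "PERCAP": 9500.0}
fips_rec = {"%MALE": 48.7, "LAND AREA": 820.0, "PERCAP": 9100.0}
print(fill_missing(zip_rec, fips_rec))
```

Here %MALE is filled from the FIPS record, SQ MI stays missing because the FIPS record has no variable of that exact name, and PERCAP is left alone because the ZIP value is present.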

The default is NO missing value replacement. When a value is missing, PRINTSTATS will display a period or an asterisk ("." or "*"). See the third chapter of this documentation: "The Limitations of ZIP Code Data as Units of Observation," for further discussion, with special attention to the problem of "missing observations."


INPUT

    The list of ZIP (or FIPS) codes to be processed may come either from the keyboard or from an input file (the default input is the keyboard). For example, you can use the output of MARKET, or ZIP2FIP, as input files. For a discussion of the use of input files, see Appendix D, "Input and Output Files."

    In addition to containing a list of observations (for example, a list of FIPS codes that comprise a market area), the input file can also contain special "statistical" commands. These commands tell ZIPFIP to generate and output some simple statistics on the variables you selected. The commands can be placed anywhere in your input file; the statistics will be computed for all observations read up to the location of the command.
    
    PRINTSTATS statistical commands are:
$SUM        Compute sum of each variable.
$VAR        Compute variance of each variable.
$MEAN    Compute mean of each variable.
$MAX        Compute maximum of each variable.
$MIN        Compute minimum of each variable.
$RESET    Reset all statistics. This is useful if you want to generate separate statistics for several subsets of observations (say, for each of several market areas listed in one input file).

    Note that these commands should appear on separate lines.
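
The way these commands accumulate over "all observations read up to the location of the command" can be sketched in Python. This is a simplified illustration, not ZIPFIP's implementation: each observation line here holds one numeric value directly (in ZIPFIP the lines hold ZIP or FIPS codes and the statistics cover the selected variables), the function name is hypothetical, and the use of the population variance for $VAR is an assumption.

```python
import statistics


def run_stats(lines):
    """Process observation lines and $-commands; return (command, value) pairs."""
    seen, report = [], []
    commands = {
        "$SUM": sum,
        "$MEAN": statistics.mean,
        "$VAR": statistics.pvariance,   # population variance: an assumption
        "$MAX": max,
        "$MIN": min,
    }
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "$RESET":
            seen = []                   # start a new subset of observations
        elif line in commands:
            if seen:                    # nothing to report on an empty subset
                report.append((line, commands[line](seen)))
        else:
            seen.append(float(line))
    return report


example = ["10", "20", "$SUM", "$RESET", "5", "7", "$MEAN", "$MAX"]
print(run_stats(example))
```

In the example, $SUM covers the first two observations, and after $RESET the $MEAN and $MAX commands cover only the last two.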

    Two other options are relevant for PRINTSTATS input files.

    i)    Output comments. If you desire, all comment lines encountered in the input file can be written to the output file.
    ii)    Suppress output of individual observations. If you desire, data on individual observations can be suppressed. This is useful if you only want to see the statistics, and do not want to clutter up your output file with extra data.

OUTPUT

Output may be directed either to the user's display screen or to an output file. Selecting O allows the user to name an output file to which to write results. Depending on the value of option Z, the output file will be structured as:

    (a) FIPS            FIP_STATUS    Census_Variables.
    (b) ZIP                ZIP_STATUS    Census_Variables.
    (c) FIPS    ZIP    ZIP_STATUS    Census_Variables.

Alternatively, one can write results in a binary format. This "machine readable" output can then be used by other programs (such as the GAUSS statistical package). If binary output is selected, only matched ZIP (or FIPS) codes are written to the output file. In contrast, for ASCII output (described above), when no match is found an appropriate comment line is written to the output file.

Notes:

(1)    If DISTANCE is selected, its value will always directly follow the _STATUS variable. Census variables are written in the order they appear in the variable selection menu. For a discussion of the _STATUS variable, see the third chapter of this documentation.
(2)    To aid in future recall, PRINTSTATS allows the user to add a "header" to the top of the output file. This header may contain a list of the currently selected variables.
(3)    The default output is to the display screen (the user's monitor).

DISTANCE

PRINTSTATS can also compute the distance from a user-selected site to the center of each ZIP code (or FIPS) in the input list. For ZIP codes, two options are available:
    (a) Use the center of the ZIP; or
    (b) Use the center of the FIPS (that contains the ZIP).

See Appendix A, "Entering Sites," for instructions on entering sites. See Appendix B, "Computing Distances," for a description of how distances are computed.

TRIPDIST

TRIPDIST computes the minimum mileage (measured in road or great circle miles) needed to complete a multiple site trip. The user supplies a file containing a list of "origins," where an origin can be a ZIP code or a FIPS code. In addition, the user supplies a list of itineraries (an itinerary is simply a list of sites). TRIPDIST then computes the minimum mileage needed to complete each itinerary, from each origin. The order in which one enters the stops for a given itinerary is unimportant, since TRIPDIST computes the minimum distance required to visit each site and (optionally) to return home (to the origin).

Each line in the output file consists of the origin identifier and the minimum trip distance, for each itinerary. A maximum of five sites may be entered for each itinerary. Up to 30 itineraries may be entered. TRIPDIST finds the shortest route that connects each site on the itinerary. In contrast, COMPDIST will compute the length of a line that connects each site in the order they appear in the itinerary. In other words, COMPDIST makes no attempt to "optimize" the route, while TRIPDIST will search all possible routes (that connect each site in the itinerary), and report the route with the shortest mileage. Thus, the distance computed by TRIPDIST is always less than or equal to the distance computed by COMPDIST.
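
The difference between the two approaches can be sketched in Python. This is an illustration under assumptions, not the TRIPDIST code: the function names are hypothetical, distances are great circle miles computed with the haversine formula, and the coordinates in the usage lines are approximate.

```python
import math
from itertools import permutations


def great_circle_miles(a, b):
    """Haversine distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))


def route_length(origin, stops, return_home=True):
    """Length of the route origin -> stops in the order given (-> origin)."""
    points = [origin, *stops] + ([origin] if return_home else [])
    return sum(great_circle_miles(p, q) for p, q in zip(points, points[1:]))


def min_trip_length(origin, stops, return_home=True):
    """Shortest route over every ordering of the stops (brute force)."""
    return min(route_length(origin, order, return_home)
               for order in permutations(stops))


# Approximate coordinates (absolute values), for illustration only.
ely, chicago, fargo, minneapolis = (47.9, 91.9), (41.9, 87.6), (46.9, 96.8), (45.0, 93.3)
stops = [chicago, fargo, minneapolis]
print(route_length(ely, stops))      # sites visited in the order entered
print(min_trip_length(ely, stops))   # best ordering of the same sites
```

With at most five sites per itinerary there are at most 120 orderings to try, so an exhaustive search of this kind stays cheap.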

Notes:

(1)    The more sites on an itinerary, the slower TRIPDIST will run.

(2)    A header may be written to the output file.

(3)    Exact matches only: If a ZIP (or FIPS) code cannot be found, the user will receive the response BAD ZIP/FIPS.

(4)    For a discussion of how to enter sites, see Appendix A, "Entering Sites." For a discussion of how distances are computed, see Appendix B, "Computing Distances."

INITIALIZE

INITIALIZE is used to initialize and review several ZIPFIP options. The features that can be set by INITIALIZE include:

(a)    INPUT and OUTPUT: Changes the default input and output directories to the user's personal work area.

(b)    DATABASE: Selects any of the currently installed databases. ZIPFIP comes packaged with the ZIPFIP-1990 database; several others are available (see ordering information at the back of this document).

(c)    REVIEW: Lists the files comprising the currently selected database.

(d)    COLORS: Alters the colors used in EDITSTAT, and for displaying the results window.

(e)    ADVANCED: Displays a list of shortcuts for the frequent user (Appendix G, "Power-User Tips," contains further discussion).

(f)    MERGE: Facilitates the merging of two output files. The merge checks that the identifiers (such as ZIP codes) in the two files match, and will fail if a mismatch occurs. Thus, MERGE should only be used with ZIPFIP output files that were derived from a single market area file.

(g)    LOG: Selects the location to which comments and error messages are written. These descriptive messages can be written either to your output file (in which case, they are bracketed by the @ characters), to a "log" file (with a default name of ZIPFIP.LOG), or to both.

(h)    COMMENT: Selects the "leading" and "trailing" comment character(s). By default, these are set to the @ character.

Notes:
(1)    INITIALIZE uses "initialization files" (such as ZIPFIP90.INI), that contain information on the default input and output directories. Careful users can modify these files directly.
(2)    For information on creating "customized" databases, contact the authors (addresses at the front of this documentation).

LIST

LIST is used to list output files on the screen. It is included as a convenience, giving the ZIPFIP user an easy means of examining output files.

Upon selecting a file to display, you can move up and down in this file by using the PgUp, PgDn, Up arrow, Down arrow, Home (go to start of file) and End (go to end of file) keys. You can also scroll the screen sideways by using the left and right arrow keys (up to 250 characters will be read). You can also enter a line number; the block of lines containing the requested line will be displayed, with the requested line in reverse video.

You can also search for a text string. Hit the F2 function key, then enter the case-sensitive text string you want to search for. If found, the line on which this string occurs will be displayed in reverse video.

If the file name you request cannot be found in the current directory, ZIPFIP will then search the default ZIPFIP output directory. This is useful for viewing ZIPFIP output.

LIST can also be used to examine the current (temporary) contents of both the "results buffer" and the "input history" buffer. You can even move the current contents of these buffers to an output file. Note that both these buffers are 20 lines long.

Appendix A: Entering Sites

In several commands, the user enters one (or several) "sites," where a site is defined as a latitude, longitude, and a State name. The user should enter three values (two numbers and one name) on one line -- with each value separated by either commas or spaces. For example:
    Lat, Long, State: 39.2 81.6 WV
The State name (WV, in this example) may be either a two-letter abbreviation or the complete name.

For most commands, a default location may be specified by entering the appropriate two-letter sitename. The list of default sites is contained in the file INISITES.DEF. If the user wishes to customize the list of defaults, s/he may edit INISITES.DEF (in the \DATA subdirectory) with her/his favorite text editor.

Notes:

(1)    As a convenience, you can select the latitude, longitude and State code of a particular ZIP code or FIPS code as your site. To do this, simply enter:

        ZIP=xxxxx or FIPS=xxxxx, where xxxxx is the ZIP or FIPS code (the ZIP=xxxxx option is available only if the selected database contains ZIP code information).

    Examples:
    ZIP=48105    This code (for Ann Arbor, MI) yields a latitude of 42.3, a longitude of 83.8 and State code of 26.
    FIPS=23005    This code (for Cumberland, ME) yields a latitude of 43.9, a longitude of about 70.3, and State code of 23.

(2)    If computation of road distance is desired, but you do not provide a State name (for example Lat, Long, State: 39.2 81.6), a great circle distance will be computed.

(3)    For a discussion of the methodology used to compute distances, see Appendix B, "Computing Distances."

(4)    Since ZIPFIP is designed for use in the United States, the absolute values of latitudes and longitudes are used.
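
The accepted forms of a site entry can be sketched as a small Python parser. This is a hypothetical illustration of the rules in this appendix, not ZIPFIP's input routine; the function name and the dictionary keys are assumptions, and two-letter default site names are not handled.

```python
def parse_site(text):
    """Parse 'lat long [state]', 'ZIP=xxxxx', or 'FIPS=xxxxx'.

    Values may be separated by commas or spaces. Returns a dict whose
    key names are hypothetical.
    """
    entry = text.strip()
    upper = entry.upper()
    for prefix in ("ZIP=", "FIPS="):
        if upper.startswith(prefix):
            return {"kind": prefix[:-1], "code": entry[len(prefix):].strip()}
    fields = entry.replace(",", " ").split()
    if len(fields) < 2:
        raise ValueError("expected a latitude and a longitude")
    return {
        "kind": "LATLONG",
        "lat": abs(float(fields[0])),     # absolute values, as in note (4)
        "long": abs(float(fields[1])),
        # State is optional; without it only a great circle distance is possible
        "state": " ".join(fields[2:]).upper() or None,
    }


print(parse_site("39.2 81.6 WV"))
print(parse_site("39.2, -81.6, West Virginia"))
print(parse_site("ZIP=48105"))
```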

Appendix B: Computing Distances

Two kinds of distances are produced by the ZIPFIP package: the Great Circle Distance and the Road Distance.

    Great Circle Distance: The Great Circle Distance between a pair of points is simply the direct distance between the two points, measured over the surface of a curved earth. It is an "as the crow flies" distance. All that is required is the longitude and latitude of the two points.

    Road Distance: The Road Distance developed in the ZIPFIP package is technically a Great Circle Distance that has been corrected for route circuity. The idea is that since distant points are typically not connected by straight line roads, the traveler must follow a more or less circuitous route to get from point A to point B. The circuity factor is, therefore, a correction factor that is applied to the Great Circle Distance in order to approximate the road distance.

The circuity factor will differ for every conceivable pair of locations. Lacking such complete information, ZIPFIP uses State-to-State averages. Thus, a single circuity factor is used for all trips from any location in State A to any location in State B. Finally, for each State there is an "intra-State" circuity factor.

Assuming that one has the longitude, latitude and State for a pair of locations, the computation of road distance is a three-step process:

(a)    Compute the Great Circle Distance between the two locations.
(b)     Look up the State-to-State circuity factor for this pair.
(c)    Multiply the Great Circle Distance by the circuity factor.
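
The three steps can be sketched in Python. This is a minimal illustration under assumptions: the manual does not say which great circle formula ZIPFIP uses (the haversine formula is used here), the earth radius is a commonly quoted mean value, and the circuity table entries are made up. The fallback of 1.15 is the average circuity factor quoted in this manual.

```python
import math

EARTH_RADIUS_MILES = 3958.8   # mean earth radius; an assumption
DEFAULT_CIRCUITY = 1.15       # average circuity factor quoted in this manual

# Hypothetical State-to-State factors, for illustration only.
CIRCUITY = {("MN", "IL"): 1.18, ("MN", "MN"): 1.21}


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles; arguments are in degrees."""
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def road_miles(site_a, site_b):
    """Approximate road distance; each site is (lat, lon, state)."""
    lat_a, lon_a, state_a = site_a
    lat_b, lon_b, state_b = site_b
    distance = great_circle_miles(lat_a, lon_a, lat_b, lon_b)      # step (a)
    if state_a is None or state_b is None:
        return distance       # no State given: great circle distance only
    factor = CIRCUITY.get((state_a, state_b), DEFAULT_CIRCUITY)    # step (b)
    return distance * factor                                       # step (c)


# Approximate coordinates for Ely, MN and Chicago, IL (absolute values).
print(road_miles((47.9, 91.9, "MN"), (41.9, 87.6, "IL")))
```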

Although somewhat naive, the results of this procedure are surprisingly accurate. Distances computed using ZIPFIP typically differ from those published in road atlases by only a few percentage points, especially for longer trips. Problems do arise for shorter trips, especially in mountainous regions, where specific journeys might necessitate a very indirect route.

Note: The average circuity factor is 1.15.

Appendix C: Variable Names

Several commands (EDITSTAT and PRINTSTATS) ask the user to enter a variable name. When a name is entered as a text string, ZIPFIP will recognize any unique portion (substring) of the variable name. Thus, a variable with the name "% WHITE" may be requested by "% WHITE" (the exact name), "% W", and so forth. Note that "%W" will not work since the space after the "%" has been omitted.

ZIPFIP will first attempt to exactly match the string you entered to one of the variable names. If no exact match is found, then a "substring match" is attempted. In some cases, two variables will have nearly the same name (for example, "PERCAP2" and "PERCAPX"). The string, PERCAP, matches either name. In this situation, the first PERCAP encountered (PERCAP2) will be the one selected. To be sure of selecting PERCAPX, enter PERCAPX (the exact name).
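
The matching order described above (exact match first, then the first substring match) can be sketched in Python. The function name and the menu list are hypothetical, and case-sensitive comparison is an assumption.

```python
def select_variable(names, entered):
    """Return the variable name selected by the entered string, or None.

    An exact match wins; otherwise the first name (in menu order) that
    contains the string is chosen.
    """
    if entered in names:
        return entered
    for name in names:
        if entered in name:
            return name
    return None


menu = ["% WHITE", "PERCAP2", "PERCAPX"]
print(select_variable(menu, "PERCAP"))    # first substring match
print(select_variable(menu, "PERCAPX"))   # exact match
print(select_variable(menu, "%W"))        # no match: the space is missing
```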

Another important use of variable names is to match ZIP and FIPS variables for purposes of missing values replacement. The inexperienced user need not worry about this, assuming that the package has been correctly installed. The following discussion is meant for the curious or adventurous.

The trick used for missing value replacement is to search the FIPS database for a variable with the exact same name as the variable afflicted with a missing value. If such a variable exists, then the value of this variable is used instead of the missing value, such value being drawn from the FIPS to which the desired ZIP belongs.

Example:

    FIPS Name    ZIP Name    Result
    %MALE        %MALE       Match (FIPS value may be used)
    LAND AREA    SQ MI       No Match
    POPUL        POPUL       Inappropriate Match

The third example (POPUL) is important. Using the FIPS value of %MALE when the ZIP value is missing is probably justifiable. In contrast, substituting the FIPS POPULATION for a POPULATION missing from a ZIP is always incorrect. To avoid this, ZIPFIP will prompt you before substituting for POPULATION variables.

Appendix D: Input and Output Files

ZIPFIP often asks the user to supply data, such as a list of ZIP (or FIPS) codes. Sometimes this may be done from the keyboard, sometimes through ASCII data files, and sometimes the user is given an explicit choice.

When using input files, each "observation" must be on one, and only one, line of the ASCII data file. Variables should be separated by spaces or commas. Values may be in integer or real format (with or without decimal point). Often, the user will be asked to provide the "nth variable" (referring to the location in the line of data) containing the ZIP code, FIPS code, or other identifying information. A data file containing a list of FIPS codes is expected; output consists of either a list of FIPS codes, or a list of ZIP codes (and FIPS codes).

Example:    46992    1     3 5    11.50    
    46996    1.2     2 6    20.00    

In the order they are entered, these numbers represent the ZIP code plus four items of data, such as numbers of groups, trips, days, or dollars. Thus, for the second line of the example:
    
    nth=1 --    indicates the value of the first item (for example, the ZIP code), 46996;
    nth=2 --     indicates the value of the second item, 1.2;
    nth=3 --    indicates the value of the third item, 2; and so forth.
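
The "nth variable" convention can be sketched in Python. The function names are hypothetical; the sketch assumes items are separated by spaces or commas and that lines beginning with "@" or "*" are comments, as described in the notes to this appendix.

```python
def nth_value(line, nth):
    """Return the nth (1-based) item of a line split on spaces or commas."""
    fields = line.replace(",", " ").split()
    return fields[nth - 1]


def read_column(lines, nth):
    """Collect the nth item of every observation, skipping comment lines."""
    values = []
    for line in lines:
        if not line.strip() or line[0] in "@*":
            continue                      # blank line or comment line
        values.append(nth_value(line, nth))
    return values


data = [
    "@ ZIP code plus four items of data",
    "46992    1     3 5    11.50",
    "46996    1.2     2 6    20.00",
]
print(read_column(data, 1))   # the ZIP codes
print(read_column(data, 2))   # the second item on each line
```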

A convenient method of entering a variable number is to use the ALT-V option. When asked to enter a variable number, strike ALT-V. The first several lines of the file will be displayed. You can then use the up and down arrow keys to select the desired "variable location." You can also use the PgUp, PgDn, CTRL-left-arrow, and CTRL-right-arrow keys to move within the file (say, to examine a block of text starting at the 50th row and 100th column). In addition to the ALT-V key, you can also use the ALT-L key to list the file (ALT-L is better for examining larger files, but is not optimized for variable selection).

Output files are created by most ZIPFIP actions. When you provide a name for an output file, if you do not specify a directory, it will be created in the default output subdirectory; typically /ZIPFIP/OUTPUT. A default output file is usually available (just hit the ENTER key). If a file with the requested name exists, you are given the option of overwriting the prior file, appending to it, or trying a different name.

Notes:

(1)    When asked to provide an input or output file, enter CON (short for Console) if you want to enter the data from the keyboard.
    Careful: for input files, you might have to hit Ctrl-Z to signal end of input; for output files, you might get an unreadable stream of data.

(2)    The "@" or "*" character, when placed at the beginning of a line in an input file, acts as a "comment" label. In other words, lines in an input file whose first character is an "@" or a "*" are ignored by ZIPFIP.

Appendix E: Missing Database Files

If the package has been set up correctly, and the user is running ZIPFIP from the correct directory (such as \ZIPFIP), this appendix should not be needed. However, if the user wants to save disk space, or has plans for personalized optimization, then s/he should read on.

Each ZIPFIP database is constructed from a variety of files. Depending on one's uses of ZIPFIP, a number of these files can be removed. For example, in the ZIPFIP-1990 database, the consequences of removing the following files are:

File                Consequences of Removal

CIRCUITY.UNF    Great Circle Distances may be computed, but exact road distance circuity factors will not be available: a value of 1.15 will be used.
ZIP5INDX.90     Searches of the ZIP location and name databases will be slow.
ZIP3INDX.90     ZIPFIP will (slowly) create an index at start up.
ZIP5NAME.90    FINDNAME will not be able to find town names.
FIPTOZIP.90    MARKET will not work with ZIP codes. FINDNAME may not work properly.
FIPSLOC.90    Most of the FIPS code oriented commands will not work.
FIPSTAT.90     PRINTSTATS and EDITSTAT will not edit FIPS data.
FIPSNAME.90    FINDNAME will be unable to find county names.
ZIP5LOC.90     Most of the ZIP code oriented commands will not work.
ZIP5STAT.90    PRINTSTATS and EDITSTAT cannot access ZIP census data.
FIPSCALE.90     PRINTSTATS will be unable to "scale" variables.

Several of these files are large, especially ZIP5NAME.90 and ZIP5STAT.90. Please note the consequences listed above if the user is forced to remove these large files from their machine.

Appendix F: Scaling

Scaling has two purposes: (1) to produce values for years other than the base year; and (2) to correct for missing ZIP codes. In ZIPFIP, two kinds of scales exist: User scales and County-specific scales.

County-Specific Scales

A scale is a fraction that is multiplied by some "absolute value" (perhaps a percentage) in the census database in order to produce some new value. The form of most county-specific scales is:

    SCALE=NEW_VALUE / OLD_VALUE.

OLD_VALUE is usually a value corresponding to the base year, while NEW_VALUE is the value for some nonbase year. For example, if the most recent census was taken in 1990, then OLD_VALUE would be taken from the 1990 "census" database, and NEW_VALUE could be a value from 1991. The net effect is that when SCALE is multiplied by the appropriate OLD_VALUE, the NEW_VALUE will be produced. One may ask why ZIPFIP does not just save the NEW_VALUE in the FIPS census database; the answer lies in the conservation of disk space, the ease of adding new scales and, most importantly, the scaling of ZIP code variables (for a discussion of scaling ZIP code variables, see the discussion of Scaling in the PRINTSTATS command documentation).
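
The arithmetic can be sketched in Python. The function names and all numbers are made up for illustration. The limit of four scales per variable comes from the PRINTSTATS documentation; applying several scales by multiplying them in turn is an assumption.

```python
def county_scale(new_value, old_value):
    """SCALE = NEW_VALUE / OLD_VALUE for one county."""
    return new_value / old_value


def apply_scales(base_value, scales):
    """Multiply a base-year value by each scale in turn (at most four)."""
    if len(scales) > 4:
        raise ValueError("at most four scales may be applied to a variable")
    result = base_value
    for scale in scales:
        result *= scale
    return result


# Made-up county values: 10000 in the base year, 11000 in a later year.
scale = county_scale(11000, 10000)
print(scale)
print(apply_scales(10000, [scale]))       # reproduces the later-year value

# The same county scale applied to a made-up ZIP-level value of 9500.
print(apply_scales(9500, [scale]))
```

The last line shows why a stored ratio is more useful than a stored new value: the same ratio can be applied to ZIP-level data for which no later-year value exists.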

Correcting for Missing ZIP Codes

A special County-Specific scale, known as POPFIX, exists for precisely one purpose: to correct for missing ZIP codes. It guarantees that the sum of "scaled" populations from all ZIP codes in a FIPS will equal the population in the FIPS. In other words, POPFIX is meant to be applied only to ZIP data.
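
The property described above can be illustrated with a short Python sketch. The manual does not give the POPFIX formula; the ratio used here (FIPS population divided by the sum of the populations of the ZIP codes that are present) is an assumption, chosen because it is one scale that has the stated property. All names and numbers are made up.

```python
def popfix_scale(fips_population, zip_populations):
    """One scale with the stated property: FIPS total / sum of known ZIPs."""
    return fips_population / sum(zip_populations)


# Made-up county of 12000 people whose known ZIP codes account for 10000.
zips = {"ZIP_A": 6000, "ZIP_B": 4000}     # placeholder ZIP labels
scale = popfix_scale(12000, zips.values())
scaled = {code: pop * scale for code, pop in zips.items()}

print(scale)
print(scaled)
print(sum(scaled.values()))               # matches the FIPS population
```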

A complete description of Missing ZIP codes, and the use of scaling as a correction for missing ZIP codes, may be found in the third chapter of this documentation: Problems With ZIP Code Data.

User Scales

User scales are county-invariant scales, with values entered at run time. For each User scale selected, one must specify a single value. This single value will be applied to the selected variable, for all observations, regardless of FIPS code. Though not as flexible as the COUNTY-SPECIFIC scales, this feature does allow the user to enter some national corrections.

In particular, one may enter national deflators for the income variables. As a convenience to the user, a Consumer Price Index (CPI) deflator is hardwired into PRINTSTATS. It is easy to extract a value of the CPI to use as a USER scale; see PRINTSTATS on-line help for details.

Up to 25 User scales may be active at one time.

Appendix G: Power-User Tips

ZIPFIP contains a number of special features that can substantially ease its use. Many of these features involve special keystrokes that move within a command string, and move strings to and from different "buffers."

There are two buffers: the results buffer and the input-history buffer. The results buffer contains a temporary copy of recent "results," that is, recent output of ZIPFIP. Since this buffer is only 20 lines long, ZIPFIP uses some judgment as to what to write to the results buffer. Basically, long lists (such as market areas) are summarized, while single answers (such as locations of ZIP codes) are written completely to the buffer. The input-history buffer contains a "history" of the last 20 command strings entered into ZIPFIP. Note that only command strings are retained; options selected from a menu using cursor keys (or the mouse) are not retained.

Another general feature of ZIPFIP is the ability to simultaneously select a menu option and specify an argument to this option. For example, input files are often listed as items on a menu. The standard procedures of selection are to highlight the selection (using the cursor keys or the mouse), hit the ENTER key, and then supply the name of the file. This sequence may be shortened: after highlighting the selection, enter the file name before hitting the ENTER key. This short cut is available for most, but not all, menu items.

The following lists important keystrokes that are used to manipulate buffers, and speed up menu entries:

Key Stroke        Description

CTRL-E    Temporary exit to DOS.
ESC    Erase current line. However, if there are NO characters entered, ESC is interpreted as a "cancel."
INS        Toggle insert mode. A flashing block signals insert mode on.
Left and Right arrows    Move left and right in the string.
HOME    Move to beginning of string, or to first option in a menu.
END    Move to end of string, or to last option in a menu.
F3 and F4    Move up and down in the input history buffer.
F1        Context Sensitive Help
ALT-D    Scroll "results window" DOWN one line.
ALT-U    Scroll "results window" UP one line.
ALT-G    Get top line of results window.
ALT-P    Put current text into results window.
ALT-R    Refresh screen. Useful after a Ctrl-E exit to DOS.
ALT-ENTER    Accept string as input to current menu option. This overrides "sub-string match"; the currently highlighted option is chosen with the string used as input.
ALT-F    When ZIPFIP asks for an input file, hitting ALT-F will cause ZIPFIP to list the files in the default input directory. You can then select a file from this list, or change directories, using cursor keys.
ALT-V    When ZIPFIP asks for the "nth variable" (in a line of an input file), hit ALT-V to display the currently selected input file.
ALT-L    When ZIPFIP asks for the "nth variable" (in a line of an input file), hitting ALT-L allows the user to list a file (similar to the LIST command). In addition, LIST will attempt to "parse" each line of the input file, highlighting each separate variable.

Note that the input-history, and command line editing, features of ZIPFIP are similar to the DOSKEY program of DOS 5.0.

ZIPFIP Tutorial

This quiz uses the BWOBS.SMP file that is stored in your \OUTPUT subdirectory. BWOBS.SMP consists of observations of permits issued to visitor parties at the Boundary Waters Canoe Area. Each observation contains information on the number of visitors per party and the ZIP codes of each party.

Quiz

You have several tasks to accomplish:

(1)    Find the location of four U.S. cities:

        Chicago, IL
        Ely, MN
        Fargo, ND
        Minneapolis, MN

TIP:    Write the latitude, longitude, ZIP and FIPS codes on paper for future reference.

(2)    Compute the distance from Ely, MN to Chicago, IL, from Ely, MN to Fargo, ND, and from Ely, MN to Minneapolis, MN.

TIP:    Since you need to compute only a few distances for this exercise, do this by entering the input information from the keyboard. You may, however, specify files of starting and ending locations if you have a lot of information to handle.

(3)    Compile a list of counties within 500 miles of Ely, MN.

TIP:    Store output in an Output file, for example ELYMKT.FIP.

(4)    For each county on this list, create a data file of the following from the 1990 U.S. Census: population, per capita income, percentage of population with a college degree.

TIP:    Use the file you created in (3) as an Input file.
    Store output in an ASCII file, for example ELYMKT.CEN.


(5)    For each county on the list, compute the minimum round-trip mileage for a trip on which you visit each of several cities. Do this for the following sets of cities:

    a.    Chicago, Ely, and Fargo.
    b.    Ely and Minneapolis.
    c.    Chicago, Ely, and Minneapolis.
    d.    Chicago, Ely, Fargo, and Minneapolis.

TIP:    Use the ASCII file created in (4) as a Market file.
    Store output in a file, for example ELYITIN.DIS.

(6)    For each county on the list, compute the number of BWCA permits issued to visiting parties, and compute the number of individuals who visited the BWCA.

TIP:    Use the file you created in (3), and BWOBS.SMP, as input files.

Possible answers to Quiz

Exercise 1:    To find the location of four U.S. cities:

            Chicago, IL
            Ely, MN
            Fargo, ND
            Minneapolis, MN

a.    Get into ZIPFIP from the DOS prompt. You must first change directories until you are in the ZIPFIP directory.
b.    From the ZIPFIP Main Menu, select FINDNAME.
c.    Select " NAME <enter>"; type CHICAGO <enter>.
d.    Select " STATE <enter>"; type IL <enter>.
e.    Select " ALL <enter>"; type NO <enter>.
f.    Select " RUN <enter>".
    NOTE: ZIPFIP will give you the location for Chicago, IL. Your monitor should read:
    #matches= 65, (first zip,fip @ 60600 17031

    We suggest you record the (single) ZIP code returned by ZIPFIP on a piece of paper for future reference.

g.    Exit to the ZIPFIP Main Menu and select FINDZFIP. Then select ZIP LOC, and enter this (single) ZIP code for Chicago. You should receive the latitude, longitude and FIPS code for Chicago:

        ZIP, FIP = 60600 17031; LAT,LONG = 41.850 87.650

h.    Now repeat steps c through g for Ely, Fargo, and Minneapolis.

    You should have recorded the following results:

    Chicago: ZIP,FIP=60600 17031; Lat,Long = 41.850 87.650
    Ely: ZIP,FIP = 55731 27137; Lat,Long = 47.903 91.867
    Fargo: ZIP,FIP=58102 38017;Lat,Long=46.877 96.789
    Minneapolis: ZIP,FIP=55400 27053;Lat,Long=44.98 93.26

Exercise 2:    To compute the distance from Ely, MN to Chicago, IL, from Ely to Fargo, ND, and from Ely to Minneapolis, MN.

a.    From the ZIPFIP Main Menu, select COMPDIST.
b.    Select START-LOC, then hit <enter> to select "default = Input from keyboard."
c.    Select ZIP as the "type."
    Note: You need to compute only a few distances for this exercise, so you may do this by entering the input information from the keyboard. You may, however, specify files of starting and ending locations if you have a lot of information to handle.
d.    Select OUTPUT to specify a name for your output file, such as DISTANCE.OUT. This file will store all the information you generate within COMPDIST.
e.    Select RUN. COMPDIST will ask for a list of ZIP codes (the distance between these zip codes will be computed).
    First, enter the ZIP code for Ely, MN, and then the ZIP-code for Chicago, IL. Hit <esc> to indicate the end of this itinerary.
f.    Without exiting from COMPDIST, repeat step e for Ely to Fargo and Ely to Minneapolis.

To view the results, which have been stored in the output file DISTANCE.OUT, go back to the ZIPFIP Main Menu and select LIST. After you specify DISTANCE.OUT, the file should read:

         2 572.66

         2 269.39

         2 242.05

    

Exercise 3:    To compile a list of counties within 500 miles of Ely, MN.

    a.    From the ZIPFIP Main Menu, select MARKET.
    b.    From the MARKET OPTIONS menu, select CENTER(S); then select "default = Input from keyboard," and enter the latitude, longitude and State (MN) for Ely.
    c.    Select RANGE; type 0.0 <enter> for the minimum range; type 500 <enter> for the maximum range.
    d.    Select OUTPUT; type ELYMKT.FIP <enter>.
    e.    To process these commands, select RUN.

    The first few lines of ELYMKT.FIP should read:

     @ Center of FIPS Market area: 47.903 91.867 27 @
     17085
     17177
     17201
     19005
     19011
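
The idea behind a market area can be sketched in a few lines of Python. This is only an illustration, not ZIPFIP's code: it assumes great-circle (haversine) distance, and the function names and the SITES table are invented for the example. The coordinates are the city locations recorded in Exercise 1 (west longitudes written as positive numbers, as ZIPFIP prints them), keyed by FIPS code; they stand in for county locations. Distances computed this way need not agree with the mileages ZIPFIP reports.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def market_area(center, sites, min_miles, max_miles):
    """Return the sorted codes of all sites within [min_miles, max_miles] of center."""
    lat0, lon0 = center
    return sorted(
        code for code, (lat, lon) in sites.items()
        if min_miles <= great_circle_miles(lat0, lon0, lat, lon) <= max_miles
    )


# Illustrative table: FIPS code -> (latitude, longitude) from Exercise 1.
SITES = {
    "17031": (41.850, 87.650),  # Chicago
    "27137": (47.903, 91.867),  # Ely
    "38017": (46.877, 96.789),  # Fargo
    "27053": (44.98, 93.26),    # Minneapolis
}

ELY = (47.903, 91.867)
print(market_area(ELY, SITES, 0.0, 500.0))  # all four sites
print(market_area(ELY, SITES, 0.0, 300.0))  # Chicago drops out
```

Setting the minimum range above zero would exclude the center itself, which is how a ring-shaped market area could be defined.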

Exercise 4:    To create a data file, for each county on this list, of the following census variables: 1985 population, 1985 per capita income, and 1990 percentage of population with a college degree.

    a.    From the ZIPFIP Main Menu, select PRINTSTATS.
    b.    From PRINTSTATS, select INPUT; type ELYMKT.FIP <enter>, and for location, 1 <enter>.
    c.    Select OUTPUT, then ASCII for ascii output; then type ELYMKT.CEN <enter>.
    d.    Select VARIABLES to choose from available FIPS-level variables. From this list, highlight POPULATION, PERCAP, and %COL DEGRE.
    e.    Select SCALE; then select YEAR. Type 85 <enter> to select 1985 as your base year.
    f.    Select HEADER, <enter>, and YES.
    g.    Select GO to extract statistics.

    The first few lines of ELYMKT.CEN should read:

@ Base_year = 1990 , and Current Year = 85 @
@ For Year: 85, data available for: @
@ POPULATION @
@ PERCAP @
@ For Year: 1990, data available for: @
@ % COL DGRE @
@ Order of variables is ... @
@ Generated variables @
@ FIPS (found), FIP STATUS, @
@ User selected variables: @
@ POPULATION, PERCAP , % COL DGRE, , , , @
@ NOTE: Missing values will be displayed as . @
17085. 0. 21340. 13271. .040
17177. 0. 49013. 13550. .055
17201. 0. 239255. 14385. .060
19005. 0. 13023. 11296. .030
19011. 0. 23393. 11623. .035


Exercise 5:    For each county on the list, compute the minimum round-trip mileage for a trip on which you visit each of several cities. Do this for the following sets of cities:

            Chicago, Ely, and Fargo.
            Ely and Minneapolis.
            Chicago, Ely, and Minneapolis.
    
a.    Find FIPS codes for Fargo, ND, Chicago, IL, and Minneapolis, MN, using the FINDNAME option (see Exercise 1).
b.    From ZIPFIP Main Menu, select TRIPDIST.
c.    Select MARKET; type ELYMKT.FIP <enter>.
d.    Select OUTPUT; type ELYITIN.DIS <enter>.
e.    Select DISTANCE; select ROAD <enter>, then ROUND <enter>.
f.    Select ITINERARY; enter all four itineraries, one at a time, by ZIP or FIPS codes for each destination point.

    For Ely, Fargo, and Chicago: ZIP=55731 <enter>, ZIP=58102 <enter>, ZIP=60600 <enter-esc-esc>. Note that the order of entry is not important, since the minimum distance connecting these cities will be computed.

g.    Select RUN. You can view your results by going back to ZIPFIP Main Menu and selecting LIST.

    The first few lines of ELYITIN.DIS should read:

@ Line ignored: @ Center of FIPS Market area: 47.903 91.867 27
        17085    1523    1012    1252
        17177    1518    1045    1246
        17201    1514    1078    1243
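
Why the order of entry does not matter in Exercise 5 can be seen from a small sketch. The Python below is not ZIPFIP's algorithm; it is a brute-force illustration that assumes the trip starts and ends at the county and tries every visiting order. The function name and the toy one-dimensional "mileposts" are invented for the example; any distance function could be passed in.

```python
from itertools import permutations


def min_round_trip(origin, stops, dist):
    """Length of the shortest tour origin -> every stop (any order) -> origin."""
    best = None
    for order in permutations(stops):
        route = (origin, *order, origin)
        # Sum the legs between consecutive points on this candidate route.
        length = sum(dist(a, b) for a, b in zip(route, route[1:]))
        if best is None or length < best:
            best = length
    return best


def milepost_distance(a, b):
    """Toy distance: places are mileposts along a single straight road."""
    return abs(a - b)


# Visiting mileposts 5, 2 and 9 from milepost 0: drive out to 9 and back.
print(min_round_trip(0, [5, 2, 9], milepost_distance))  # 18
print(min_round_trip(0, [9, 5, 2], milepost_distance))  # 18, same answer
```

Trying all orders is practical only for short itineraries, since the number of orders grows factorially with the number of stops.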

Exercise 6:    For each county on the list, compute the number of BWCA permits issued to "visiting parties", and compute the number of individuals who visited the BWCA.

a.    Select AGGREGATE from the ZIPFIP Main Menu.
b.    Select OBSERV; type BWOBS.SMP <enter>, then 2.
c.    Select MARKET; type ELYMKT.FIP <enter>, then 1.
d.    Select OUTPUT; type ELYAGG.OUT.
e.    Select AGG VARS; then select COUNT, then select SUM, then 3, then select DONE.
f.    Run AGGREGATE.

The first few lines of ELYAGG.OUT should read:
        
     @ Aggregating from ZIP to FIPS @
     @ Line 932 No FIPS found for Zip= 0 @
     @ Line 973 No FIPS found for Zip= 99999 @
     17085 .0000 .0000
     17177 .0000 .0000
     17201 2.0000 17.0000
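
The COUNT and SUM aggregation of Exercise 6 can be pictured with the following Python sketch. It is an illustration under stated assumptions, not ZIPFIP's implementation: the function name, the ZIP-to-FIPS lookup table, and the four observations are all invented toy data, chosen so that one county receives two parties totaling 17 visitors.

```python
def aggregate(observations, zip_to_fips, market_fips):
    """Count parties and sum visitors per county in the market list.

    observations: list of (zip_code, party_size) pairs, one per input line.
    Returns (totals, unmatched), where totals maps FIPS -> [count, sum] and
    unmatched lists (line_number, zip_code) for ZIPs with no known FIPS.
    """
    totals = {fips: [0, 0] for fips in market_fips}
    unmatched = []
    for line_no, (zip_code, party_size) in enumerate(observations, start=1):
        fips = zip_to_fips.get(zip_code)
        if fips is None:
            unmatched.append((line_no, zip_code))
            continue
        if fips in totals:  # ignore counties outside the market area
            totals[fips][0] += 1
            totals[fips][1] += party_size
    return totals, unmatched


# Hypothetical lookup table and observations, for illustration only.
zip_to_fips = {"61101": "17201", "61102": "17201", "61032": "17177"}
observations = [("61101", 8), ("61102", 9), ("00000", 4), ("99999", 2)]

totals, unmatched = aggregate(observations, zip_to_fips, ["17085", "17177", "17201"])
for fips, (count, visitors) in totals.items():
    print(fips, count, visitors)
print("no FIPS found:", unmatched)
```

Counties with no observations keep a count and sum of zero, so the output keeps one line per county in the market file.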

Limitations of Zip Codes as Units of Observation


The ZIP code database, while representing an attractive level of aggregation, presents problems. These problems arise from the fact that ZIP code boundaries are not necessarily stable over time. As the population in a region grows, or shrinks, ZIP code boundaries will also grow, or shrink. Furthermore, some ZIP codes will be dropped, and others will be added.

Changes in the geographical areas served by a ZIP code are separate from actual population changes. Thus, even if a region's population remained fixed, the ZIP codes themselves could change; new ones may be added, some of the old ones may be dropped, and portions of towns may be transferred from one ZIP to another. Population changes and boundary changes will nevertheless be correlated; otherwise, the post office would have little reason to redraw its maps.

The proper way to deal with this is to map census tracts (or block statistics) into ZIP codes. With counts from these tracts, and knowledge of the geographic boundaries of tracts and ZIP codes, it is conceptually easy to see how one could determine population counts for a ZIP code -- just add all tracts that fall within a ZIP code. This sort of methodology is adopted by various marketing firms.

Again, the change in ZIP code boundaries is not necessarily related to actual changes in population. Thus, the population in a given ZIP code, if an actual count were made in different years, could change even if there were no change in the on-the-ground demographics -- no births, no deaths, no moves, and so forth.

This demographic stability will never exactly hold, and in certain areas will be quite incorrect. Therefore, even if ZIP code boundaries were fixed, there will be changes in demographic statistics over time. Since complete census enumerations occur on a decennial basis, demographic measures at the ZIP code level, in mid-decade years, must be estimated with the aid of ancillary data. Examples include:

    (a)    county-level annual population estimates;
    (b)    county-level per capita income estimates; and
    (c)    yearly counts of "deliverable" addresses, available by ZIP code from the post office. These are a proxy for number of households per ZIP code.

In summary, given a ZIP code in a nondecennial year, the census estimates will be incorrect to the extent that the ZIP code's boundaries have changed, and to the extent that the actual demographics of the region have changed.

As an example of the type of problem arising when ZIP code data are used, consider the ZIPFIP80 database, for which numerous ZIP codes were created after the 1980 ZIP code census database was compiled. For these ZIPs, there is no entry in the census database. Such new ZIPs are likely to be found in rapidly growing regions, such as Florida. Therefore, measures by ZIP code in such regions, taken after the decennial census, will face two problems: accounting for actual changes in the population, and correcting for changes in ZIP code boundaries, with the extreme case being the creation of a new ZIP code.

ZIP Data Sources and the Use of "Closest" ZIP

For the ZIPFIP-1980 database, the raw data for ZIP codes were extracted from two sources:

    (a)    A location file, circa 1986, containing all residential ZIP codes, with FIPS code, town name, and longitude and latitude. This file was used to create ZIP5LOC.UNF and ZIP5NAME.UNF.
    (b)    A set of census tapes (STF3B), circa 1980, containing numerous census variables on a ZIP code level, from which approximately 40 variables were extracted. These variables were then compressed and stored in ZIP5STAT.80.

There is not a perfect correspondence between the ZIP codes on the census tapes and in the location file. For example, there are approximately 1,000 ZIP codes read from the 1980 census tapes that were not in the location file (which is from 1986). Conversely, approximately 3,500 ZIP codes in the location file were not in the census tapes. The ZIP codes missing from the location file were probably dropped between 1980 and 1986. Conversely, those missing from the census tapes were most likely added between 1980 and 1986.

A question that may arise is:
    What does one do when a ZIP code cannot be found in the database being searched, be it the location or the census databases?

ZIPFIP resolves this problem by finding and using the value that is numerically the closest to the missing ZIP code. For example:

    IF        7036 exists,
    and if    7037 does not exist,
    and if    7038 does not exist
    THEN    if the user requests 7037, and this "close" option is selected, 7036 will be returned;
    BUT    if the close option is not selected, then there will be NO MATCH.

The notion is that, in whichever database is being searched, the Nation is completely divided by ZIP codes. Therefore, a ZIP code that is not in the database represents the results of a redrawing of ZIP code boundary lines. Since geographically contiguous ZIP codes have values that are numerically close, it is likely that the resident of this missing ZIP code lives inside the boundaries of the ZIP code (in the target database) that is also numerically close. For example:

    IF        07891 and 07894 divide the town of FOOBAR,MI; circa 1986,
    BUT,    07891, 07892 and 07894 divide the town in 1988,
    THEN    07892 contains portions of the old 07891 and 07894.

Assuming that a residence in 07892 of the 1988 database was formerly assigned to 07891 in the 1986 database, then 07891 will frequently be correct. Note that 07891, which is closer to 07892 than 07894 (in numeric value), will be selected.

Note:    The U.S. Postal Service tries to keep a new ZIP code number near the numbers of the old ZIP codes from which it was created. This, however, is not always possible.

Three-digit ZIP codes are, in essence, well-defined geographical areas. Thus, all searches for closest ZIP codes will occur in the same three-digit ZIP code as the requested ZIP code. For example, to find ZIP code 17891, the range from 17800 to 17899 will be searched. If no match is found in that range (if that three-digit ZIP does not exist), then there will be a message, "NO MATCH," returned.
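
The "closest" rule, restricted to one three-digit ZIP code, can be sketched in Python as follows. This is a minimal illustration, not ZIPFIP's code: the function name is invented, ZIP codes are treated as integers, and breaking ties in favor of the lower ZIP code is an assumption, since the manual does not say how ties are handled.

```python
def closest_zip(requested, database):
    """Return the numerically closest ZIP in the same three-digit area, or None."""
    prefix = requested // 100  # e.g. 17891 -> 178, covering 17800 to 17899
    candidates = [z for z in database if z // 100 == prefix]
    if not candidates:
        return None  # corresponds to "NO MATCH"
    # Smallest numeric gap wins; ties go to the lower ZIP (an assumption).
    return min(candidates, key=lambda z: (abs(z - requested), z))


print(closest_zip(7037, {7036, 7101}))     # 7036
print(closest_zip(17891, {17700, 17912}))  # None: nothing in 17800-17899
```

A ZIP code that is present in the database is its own closest match, so the same function also covers the exact case.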

The use of this "closest" ZIP code feature is fairly straightforward in most of the commands, except for PRINTSTATS. Since PRINTSTATS uses both the locational and census databases, several types of searches are possible. These are:

EXACT
    Both the location and census databases are checked for the requested ZIP code. Each database must contain the ZIP code; otherwise, no match is found.

APPROXIMATE (Close ZIP w/census data)
    If an exact match cannot be found, a ZIP code that has a numerically close value will be returned (provided that both census and location data are available). Note that the search is limited to the desired ZIP code's three-digit ZIP code. Furthermore, only ZIP codes from the same county as the "closest" location ZIP code are allowed (note that census data must exist for the ZIP code that is used).

SEPARATE & APPROXIMATE
    Both databases are searched for the closest ZIP code. The (numerically) closest ZIP in the location database is used for locational information. The (numerically) closest ZIP in the census database is used for census information. It is, therefore, possible for locational information and census information to come from different ZIP codes. The search is limited to the desired ZIP code's three-digit ZIP code.

SEPARATE & APPROXIMATE, WITH CHECK
    Similar to the option above, but this search is limited to ZIP codes for which some locational information is available. Note that information from two different ZIP codes may still be returned, but there will be location information available for both of them. If these two ZIP codes are different, it must be the case that there is no census information available for the ZIP code selected from the location database, but it is (numerically) closer to the desired ZIP than the ZIP code selected from the census database.

    This method will limit the search of census ZIP codes to one FIPS code. Specifically, the census ZIP code must be from the same FIPS code as the location ZIP code.

Notes:

(1)    In the ZIPFIP-1980 database, the approximately 3,500 ZIP codes with no census data (but with location data) will not be available in the EXACT and Close ZIP w/census data options. The only option that accesses the approximately 1,000 ZIP codes with no location data is the first SEPARATE option.

(2)    The ZIPFIP-1990 database contains a more thorough (and current) list of ZIP codes; however, the basic limitations of ZIP code data will still be present. Also note that, at this writing, census information for 1990 ZIP codes is limited.

Example:
    ZIP code requested: 27021
    ZIPS in locational database: 27001 27015 27018 27019 27022 27026 27030
    ZIPS in census database: 27001 27015 27017 27018 27026 27030,

    where 27001, 27015, 27019, 27022, 27026 and 27030 are from FIPS=1003, while 27018 is from FIPS=1004, and 27017's FIPS code is unknown.

Option                Result

EXACT                No match.

Close ZIP w/census data        27026 is returned.

SEPARATE Search            For the location database, 27022 will be returned (latitude and longitude from 27022 will be used). For the census database, 27018 will be returned (census data from 27018 will be used).

SEPARATE Search, with check    For the location database, 27022 will be returned (latitude and longitude from 27022 will be used). For the census database, 27026 will be returned (census data from 27026 will be used).
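
The four search types can be compared side by side with a short Python sketch that reproduces the results of this example. It is an interpretation of the descriptions above, not ZIPFIP's code. The function names and the mode labels ("EXACT", "CLOSE", "SEPARATE", "SEPARATE_CHECK") are invented, and the tie-break toward the lower ZIP code is an assumption. The sketch takes 27018 to be the only location ZIP in FIPS 1004 and all other location ZIPs to be in FIPS 1003; 27017 has no location entry.

```python
def nearest(requested, candidates):
    """Numerically closest ZIP sharing the first three digits, else None."""
    pool = [z for z in candidates if z // 100 == requested // 100]
    # Ties go to the lower ZIP code; that tie-break is an assumption.
    return min(pool, key=lambda z: (abs(z - requested), z)) if pool else None


def search(requested, location, census, mode):
    """Return (location_zip, census_zip); None marks "no match".

    location maps ZIP -> FIPS; census is the set of ZIPs with census data.
    """
    if mode == "EXACT":
        hit = requested if requested in location and requested in census else None
        return hit, hit
    loc = nearest(requested, location)
    if mode == "SEPARATE":
        return loc, nearest(requested, census)
    if mode not in ("CLOSE", "SEPARATE_CHECK"):
        raise ValueError(f"unknown mode: {mode}")
    if loc is None:
        return None, None
    # Census candidates must have location data and share the county (FIPS)
    # of the closest location ZIP.
    same_county = [z for z in census if location.get(z) == location[loc]]
    cen = nearest(requested, same_county)
    return (cen, cen) if mode == "CLOSE" else (loc, cen)


location = {27001: 1003, 27015: 1003, 27018: 1004, 27019: 1003,
            27022: 1003, 27026: 1003, 27030: 1003}
census = {27001, 27015, 27017, 27018, 27026, 27030}

for mode in ("EXACT", "CLOSE", "SEPARATE", "SEPARATE_CHECK"):
    print(mode, search(27021, location, census, mode))
```

Under these assumptions the sketch reports no match for EXACT, 27026 for the close search, (27022, 27018) for the separate search, and (27022, 27026) for the separate search with check.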

The STATUS Variable
A STATUS variable is always included in the output of PRINTSTATS. It reports how successful the search for the ZIP (or FIPS) code was. The status variable is a two-digit number (location, census). The first digit reports the results of the location search, and the second digit the results of the census search.

For example, with the number 12, the first digit, 1, refers to the location, and the second digit, 2, refers to the census result. Each digit may take one of five values:

    Value    Definition

    0    Exact match
    1    Close match
    2    No match, but no search (for closest ZIP or FIPS)
    3    No match, EVEN though a search was attempted
    4    No attempt at matching, not even looking for an exact match
    
Increasing values represent worse results, with the difference between 2 and 3 being one of effort. A value of 00 means exact match in location and census database, whereas a value of 33 means no match, in either database. The value, 4, signals to the user that the location search failed to yield a match, resulting in the census database being ignored.

Notes:

(1)    If there is an exact match in location database, the implied 0 will not be printed. For example, an exact location, with close data, will be displayed as "1", not as "01". Note that values of less than 10 mean that location match was exact.

(2)    Possible values of the STATUS variable will vary according to the type of search requested.
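
When post-processing PRINTSTATS output, the STATUS value can be split into its two digits. The Python sketch below is one way to do so; the function name and the wording of the descriptions are invented for the example. Because a leading 0 is not printed, a value below 10 is read as an exact location match.

```python
MEANINGS = {
    0: "exact match",
    1: "close match",
    2: "no match, no search attempted",
    3: "no match despite a search",
    4: "no attempt at matching",
}


def decode_status(status):
    """Split a STATUS value into (location result, census result)."""
    location_digit, census_digit = divmod(status, 10)
    return MEANINGS[location_digit], MEANINGS[census_digit]


print(decode_status(12))  # close location match; census not found, no search
print(decode_status(1))   # printed as "1": exact location, close census match
print(decode_status(33))  # no match in either database
```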

A Word to the Wise

Each entry in the "input file" may be associated with as many as three ZIP codes. These include:

(1)    The Location ZIP code. In the PRINTSTATS command, this is always the first variable in the output file.

(2)    The Census ZIP code. In PRINTSTATS, this will only be output if the user requests it as one of her/his census variables.

(3)    The requested ZIP code from the input list. It is not written to the output file. However, a one-to-one correspondence is maintained between lines in the input and output files.

A census ZIP code not matching a location ZIP code will occur only when one of the two SEPARATE options is requested. If the returned STATUS equals 0, these three ZIPS will have the same value.

Adjusting for Missing Observations through Scaling

One technique to account for the entire population is to scale or weight ZIP codes by a correction factor. ZIPFIP uses this technique to provide corrected measures of ZIP code population.

Information at grosser levels of aggregation (counties) is used to adjust finer aggregates (ZIP codes) so that the finer aggregates "add up" to the quantity known to exist at the grosser level. Specifically, given accurate measures of FIPS populations, a scale may be applied to all ZIPS in a FIPS, so that the sum of scaled populations from all the ZIPS in a FIPS will equal the FIPS' population.
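
The arithmetic of this adjustment is simple: the scale is the county population divided by the sum of the ZIP code populations in that county, and every ZIP code in the county is multiplied by it. The Python sketch below illustrates the calculation only; the function name and the population figures are invented, and it is not a description of how ZIPFIP stores or applies its scales.

```python
def scale_zip_populations(zip_pops, fips_population):
    """Rescale the ZIP populations of one county so they sum to the county total."""
    total = sum(zip_pops.values())
    if total == 0:
        raise ValueError("cannot scale: ZIP populations sum to zero")
    factor = fips_population / total  # one scale for every ZIP in the FIPS
    return {zip_code: pop * factor for zip_code, pop in zip_pops.items()}


# Hypothetical county: two ZIP codes with census data, county total 12,000.
scaled = scale_zip_populations({"ZIP A": 4000, "ZIP B": 6000}, 12000)
print(scaled)                # each ZIP raised by the same factor of 1.2
print(sum(scaled.values()))  # adds up to the county total
```

Each ZIP code keeps its share of the county, so the people living in ZIP codes without census data are spread proportionally over the ZIP codes that have it.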

Applying this technique to "missing observations" implies a division of the nation into "quasi-ZIP" codes, where each quasi-ZIP code represents a ZIP code for which both Census and Location information is available (about 35,000 units in the 1980 database). The user then maps each and every individual into these (35,000) quasi-ZIP codes.

By scaling the population in the appropriate quasi-ZIP code (as extracted from the census tapes), we may get a close estimate of the population that would result had we performed this "quasi-ZIP code mapping" to the entire American population. At the very least, some consistency is maintained between quasi-ZIP code and FIPS aggregation.

The Close ZIP w/Census Data option was constructed with this idea in mind. Specifically, for population counts from ZIP codes, this option should be used in conjunction with the POPFIX scale.

It should be noted that, in the ZIPFIP package, scaling has a variety of uses other than this missing observation correction. For example, mid-decade county estimates of population are used to scale mid-decade ZIP code populations. The same holds for other variables, such as unemployment, where one only needs some fraction to be applied to all ZIPs in a FIPS, with each FIPS having its own value. See Appendix F on "Scaling" in the first chapter for further discussion of scaling.

In summary, there are three options for dealing with missing observations, none of which provides a perfect solution:

    Ignore the problem: use EXACT ZIP;
    Scale: use Close ZIP w/Census Data with the POPFIX scale; or
    Search separately: search the Location and Census databases separately.

A Final Note On Searching For FIPS Codes

Since county boundaries rarely change, the problem of matching FIPS from observations to FIPS in the database should not frequently arise. For those special cases, however, exact analogies of the EXACT and CLOSE ZIP options are available to match FIPS codes.

Note: In the ZIPFIP-1980 database, counties with possible problems are located in Virginia, Alaska, Arizona, and New Mexico. These problems are partially corrected in the ZIPFIP-1990 database.


This document last modified at 0:17a, on 21 Jun 2003.