Introductory Users Manual
6 June 2003. This manual is somewhat dated. In particular, a number of new databases are now packaged with ZIPFIP!
ZIPFIP is both a set of databases containing census and locational
information at the ZIP (ZIP code) and FIPS (county) level, and a
program by which these data sets may be accessed and manipulated.
It has several useful features, including: editing and displaying data;
defining spatial boundaries known as market areas; determining
distances between any two sites in the lower 48 States; and
aggregating observations. ZIPFIP can extract commonly used census
information by county or ZIP code, and correct for missing values.
ZIPFIP is the result of a long process of acquiring data and massaging
the user interface. We thank the Resources and Technology Division
of the U.S. Department of Agriculture's Economic Research Service
(ERS), and the Rocky Mountain Forest and Range Experiment Station
of the U.S. Forest Service, for their support.
We also thank a variety of individuals who have provided valuable
data and insights, including: Karen Mizer, Noel Gollehon and Ralph
Heimlich of ERS, Glenn Brink and George Peterson of the U.S. Forest
Service, Robert Mendelsohn of Yale University, and George
Muehlbach of the National Center for Resource Innovations.
The ZIPFIP package was created by Daniel Hellerstein of the
Economic Research Service (ERS), Danette Woo (ERS), and Dennis
Donnelly and Daniel McCollum of the U.S. Forest Service.
Although we have attempted to find and correct all problems, we
assume no responsibility or liability for errors in the program, or
misunderstood features. Nevertheless, if problems arise, please let
us know -- whether they be bugs, operational difficulties, or
incomprehensible documentation. We also stress that the data in
this package is not guaranteed to be accurate for every ZIP or FIPS
code; ZIPFIP is designed to facilitate bulk processing of data and not
as a precise reference tool. If more accurate information on
particular counties (or ZIP codes) is required, we recommend
obtaining material directly from the Census Bureau or from
commercial sources.
To install and operate the ZIPFIP package, the user will need a machine that can support a version of DOS. ZIPFIP has been tested under DOS, OS/2, WIN 98, WIN NT, and WIN XP.
If questions arise, either of a technical or practical nature, contact
Daniel Hellerstein at:
Economic Research Service
Resources and Environment Division
1800 M. ST. NW, Room s4006
Washington, DC 20034
(202) 694-5613
ZIPFIP: A ZIP and FIPS database
ZIPFIP is a set of databases containing census and locational
information at the ZIP (ZIP code) and FIPS (county) level. ZIPFIP is
also a program to access and manipulate these data sets.
Features supported by ZIPFIP include: data editing/display, market
area creation, determining distances between any two spots in the
lower 48 States, and aggregation of observations.
While it does not match the user friendliness of the most
sophisticated commercial products, nor the breadth of CD-ROM-based databases, this menu-driven program is easy to follow, even
by the casual user, and comes with plenty of on-line help for
guidance and reminders.
This manual describes the hows and whys of ZIPFIP. The first
chapter, the Command Documentation, is all that the casual user
needs to examine. The second chapter contains a tutorial to help
users understand the capabilities of ZIPFIP, and the third chapter
discusses problems to which a user of ZIP code census data
should be alerted.
Data.
A variety of databases are available for ZIPFIP. These include:
ZIPFIP-1980 Contains county (FIPS) and ZIP-code information
from 1980. Variables include latitude, longitude,
town and county names, and several census
measures, including population and per capita income
from 1969 to 1988.
ZIPFIP-1990 An update to ZIPFIP-1980, incorporating several new
counties and containing several new measures from
the 1990 census. ZIPFIP-1990 contains limited ZIP-code information, including latitude, longitude, place
name, and number of deliverable addresses.
AGSTATS A collection of ZIP-code and county-level agricultural
statistics from the 1978, 1982, and 1987 Census of
Agriculture, including measures of the value of land
and capital, and productivity of several major crops.
NRI A set of variables from the 1982 National Resources
Inventory, including soil type and land cover
(aggregated at the county level).
FIPSREGN A set of "identifiers," by county. FIPSREGN can be
used to cross-reference counties to other
geographical units, such as SMSA and Land Resource
Areas.
CLIMSOIL A set of measures of climate (temperature and
precipitation) and soil characteristics (aggregated at
the county level).
Notes: The databases containing information at the ZIP-code
level are ZIPFIP-1990, ZIPFIP-1980, and AGSTATS.
The ZIPFIP package is currently shipped with the
ZIPFIP-1990 database. To obtain the other databases,
see ordering instructions at the back of this manual.
For a description of the variables, and sources of
information, see the ZIPDATAS.DOC and SOURCES.DOC
files (in the \ZIPFIP directory), and ZIPFIP90.DOC (in the
\ZIPFIP\DATA directory).
Commands.
ZIPFIP provides a number of commands, divisible into several
categories:
i. Database editing: EDITSTAT and EDTZFIP.
ii. Data extraction: FINDZFIP, FINDNAME, and PRINTSTATS.
iii. Market area creation: MARKET and ZIP2FIP.
iv. Distance computation: COMPDIST and TRIPDIST.
v. Aggregation and assignation: AGGREGATE.
Market areas are defined as a set of zones. Typically, these zones
are ZIP codes or FIPS codes. A typical market area consists of
one or a number of ZIP (or FIPS) codes within a user-defined
distance of a central location.
Aggregation is the act of combining observations. An aggregation
may combine observations on individuals into ZIP aggregates, or
combine ZIP aggregates into a FIPS aggregate. For example, given
a list of visitors to a park and the ZIP-code of each visitor,
aggregation can be used to derive the total number of visitors from
each ZIP-code in the "market area" surrounding the park.
Assignation is the act of assigning an observation to a zone,
where a zone may be a ZIP-code, a county, or a geographically
defined location.
Data extraction refers to finding and displaying data. ZIPFIP can
also apply a number of corrections to the data. For example,
missing values occur frequently in ZIP data, due to small samples
and due to "nonexistent" ZIP codes. Corrective actions for ZIP
data include:
(a) the use of FIPS measures for missing values in the ZIP
data; and
(b) the scaling of the ZIP population variable to account for
changes in ZIP code boundaries since the 1980 census.
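Corrective action (a) can be pictured with a short Python sketch. It is an illustration only, not ZIPFIP's own code: the function name fill_missing, the use of None to mark a missing value, and the income figures are all assumptions made for this example.

```python
# Illustrative sketch only; names and figures are hypothetical, not from ZIPFIP.
from typing import Optional


def fill_missing(zip_value: Optional[float],
                 fips_value: Optional[float]) -> Optional[float]:
    """Return the ZIP-level value, or the county (FIPS) value if it is missing."""
    return zip_value if zip_value is not None else fips_value


# Hypothetical figures: (ZIP-level value, value for the county containing the ZIP)
records = {
    "52544": (9100.0, 8800.0),  # ZIP value present: keep it
    "52549": (None, 8800.0),    # ZIP value missing: fall back to the county value
}

filled = {z: fill_missing(zv, fv) for z, (zv, fv) in records.items()}
print(filled)
```

If both values are missing, this sketch simply leaves the result missing.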
Database editing refers to changing values in the ZIPFIP database. Distance computation refers to computing great circle or road distances between a set of user-selected points.
Helpful Hints.
. Case sensitivity: ZIPFIP is case-insensitive.
. On-line help: To access on-line help, enter ALT-H (hit the ALT
key and the H key simultaneously), or the F1 function key.
. Temporary exit to DOS: Whenever input is requested, the user
may request a temporary exit to DOS by typing Ctrl-E (hit the
Ctrl and E keys simultaneously). ZIPFIP can be re-invoked by
typing EXIT (after which we recommend hitting ALT-R to
refresh the ZIPFIP display).
. ENTER and ESC keys: The word ENTER refers to the ENTER
key (the carriage return key). When equated to a definition,
such as ENTER=Default, a strike of the ENTER key selects that
command (in this case, the default). The ESC key refers to the
Esc (Escape) key. ESC will usually cancel the current
command, menu item, or input request.
. Input files: ZIPFIP often asks the user to select an input file.
The input file (which must already exist) contains information
needed by ZIPFIP, such as a list of ZIP codes for which ZIPFIP
will produce further information (please see Appendix D for
further discussion of input files).
. Market area files: ZIPFIP is both a creator and user of market
area files. Market area files contain a list of zones, where a
zone can be a ZIP code, a FIPS code, or a location by latitude
and longitude.
. Observation files: Observation files are files that contain "raw"
data, and are typically not produced by ZIPFIP. ZIPFIP often
requests observation files, which it then processes; for
example, ZIPFIP can aggregate entries in an observation file into
larger zones.
. Output files: ZIPFIP often creates output files, such as market
area files. If one does not otherwise specify a path and file
name, the output file will be written to the \OUTPUT
subdirectory (typically, \ZIPFIP\OUTPUT).
. Comment lines: Comment lines and error messages can be
written to the output file, to a "log" file, or to both files. When
written to the output file, comments/error messages are always
preceded and trailed by "comment" characters (the @ character is
used by default).
. Header: When an output file is created, the user may write a
multi-line descriptive "header" to it.
. Power-user tips: ZIPFIP has a number of "power-user" features that facilitate the speedy and efficient use of ZIPFIP. Please see Appendix G, "Power Users Tips," for a description of these features.
Installation of the ZIPFIP package is done automatically by the
INSTALL batch file, which is located on the INSTALLATION
diskette. The READ.ME file on that diskette contains detailed
instructions. It takes about 5 minutes to complete the installation.
To run the ZIPFIP programs, you need at least an IBM-Compatible
computer with an 80286 or better CPU, running DOS version 3.3
(or above). You will also need at least 475K of available memory
(memory available for programs after DOS, drivers, and TSR
programs have been accounted for). Ideally, 530K of available
memory should be free, since this permits one to temporarily exit
to DOS. The package takes up approximately 6 Megabytes on a
hard disk. If this is excessive, a number of database files and less
important programs need not be retained.
To run ZIPFIP (after installation), you should set your default
directory to be \ZIPFIP (or whatever directory you selected), and
then enter ZIPFIP from the DOS prompt. A menu will then appear,
from which you select the desired ZIPFIP command (or invoke help
using the F1 function key). For example, if you have installed
ZIPFIP in the \ZIPFIP directory on the D: drive, after booting up ...
C:\>D:
D:\>CD ZIPFIP
D:\ZIPFIP>ZIPFIP
will get you going!
Note: Unless you specify otherwise, all output files are written to
the \OUTPUT subdirectory (for example, \ZIPFIP\OUTPUT if
you installed to the \ZIPFIP directory).
Notes on CONFIG.SYS
For ZIPFIP to operate correctly, your CONFIG.SYS file should
contain the following lines:
DEVICE = ANSI.SYS (or DEVICE = C:\DOS\ANSI.SYS)
FILES = 25
BREAK = ON
If CONFIG.SYS does not contain the DEVICE = ANSI.SYS line,
... a number of extraneous (and very distracting)
characters will appear on your screen,
... boldface, color text, and other highlights will
probably not work.
Alternative: If you cannot include the DEVICE=ANSI.SYS
line in your CONFIG.SYS file, you can use
the ANSI program (ANSI.COM, obtained
from COMPUSERVE) included on the
INSTALLATION diskette before running
ZIPFIP (it will perform the same function as
including ANSI.SYS in your CONFIG.SYS
file).
Note: ANSI.SYS is a file that is supplied with DOS, and is
often located in the \DOS directory.
If CONFIG.SYS does not contain the FILES=25 line,
... DOS limits the number of open files to three, not
nearly enough.
If CONFIG.SYS does not contain the BREAK = ON line,
... the control-C interrupt may not work correctly.
What to do if 6M is not Available:
As currently structured, you will need over 6M to install the full
ZIPFIP package. Once installed, nevertheless, several "less
important" files can be deleted without substantially affecting the
capabilities of ZIPFIP. More or less in order of increasing
importance, the following files can be deleted (depending on what
database you installed, some of these files may not exist):
1) Sample output files, in the \OUTPUT directory, can be deleted.
2) ADDSCALE.EXE and ZFCREATE.EXE (about 600K) -- used for
modifying ZIPFIP databases. Located in the \DATA directory.
3) ZIP5NAME.xxx (about 1M). A list of ZIP codes and the name of
the post office. Without it, FINDNAME will not work with ZIP
codes. Note: The .xxx refers to a database-specific extension,
such as ".90."
4) FIPSNAME.xxx (about 120K). A list of FIPS codes and
associated names. Without it, FINDZFIP and FINDNAME will not
work properly with FIPS codes.
5) ZIP5INDX.xxx (about 200K). Deleting it will slow down
execution, but should have no other effects.
6) ZIP5STAT.xxx (about 500K). Without it, PRINTSTATS and
EDITSTAT will not be able to extract ZIP census data, but FIPS
census data (and ZIP geographic information) will still be
available.
7) USFIPS.PLG (about 600K). Without it, AGGREGATE will not be
able to assign observations to FIPS codes based on county
boundaries (the location to polygon option).
8) VUPOLYS.EXE (about 400K). VUPOLYS displays/creates ZIPFIP
polygon (.PLG) files.
Running ZIPFIP under OS/2 2.0 and WINDOWS 3.1
ZIPFIP will run under OS/2 2.0 with no modifications, although you
may find that the mouse does not work quite right; in which case
we can only suggest using cursor keys instead of the mouse. An
OS/2 icon for ZIPFIP (ZIPFIP.ICO) is included on the installation
disk, should you desire to install ZIPFIP directly onto your desktop.
Although not formally supported, ZIPFIP can be run as a DOS
application under WINDOWS 3.1 (you can use the FILE-NEW
option to set up a generic icon on the WINDOWS desktop).
However, you may have problems with file access, especially
when using ALT-F to view directories.
If major problems arise, contact Daniel Hellerstein or Daniel
McCollum at the addresses listed above.
At the beginning of ZIPFIP the user will be presented with a Main
Menu of commands. The menu will look like:
Using 1990 FIPS and ZIP codes and
Base Year = 1990,
Select a ZIPFIP Option (F1 for help):
EXIT Return to DOS
AGGREGATE* Aggregate observations by FIPS or ZIP
MARKET* Create market area of FIPS or ZIP codes
COMPDIST* Compute distances between points
EDITSTAT Edit ZIP & FIPS census data
EDTZFIP Edit ZIP & FIPS location data
FINDNAME Search ZIP & FIPS name-databases for a match
FINDZFIP Display ZIP & FIPS location & name information
PRINTSTATS* Display ZIP & FIPS census information
TRIPDIST Compute distances for trip-itineraries
ZIP2FIP Find ZIPs inside of a FIPS, FIPS inside of State
INITIALIZE Change default database(s), display options, misc.
LIST Display (output) file on screen
DOS Temporary exit to DOS
Select option:?
_____________________________________________________________________
*These are the most important commands for the casual user.
After the main menu appears, the user may select a command by either typing in the command name, or using the cursor keys (or mouse) to highlight a command.
AGGREGATE aggregates all "observations" into "zones." Several
types of "observations" and "zones" are recognized, including ZIP
codes, FIPS codes, and location (a latitude and longitude pair).
AGGREGATE supports 10 different types of aggregation:
      Type of Observations     Type of Zones
  1)  FIPS codes               FIPS codes
  2)  FIPS codes               Locations
  3)  ZIP codes                FIPS codes
  4)  5-digit ZIP codes        3-digit ZIP codes
  5)  5-digit ZIP codes        5-digit ZIP codes
  6)  5-digit ZIP codes        Locations
  7)  Locations                FIPS codes
  8)  Locations                5-digit ZIP codes
  9)  Locations                Locations
 10)  Locations                Polygons
The "type of observation" refers to the manner in which an
observation's location is identified. For example, each observation
in the "observation file" may include a ZIP-code variable. The
"type of zone" refers to the level of aggregation (or assignation)
desired.
Aggregation Types 7 and 8 are approximate aggregations based on
proximity. Similarly, Types 2 and 6 are also approximate, being
based on distance from the center of the FIPS (or ZIP) to the
latitude/longitude. For exact assignments of "locations", you
should use type 10 (location to polygon), provided you have the
appropriate polygon file (see notes below).
AGGREGATE provides three methods of aggregation:
1) ASSIGN Assign each observation to a zone.
2) COUNT Tally observations by zone assignation (the number
of observations occurring in each zone is counted
up).
3) SUM Sum the values of a variable (extracted from each
observation) by zone assignation.
AGGREGATE requires two input files:
(a) An observations file, in which each entry is an
observation containing an identifier. The identifier
may be a ZIP code, a FIPS code, or a
latitude/longitude pair.
(b) A market area file, which is a list of "zones," where each
zone is either a ZIP code, a three-digit ZIP code (the
zone consisting of all ZIP codes sharing the first
three digits, such as ZIP codes 20900 to 20999), a
FIPS code, or a latitude/longitude pair.
Alternatively, the market area file can consist of a
special "polygon file" (for example, a file of state
boundaries).
Note: For ASSIGN, a market area file is not required.
For COUNT and SUM, observations that do not fall
within the market area are discarded.
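As an illustration of COUNT with three-digit ZIP zones, the following Python sketch tallies observations by the first three digits of their ZIP code and discards any observation outside the market area. It is a sketch under assumptions, not ZIPFIP's implementation: the function name count_by_zip3 and the sample ZIP codes are hypothetical, and ZIP codes are held as text so that leading zeros survive.

```python
# Illustrative sketch only; function name and sample data are hypothetical.

def count_by_zip3(observation_zips, market_area):
    """COUNT observations per 3-digit ZIP zone listed in the market area."""
    # Every zone in the market area starts at zero, so it appears in the
    # result even when no observation matches it.
    counts = {zone: 0 for zone in market_area}
    for zip_code in observation_zips:
        zone = zip_code[:3]          # zone = first three digits of the ZIP
        if zone in counts:           # observations outside the area are dropped
            counts[zone] += 1
    return counts


observations = ["20901", "20910", "20999", "21044", "52544"]
market_area = ["209", "210", "300"]
result = count_by_zip3(observations, market_area)
print(result)  # {'209': 3, '210': 1, '300': 0}
```

The observation with ZIP 52544 is discarded because zone 525 is not in the market area.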
An output file, containing the observation-to-zone assignations,
and the requested counts and sums for each zone in the market
area file, will be produced. For example, a data file of a sample
taken from randomly drawn households contains observations of
ZIP codes and purchased quantities of a selected consumer
product:
(1) If you select SUM, ZIPFIP can tell you how much of the
selected product was purchased in each ZIP code area.
(2) If you select COUNT, ZIPFIP can tell you how many
households in each ZIP code area purchased the selected
product.
(3) Finally, if you select ASSIGN, ZIPFIP can assign each
observation to a FIPS.
Example: input file (FIPS is included for reference):
   ZIP      Groups   Days   Visitors   (FIPS)
   52544      1       15       10       19007
   52549      1        3        2       19007
   52556      2       14        5       19101
   52558      1        4        6       19101
   52591      1        6        6       19107
Assuming that the user chooses to aggregate from ZIP code to
FIPS code (Type 3), selecting the fourth variable to SUM will yield
the sum of number of visitors, for each FIPS (in the market area).
Selecting COUNT, and SUM with variable 2, will yield the total
number of observations, and the sum of number of groups, for
each FIPS (in the market area), respectively. Note that within a
FIPS, the summation occurs across all observations that fall inside
of that FIPS. Thus, if variable 4 (Visitors) is chosen, then FIPS
19007 will have a value of 12, FIPS 19101 will have a value of
11, and FIPS 19107 will have a value of 6.
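The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the idea: the aggregate function is hypothetical, and the ZIP-to-FIPS lookup is built from the reference column of the example rather than from ZIPFIP's location database.

```python
# Illustrative sketch only; reproduces the worked example, not ZIPFIP's code.
from collections import defaultdict

# Rows from the example: (ZIP, Groups, Days, Visitors, FIPS-for-reference)
rows = [
    ("52544", 1, 15, 10, "19007"),
    ("52549", 1, 3, 2, "19007"),
    ("52556", 2, 14, 5, "19101"),
    ("52558", 1, 4, 6, "19101"),
    ("52591", 1, 6, 6, "19107"),
]

# Stand-in for the ZIP location database (an assumption for this sketch).
zip_to_fips = {row[0]: row[4] for row in rows}


def aggregate(rows, variable):
    """SUM the chosen variable (1-based, ZIP is variable 1) and COUNT per FIPS."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        fips = zip_to_fips[row[0]]
        sums[fips] += row[variable - 1]
        counts[fips] += 1
    return dict(sums), dict(counts)


visitor_sums, counts = aggregate(rows, 4)   # variable 4 = Visitors
group_sums, _ = aggregate(rows, 2)          # variable 2 = Groups
print(visitor_sums)  # {'19007': 12, '19101': 11, '19107': 6}
print(counts)        # {'19007': 2, '19101': 2, '19107': 1}
print(group_sums)    # {'19007': 2, '19101': 3, '19107': 1}
```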
NOTES:
(1) "Polygons" market areas consist of a set of non-overlapping
polygons, such as State and county boundaries. Polygon
market area files must be specially created (you cannot use
text files). ZIPFIP comes with two polygon files: one for the
State boundaries of the lower 48 States (plus the District of
Columbia), and one for all U.S. counties. These two polygon
files are located in the \DATA subdirectory of ZIPFIP, and are
named US48STAT.PLG and USFIPS.PLG respectively.
(2) Users interested in creating their own "polygon" files should
see the VUPOLYS.EXE program (in your \ZIPFIP\DATA
directory).
(3) Use the closest ZIP option to match observations whose ZIP
code has no exact match in the ZIP location database. Use
the maximum distance option to limit the range within which
latitude/longitude matches occur (matches farther than this
maximum are considered to be out of the market area).
(4) Comments about errors and other difficulties encountered
are written to the output file, since it is expected that the
user will edit the output file. Note that, in most cases, the
output file may be readily matched to the market area file,
since all zones read from the market area file will have a line
in the output file, even if there were no matches to it (where
the line will contain zeros).
(5) In each line of the output file, the zone identifier (for
example, a FIPS code) is always written first.
(6) The output from MARKET and ZIP2FIP can be used as a
market area file.
(7) AGGREGATE may read up to 40 variables, per observation,
from the observations and market area files.
(8) Each observation (identifier, plus latitude and longitude) in
the market area file must be on a single line. When
latitude/longitude locations are read from market area files,
you can provide the "variable number" containing a numeric
identifier of this location. For a discussion of input files, see
the appendix, "Input and Output Files."
(9) Each observation (identifier plus other variables) in the
observations file must be on a single line. When
latitude/longitude locations are read from an observation
file, you can provide the "variable number" containing a
numeric identifier of this location.
(10) When selecting a market area file, ALT-F will display the directory of the current "input file directory".
MARKET will create a market area data file. The idea is to select
all zones within a distance (road or great circle) of some user-supplied center -- that is, all zones that pass a user-specified
"proximity test." These zones may be either ZIP codes or FIPS
codes. Hence, the output of MARKET is a file containing the ZIP
or FIPS codes that pass the test of proximity to the center.
Optionally, one may direct MARKET to also produce, for each zone
selected, a list of distances to a set of user-selected sites.
The user may enter up to 10 sites, for which distances (either road
or great circle) to each zone code that passes the proximity test
will be computed. Of course, the user-supplied center may be one
of these sites: easily specified by selecting the default (hit the
ENTER key) when sites are asked for.
For a zone to be selected, two tests must be passed:
1) The center of the zone (of the ZIP code or the FIPS code)
must lie within a band, where the band is specified using a
minimum and a maximum distance (setting a minimum
distance of zero converts the band into a circle).
2) The center of the zone must lie within a "quadrant" (an arc)
of the band, where the quadrant is specified using two
angles; only zones within the arc bounded by these two
angles are accepted (note that the arc can be larger than
180 degrees). Selecting angles of 0 and 360 degrees
converts the quadrant into the entire band.
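The two tests can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ZIPFIP's algorithm: it uses great circle distance from the haversine formula with an assumed Earth radius of 3958.8 miles, measures angles as compass bearings clockwise from north, takes the arc to run clockwise from the first angle to the second, and uses the common convention of negative longitudes for the western hemisphere. The function names and coordinates are hypothetical.

```python
# Illustrative sketch only; conventions and names are assumptions (see text).
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine great circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def bearing_degrees(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, in [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


def in_market(center, zone, min_dist, max_dist, start_angle, end_angle):
    """Apply test 1 (distance band) and test 2 (arc of the band)."""
    d = great_circle_miles(*center, *zone)
    if not (min_dist <= d <= max_dist):
        return False                       # fails the band test
    span = (end_angle - start_angle) % 360.0
    if span == 0.0 and end_angle != start_angle:
        return True                        # e.g. 0 and 360: the entire band
    b = bearing_degrees(*center, *zone)
    return (b - start_angle) % 360.0 <= span


center = (40.0, -100.0)   # hypothetical center
north = (41.0, -100.0)    # about 69 miles due north of the center
east = (40.0, -99.0)      # about 53 miles east of the center

print(in_market(center, north, 0, 100, 0, 360))   # whole circle of radius 100
print(in_market(center, east, 0, 100, 45, 135))   # eastern arc only
print(in_market(center, north, 0, 100, 45, 135))  # north is outside that arc
```

Setting the minimum distance to zero gives a circle, and an arc such as 315 to 45 degrees wraps through north.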
The user may ask for road or great circle distance, either in
computation of distance to alternate sites or in the proximity
measure. Great circle distance is specified by entering only the
latitude and longitude when a site is requested -- in other words,
do not provide a State identifier.
ZIP2FIP will find (and write to an output file) either a list of all FIPS
inside of a State (or a list of States), or a list of all ZIP codes inside
of a FIPS (or a list of FIPS codes). An input file containing a list of
FIPS codes, or two-letter State abbreviations, is expected. Output
consists of either a list of FIPS codes, or a list of ZIP codes (and
FIPS codes). Note that input MUST be from a file.
NOTES:
(1) For instructions on entering sites, see Appendix A, "Entering
Sites." For a discussion of distance computation, see
Appendix B, "Computing Distance."
(2) The user may have header lines included at the top of the
output file. Since the user is expected to edit the output file,
comments about any errors encountered are written to the
output file.
COMPDIST computes distances through two basic functions:
computing a multi-stop distance given a list of locations entered by
the user; and producing an N x K set of distances, given N starting
locations and K end locations.
For the first function, the user enters a list of locations from the
keyboard. COMPDIST merely computes distances (road or great
circle) between consecutive locations, and adds them up.
For the second function, the user specifies two lists of locations.
The first list contains a list of start-locations, the second a list of
end-locations. Distances are computed for each start-location/
end-location pair. If there are fewer than 10 end-locations, then the
distances for each "start-location" will be written on one line in
order of entry to the end-location file. Otherwise, each line will
contain one distance, with both start and end-location identified.
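The two functions can be illustrated with a short Python sketch. It covers great circle distance only (road distance is not modeled), uses the haversine formula with an assumed Earth radius of 3958.8 miles, and all function names and coordinates are hypothetical rather than taken from ZIPFIP.

```python
# Illustrative sketch only; great circle distance, hypothetical names and points.
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def great_circle_miles(a, b):
    """Haversine distance between two (latitude, longitude) pairs in degrees."""
    p1, p2 = math.radians(a[0]), math.radians(b[0])
    dphi = p2 - p1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def multi_stop_distance(locations):
    """First function: add up the distances between consecutive locations."""
    return sum(great_circle_miles(p, q) for p, q in zip(locations, locations[1:]))


def distance_table(starts, ends):
    """Second function: N x K distances, one row per start-location."""
    return [[great_circle_miles(s, e) for e in ends] for s in starts]


stops = [(40.0, -100.0), (41.0, -100.0), (42.0, -100.0)]
total = multi_stop_distance(stops)
table = distance_table(stops[:2], stops)   # 2 start-locations, 3 end-locations

print(round(total, 1))
for row in table:
    print(" ".join(f"{d:8.1f}" for d in row))
```

Because great circle distance depends only on differences in longitude, the result is the same whether western longitudes are entered as negative or as positive numbers, provided one convention is used throughout.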
Locations may take one of three forms:
(i) A FIPS code.
(ii) A ZIP code.
(iii) A latitude, longitude, and State.
For (i) and (ii), the closest match may be requested.
Output is written to the monitor, or to a user-specified file. Each
line of the output file will contain an identifier, such as the start-location FIPS or ZIP. For (iii), the "line number" in the file is used as
an identifier. Alternatively, the user may instruct COMPDIST to
use an identifier pulled from the input file.
The input files should have the following format:
(i) and (ii) FIPS/ZIP optional_id:
95616 DAVIS
(iii) Latitude Longitude State optional_id:
38.5 121.7 CA DAVIS
The optional_id may be up to 10 characters long. In the above examples, the optional_id is DAVIS. Note that the State name (CA in the above example) typically follows the longitude. However,
the user can specify the "variable number" of the State name.
Either Road Distance or Great Circle Distance may be requested. However, if the State name is not available the road distance will not be computed (instead, the Great Circle Distance will be computed). See Appendix A, "Entering Sites," for further discussion on entering geographical locations. See Appendix B, "Computing Distances," for a discussion of the methodology used to compute great circle and road distance.
EDTZFIP is used to correct errors, or to update information, in
either the ZIP FIPS location database, or in the ZIP town name
database. See EDITSTAT for editing census databases.
The ambitious user may find ample opportunities to use this
command. For example, in the ZIPFIP-1980 database, the ZIP
location raw data have a limitation in that all ZIP codes associated
with a central post office are given the same town name and the
same location (same latitude and longitude). Thus ZIP code
02164, in Newton, MA, is given the name and location of
BOSTON, since 02164 is a substation of the central Boston, MA
post office.
EDTZFIP allows one to edit either the location or (for ZIP) the
name database. It expects a record number, not a ZIP (or FIPS)
code (FINDZFIP can be used to find these record numbers). The
user supplies this information either from the keyboard or from
an input file.
For further details on the use of EDTZFIP, including instructions on the use of input files, see the on-line documentation.
EDITSTAT is used to display and change fields in the three census
databases:
(a) The ZIP code census database.
(b) The FIPS code census database.
(c) The FIPS code scale (timeseries) database.
(To change ZIP code or FIPS code latitude or longitude, or ZIP
town name, see EDTZFIP.)
The user first enters a ZIP or FIPS code (or a record number, as in
FINDZFIP). EDITSTAT then displays a table of all the variables
selected and their values, from which the user selects a variable to
change. This selection is easy: just enter any unique portion of
the variable name, or use the cursor keys (see Appendix C,
"Variable Names").
Example: If the user has selected FIPS 25017, s/he may change
%POVERTY by entering POV at the "variable to
change" question. Similarly, s/he may enter
UNEMPLOY to select the % UNEMPLOY variable.
NOTES:
(1) All changes are permanent. Therefore, striking Ctrl-C will not
bring back inadvertently changed values.
(2) When missing values are encountered, an M will be written.
Overflow is treated as a missing value.
(3) For most percent variables, only values between 0 and 100%
may be specified. The actual bounds for each variable
depend on how the database was created.
Use FINDZFIP to search the ZIP (or FIPS) location and name
databases, and report basic information:
(a) For FIPS, report the longitude and latitude of the requested
FIPS, or the name of the county.
(b) For ZIP, report the longitude and latitude, FIPS, or town name
of a requested ZIP.
With either choice, the record number of the ZIP (or FIPS) is also
displayed.
After selecting the database, one then provides a ZIP (or FIPS) for
which to search. If found, the information in the database is
returned. If there is no such ZIP (or FIPS), then FINDZFIP will
search for a ZIP (or FIPS) with a reasonably close number, and
return the information associated with this "nearby" ZIP (or FIPS).
If necessary, one may select an absolute record number (in the
selected database) to display. Do this by entering the negative of
the record number the user wants.
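The fallback to a "nearby" code can be pictured with the Python sketch below. The manual does not define "reasonably close," so this sketch assumes it means the numerically nearest code in a sorted, non-empty list; the function name find_code is hypothetical. Codes are held as integers here for simplicity, which drops leading zeros (02164 becomes 2164).

```python
# Illustrative sketch only; "nearest by number" is an assumption, not ZIPFIP's rule.
import bisect


def find_code(sorted_codes, wanted):
    """Return (code, exact): the wanted code if present, else the nearest one."""
    i = bisect.bisect_left(sorted_codes, wanted)
    if i < len(sorted_codes) and sorted_codes[i] == wanted:
        return wanted, True
    # Only the neighbors on either side of the insertion point can be nearest.
    candidates = sorted_codes[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda code: abs(code - wanted))
    return nearest, False


codes = [52544, 52549, 52556, 52558, 52591]
print(find_code(codes, 52556))  # exact match
print(find_code(codes, 52550))  # no such ZIP: nearest code is returned
```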
FINDNAME will search either the ZIP or FIPS "name" databases.
The ZIP name database is used when looking for a particular town,
while the FIPS name database is used when looking for a particular
county.
After selecting ZIP or FIPS, the user provides a name. FINDNAME
will then search the appropriate database for all ZIP (or FIPS)
codes that match this name. This search may be over the entire
United States, or may be limited to search a single State. For ZIP
codes, the name, ZIP code, FIPS code, and State of every match
are displayed. For FIPS codes, the name, FIPS code and State are
displayed.
There are two display options:
(a) If any matches are found, the number of matches and the
location of the first match are displayed; or
(b) All ZIP codes (or FIPS codes) that match the name provided
by the user are displayed.
There are also two search options:
(a) Exact matches only; or
(b) Substring match. In this case, the requested name may match
any portion of the ZIP (or FIPS) name; it need not be
an exact match. For example, ORI would match PEORIA,
ORINVILLE, and so forth.
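The difference between the two search options can be shown in a few lines of Python. This is a sketch only: the function name find_names and the small name list are hypothetical, and the comparison ignores case because ZIPFIP is described as case-insensitive.

```python
# Illustrative sketch only; function name and name list are hypothetical.

def find_names(names, query, substring=False):
    """Return names equal to the query, or containing it if substring=True."""
    q = query.strip().upper()
    if substring:
        return [n for n in names if q in n.upper()]
    return [n for n in names if n.upper() == q]


names = ["PEORIA", "ORINVILLE", "YORK", "LANCASTER"]
print(find_names(names, "ori", substring=True))  # ['PEORIA', 'ORINVILLE']
print(find_names(names, "york"))                 # ['YORK']
print(find_names(names, "ORI"))                  # []
```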
Input may be from either the keyboard or an input file. If you use
an input file, each line should contain both the ZIP (or FIPS) name
and the State name, separated by a comma. For example:
FRONTIER ,NE
SHERIDAN ,NE
HALL ,NE
BEDFORD ,PA
YORK ,PA
LANCASTER ,PA
PRINTSTATS is probably the most useful command for the general user. It
will produce an output file containing census information (such as
per-capita income, average temperature, or bushels of corn
produced) for each FIPS (or ZIP) code in a user-supplied list of FIPS
(or ZIP) codes.
A variety of options are available in PRINTSTATS, including:
GO          Generate output (display data)
DATABASE    Examine ZIP or FIPS, or FIPS using ZIP
VARIABLES   Select variables to extract
YEAR        Select current year
SCALE       Select scale factors
MISS OBS    Select method of dealing with missing
            observations
MISS VALS   Use FIPS value if ZIP value is "missing"
DISTANCE    Compute and display a distance variable
OUTPUT      Select output file
INPUT       Select input file
HEADER      Add header to output file
EXIT        Exit PRINTSTATS
Each of these commands is explained in greater detail below.
DATABASE: To Select ZIP or FIPS Data
There are three options:
(a) Produce a list of census variables for selected FIPS codes.
(b) Produce a list of census variables for selected ZIP codes.
(c) Convert ZIP codes to FIPS variables. This option uses ZIP
codes as input; in other words, it displays data for the FIPS
code in which each ZIP code is located.
Whenever DATABASE is selected, any previously selected
variables or scales are dropped. The user must reenter the desired
variables and scales. The default DATABASE is FIPS codes.
VARIABLES: To Select Variables to Extract
The user may select any subset of the variables in the database of
interest. A menu-like mechanism is used to select variables, with
variables selected either by entering their names (or a unique
substring), or by using the cursor keys (or the mouse).
SCALE and YEAR: To access time-series information
The YEAR and SCALE options are used to access time-series
information. Furthermore, SCALE may be used to modify ZIP code
or FIPS data, either to account for missing ZIP codes or to
generate census values for years other than the base year (e.g.,
1990) for variables lacking non-base year information.
SCALE
The user may selectively modify variables using
"scales". Two kinds of scales exist:
(i) USER -- County invariant, with values entered at run
time. As a convenience, the Consumer Price Index
(CPI) is hardwired into ZIPFIP (with 1980 as a base
year).
(ii) COUNTY SPECIFIC -- These scales have a separate
value for each county, with values stored in the
"SCALE" database. Each of these scales consists of
two identifiers: a NAME and a YEAR. The NAME
identifies the target variable, and the YEAR is the
year to be reported. For example, the 82, PERCAP scale is
designed to adjust the PERCAP variable, causing 1982
values to be reported. Alternatively, the 82, PERCAP
scale can be used as an approximate measure of related
variables (say, HOUSEHOLD INCOME) for which explicit
time-series data is not available.
Each variable may have up to four scales applied to it,
consisting of any mix of USER or COUNTY SPECIFIC
scales; where the COUNTY SPECIFIC variables are
selected both by year and by variable.
YEAR
The user may instruct PRINTSTATS to display values, if
possible, from a user-selected "current year". For
example, if the user wants her/his variables to reflect
1982 information (using whatever 1982 information
may be available), s/he may select 82 as the current
year. Most variables have limited time-series information
available, while a few (such as population, per capita
income) have many years of time-series information.
Implementation note: the YEAR option is actually an
automated subset of the SCALE option.
Example 1: (Using the ZIPFIP80 database). To generate
1984 ZIP code populations that are corrected for
missing ZIP codes, select 1984 as the "current
YEAR", and assign the scale 80,POPFIX to the
ZIP code population variable.
Example 2: To generate a value for 1984 per capita income
expressed in 1980 dollars, select 1984 as the
current year, and apply to the per capita income
variable the USER scale whose value is the 1984
CPI deflator (such as the CPI84 "hard-wired"
USER scale).
Notes:
(1) SCALE and YEAR must be selected after variables have been
chosen (the current YEAR, and all scales, are cancelled when
variables are selected).
(2) When YEAR is selected, all previously selected scales (using
SCALE) are cancelled.
(3) As with variable names, the user may use substrings when
selecting a scale name.
(4) When scaling ZIP code data, or when YEAR is selected with
ZIP-code data, the scale values used are drawn from the FIPS
that contains the ZIP. In addition, when using YEAR with ZIP
code data, the reported values will be based on FIPS level
information.
Example:
Suppose scale 83,POPULATION for FIPS 25017 is 1.05, and
ZIP 02165 is part of 25017; and that the POPULATION of ZIP
code 02165 is desired. If 83,POPULATION is a requested
scale (or if 1983 is the requested "current year" ) then the
scale used for 02165 will be 1.05.
(5) For a further description of how scales are used, see the
appendix on scaling.
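Note (4) can be illustrated with a minimal Python sketch. The two lookup tables, the function name, and the base population of 10,000 are hypothetical stand-ins and do not describe ZIPFIP's actual files; only the 1.05 scale and the pairing of ZIP 02165 with FIPS 25017 come from the example above.

```python
# Hypothetical lookup tables (assumptions for illustration only).
ZIP_TO_FIPS = {"02165": "25017"}
COUNTY_SCALES = {("83", "POPULATION", "25017"): 1.05}

def scaled_zip_value(zip_code, raw_value, year, name):
    """Scale a ZIP-level value with the scale of the FIPS that contains the ZIP."""
    fips = ZIP_TO_FIPS[zip_code]
    scale = COUNTY_SCALES[(year, name, fips)]
    return raw_value * scale

# 10000 is a made-up base-year population for ZIP 02165.
result = scaled_zip_value("02165", 10000, "83", "POPULATION")
print(round(result))
```

The point of the sketch is that the scale is looked up by county, never by ZIP code, so every ZIP code in a county receives the same scale.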
MISS OBS: Options for Missing Observations
ZIP codes are not permanent entities. They are created, removed,
and changed frequently (note that FIPS codes are rarely changed).
Given limited data resources, this may complicate matters. The
MISS OBS option allows the user to choose between several
techniques for dealing with the fact that a desired ZIP code may
not exist, or may be missing either "location" or "census"
information. Four options are provided:
X Finds exact match. The ZIP must have entries in both the
location and census database. Otherwise, skip this ZIP (or
FIPS), and write a "No Match" line to the output file.
D If there is no exact match, looks for a ZIP (or FIPS) code
with a reasonably close numeric value (that has entries in
both census and location databases).
S Applies to ZIP data only. Separate matches of census and
location databases are performed.
(i) A first ZIP code is found in the location database that
exactly matches or is numerically close to the desired
ZIP code.
(ii) A second ZIP code is found in the census database
that exactly matches or is numerically close to the
desired ZIP code. Note that the resulting two ZIP
codes need not be the same.
B Applies to ZIP data only. Same as S, but the zip code
found in step 2 MUST have a location. In other words, the
location and census zip codes can still be different, but for
the ZIP code found in step 2, there must be location data
available.
For example, in the ZIPFIP-1980 database, there are about 1,000
ZIP codes that have census data, but no location data. These can
be accessed ONLY when S is selected.
Notes:
(1) Option D was designed to be used with the "ZIP code
population correction" scale (the scale POPFIX ).
(2) The default is X.
(3) For a complete discussion of the problem of missing ZIP
codes, see the third chapter of this documentation, "Limits of
ZIP codes as Units of Observation."
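The difference between options X and D might be sketched as follows in Python. This is an illustration only: the function name, the `tolerance` cutoff, and the sample ZIP codes are assumptions, since this manual says only that option D looks for a "reasonably close numeric value."

```python
def find_match(wanted, available, mode="X", tolerance=10):
    """Return a matching code, or None (the "No Match" case).

    mode "X": exact match only.
    mode "D": otherwise fall back to the numerically closest code.
    `tolerance` is an assumed cutoff for "reasonably close".
    """
    if wanted in available:
        return wanted
    if mode != "D" or not available:
        return None
    # Sort first so that ties are always broken the same way.
    closest = min(sorted(available), key=lambda code: abs(int(code) - int(wanted)))
    if abs(int(closest) - int(wanted)) <= tolerance:
        return closest
    return None

# Hypothetical set of ZIP codes with both census and location entries.
codes_with_data = {"48103", "48105", "48109"}
exact = find_match("48106", codes_with_data, mode="X")
near = find_match("48106", codes_with_data, mode="D")
```

Under these assumptions `exact` is `None`, while `near` is the neighboring code 48105. Codes are kept as strings so that leading zeros in ZIP codes are not lost.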
MISS VAL: Options for Missing Values
For a variety of reasons, such as confidentiality, the value of some
variables will be missing. The MISS VAL option offers a partial
solution to this problem when ZIP data is desired. (Note: ZIP data
is much more likely to contain missing values than FIPS data).
Since ZIP codes are disaggregated FIPS codes, it is often logical to
use an appropriate value from the FIPS when a ZIP value is
missing. For a large class of variables (for example, percents and
per capita averages), this substitution of FIPS values for missing
ZIP values is defensible as a second best solution. This, however,
is not always the case; one would not want to use raw counts
(such as population).
PRINTSTATS supports this feature through the use of variable
names. Specifically, if a ZIP code has a missing value, and one
has requested that a FIPS value be used in its place, then the
variable from the FIPS database with EXACTLY the same name
will be used (obviously, from the FIPS that the ZIP is inside of). If
a ZIP variable does not have an exactly similar counterpart in the
FIPS database, then this missing value replacement will not be
available. For a further discussion of variable names, see
Appendix C, "Variable Names."
The default is NO missing value replacement. When a value is
missing, PRINTSTATS will display a period or an asterisk
("." or "*"). See the third chapter of this documentation: "The
Limitations of ZIP Code Data as Units of Observation," for further
discussion, with special attention to the problem of "missing
observations."
INPUT
The list of ZIP (or FIPS) codes to be processed may come
either from the keyboard or from an input file (the default input is
the keyboard). For example, you can use the output of MARKET,
or ZIP2FIP, as input files. For a discussion of the use of input files,
see Appendix D, "Input and Output Files."
In addition to containing a list of observations (for example, a
list of FIPS codes that comprise a market area), the input file can
also contain special "statistical" commands. These commands tell
ZIPFIP to generate and output some simple statistics on the
variables you selected. The commands can be placed anywhere in
your input file; the statistics will be computed for all observations
read up to the location of the command.
PRINTSTATS statistical commands are:
$SUM Compute sum of each variable.
$VAR Compute variance of each variable.
$MEAN Compute mean of each variable.
$MAX Compute maximum of each variable.
$MIN Compute minimum of each variable.
$RESET Reset all statistics. This is useful if you want to
generate separate statistics for several subsets of
observations (say, for each of several market areas
listed in one input file).
Note that these commands should appear on separate lines.
Two other options are relevant for PRINTSTATS input files.
i) Output comments. If you desire, all comment lines
encountered in the input file can be written to the output
file.
ii) Suppress output of individual observations. If you desire,
data on individual observations can be suppressed. This is
useful if you only want to see the statistics, and do not
want to clutter up your output file with extra data.
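The way statistical commands act on "all observations read up to the location of the command" can be sketched in Python. This is a simplified illustration and not ZIPFIP's implementation: each observation line here carries a single numeric value directly (ZIPFIP instead looks up the selected variables for each ZIP or FIPS code), the function name is hypothetical, and population variance is assumed for $VAR because this manual does not say which form is used.

```python
import statistics

# Command name -> function applied to the values read so far.
COMMANDS = {
    "$SUM": sum,
    "$MEAN": statistics.mean,
    "$VAR": statistics.pvariance,  # assumption: population variance
    "$MAX": max,
    "$MIN": min,
}

def run_input(lines):
    """Return a list of (command, statistic) pairs, in input order."""
    values, results = [], []
    for line in lines:
        text = line.strip()
        if not text or text[0] in "@*":   # blank or comment line
            continue
        if text == "$RESET":
            values = []                   # start a new subset
        elif text in COMMANDS:
            results.append((text, COMMANDS[text](values)))
        else:
            values.append(float(text))    # one numeric value per observation
    return results

sample = ["10", "20", "$SUM", "$RESET", "5", "7", "$MEAN", "$MAX"]
stats = run_input(sample)
```

In this sample, $SUM covers the first two observations only, and $MEAN and $MAX cover only the two observations that follow $RESET.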
OUTPUT
Output may be directed either to the user's display screen or to an
output file. Selecting O allows the user to name an output file to
which to write results. Depending on the value of option Z, the
output file will be structured as:
(a) FIPS FIP_STATUS Census_Variables.
(b) ZIP ZIP_STATUS Census_Variables.
(c) FIPS ZIP ZIP_STATUS Census_Variables.
Alternatively, one can write results in a binary format. This
"machine readable" output can then be used by other programs
(such as the GAUSS statistical package). If binary output is
selected, only matched ZIP (or FIPS) codes are written to the
output file. In contrast, for ASCII output (described above), when
no match is found an appropriate comment line is written to the
output file.
Notes:
(1) If DISTANCE is selected, its value will always directly follow
the _STATUS variable. Census variables are written in the
order they appear in the variable selection menu. For a
discussion of the _STATUS variable, see the third chapter of
this documentation.
(2) To aid in future recall, PRINTSTATS allows the user to add a
"header" to the top of the output file. This header may
contain a list of the currently selected variables.
(3) The default output is to the display screen (the user's
monitor).
Distance
PRINTSTATS may also be used to compute a distance
from a user-selected site to the center of each ZIP code (or FIPS)
in the input list. For ZIP codes, two options are available:
(a) Use the center of the ZIP; or
(b) Use the center of the FIPS (that contains the ZIP).
See Appendix A, "Entering Sites," for instructions on entering sites. See Appendix B, "Computing Distances," for a description of how distances are computed.
TRIPDIST computes the minimum mileage (measured in road or
great circle miles) needed to complete a multiple site trip. The
user supplies a file containing a list of "origins," where an origin
can be a ZIP code or a FIPS code. In addition, the user supplies a
list of itineraries (an itinerary is simply a list of sites). TRIPDIST
then computes the minimum mileage needed to complete each
itinerary, from each origin. The order in which one enters the
stops for a given itinerary is unimportant, since TRIPDIST
computes the minimum distance required to visit each site and
(optionally) to return home (to the origin).
Each line in the output file consists of the origin identifier and the
minimum trip distance, for each itinerary. A maximum of five sites
may be entered for each itinerary. Up to 30 itineraries may be
entered. TRIPDIST finds the shortest route that connects each
site on the itinerary. In contrast, COMPDIST will compute the
length of a line that connects each site in the order they appear in
the itinerary. In other words, COMPDIST makes no attempt to
"optimize" the route, while TRIPDIST will search all possible routes
(that connect each site in the itinerary), and report the route with
the shortest mileage. Thus, the distance computed by TRIPDIST is
always less than or equal to the distance computed by
COMPDIST.
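The contrast between the two commands can be sketched in Python. The sketch is illustrative only: it uses straight-line distance on a flat plane in place of ZIPFIP's road or great circle miles, the function names merely echo the command names, and the sample coordinates are invented.

```python
from itertools import permutations
from math import dist

def path_length(points):
    """Total length of the line joining the points in the order given."""
    return sum(dist(a, b) for a, b in zip(points, points[1:]))

def compdist(origin, sites):
    """Visit the sites in the order entered; no optimization."""
    return path_length([origin, *sites])

def tripdist(origin, sites, return_home=False):
    """Try every ordering of the sites and keep the shortest route."""
    tail = [origin] if return_home else []
    return min(
        path_length([origin, *order, *tail]) for order in permutations(sites)
    )

origin = (0, 0)
sites = [(10, 0), (1, 0), (5, 0)]   # deliberately entered out of order
as_entered = compdist(origin, sites)
shortest = tripdist(origin, sites)
round_trip = tripdist(origin, sites, return_home=True)
```

Because the search covers every ordering, an itinerary of five sites requires 120 candidate routes, which is consistent with TRIPDIST slowing down as sites are added.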
Notes:
(1) The more sites on an itinerary, the slower TRIPDIST will run.
(2) A header may be written to the output file.
(3) Exact matches only: If a ZIP cannot be found, the user will
receive a response, BAD ZIP/FIPS.
(4) For a discussion of how to enter sites, see Appendix A,
"Entering Sites." For a discussion of how distances are
computed, see Appendix B, "Computing Distances."
INITIALIZE is used to initialize and review several ZIPFIP options.
The features that can be set by INITIALIZE include:
(a) INPUT and OUTPUT: Changes the default input and output
directories to the user's personal work area.
(b) DATABASE: Selects any of the currently installed databases.
ZIPFIP comes packaged with the ZIPFIP-1990 database;
several others are available (see ordering information at the
back of this document).
(c) REVIEW: Lists the files comprising the currently selected
database.
(d) COLORS: Alters the colors used in EDITSTAT, and for
displaying the results window.
(e) ADVANCED: Displays a list of shortcuts for the frequent user
(Appendix G, "Power-User Tips," contains further discussion).
(f) MERGE: Facilitates the merging of two output files. The
merge checks for same identifier (such as a ZIP code), and
will fail if a mismatch occurs. Thus, MERGE should only be
used with ZIPFIP output that derived from a single market
area file.
(g) LOG: Selects the location to which comments and error
messages are written. These descriptive messages can be
written either to your output file (in which case, they are
bracketed by the @ characters), to a "log" file (with a default
name of ZIPFIP.LOG), or to both.
(h) COMMENT: Select the "leading" and "trailing" comment
character(s). By default, these are set to the @ character.
Notes:
(1) INITIALIZE uses "initialization files" (such as ZIPFIP90.INI),
that contain information on the default input and output
directories. Careful users can modify these files directly.
(2) For information on creating "customized" databases, contact
the authors (addresses at the front of this documentation).
LIST is used to list output files on the screen. It is included as a
convenience, giving the ZIPFIP user an easy means of examining
output files.
Upon selecting a file to display, you can move up and down in this
file by using the PgUp, PgDn, Up arrow, Down arrow, Home (go to
start of file) and End (go to end of file) keys. You can also scroll
the screen sideways by using the left and right arrow keys (up to
250 characters will be read). If you enter a line number, the
block of lines containing the requested line will be displayed, with
the requested line in reverse video.
You can also search for a text string. Hit the F2 function key,
then enter the case-specific text string you want to search for. If
found, the line on which this string occurs will be displayed with
reverse video.
If the file name you request cannot be found in the current
directory, ZIPFIP will then search the default ZIPFIP output
directory. This is useful for viewing ZIPFIP output.
LIST can also be used to examine the current (temporary) contents
of both the "results buffer" and the "input history" buffer. You
can even move the current contents of these buffers to an output
file. Note that both these buffers are 20 lines long.
In several commands, the user enters one (or several) "sites,"
where a site is defined as a latitude, longitude, and a State name.
The user should enter three values (two numbers and one name)
on one line -- with each value separated by either commas or
spaces. For example:
Lat, Long, State: 39.2 81.6 WV
The State name (WV, in this example) may be either a two-letter
abbreviation or the complete name.
For most commands, a default location may be specified by
entering the appropriate two-letter sitename. The list of default
sites is contained in the file INISITES.DEF. If the user wishes to
customize the list of defaults, s/he may edit INISITES.DEF (in the
\DATA subdirectory) with her/his favorite text editor.
Notes:
(1) As a convenience, you can select the latitude, longitude and
State code of a particular ZIP code or FIPS code as your site.
To do this, simply enter:
ZIP=xxxxx or FIPS=xxxxx, where xxxxx is the ZIP or FIPS
code (the ZIP=xxxxx option is available only if the
selected database contains ZIP code information).
Examples:
ZIP=48105 This code (for Ann Arbor, MI) yields a latitude of
42.3, a longitude of 83.8 and State code of 26.
FIPS=23005 This code (for Cumberland, ME) yields a latitude
of 43.9, a longitude of 83.7, and State code of
23.
(2) If computation of road distance is desired, but you do not
provide a State name (for example Lat, Long, State: 39.2
81.6), a great circle distance will be computed.
(3) For a discussion of the methodology used to compute
distances, see Appendix B, "Computing Distances."
(4) Since ZIPFIP is designed for use in the United States, the absolute values of latitudes and longitudes are used.
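A site entry of the basic form described above might be parsed as in this minimal Python sketch. The function name is hypothetical, and the sketch handles only the "Lat, Long, State" form; it does not cover the ZIP=xxxxx and FIPS=xxxxx shortcuts or the two-letter default site names.

```python
def parse_site(text):
    """Split 'lat long state' (separated by commas or spaces) into three parts."""
    parts = text.replace(",", " ").split()
    # Absolute values are used, as described in note (4).
    lat, lon = abs(float(parts[0])), abs(float(parts[1]))
    # The State may be omitted, and a full name may contain spaces.
    state = " ".join(parts[2:]).upper() or None
    return lat, lon, state

with_state = parse_site("39.2 81.6 WV")
without_state = parse_site("39.2, -81.6")
```

A missing State is returned as `None`, which corresponds to the case in note (2) where only a great circle distance can be computed.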
Two kinds of distances are produced by the ZIPFIP package: the
Great Circle Distance and the Road Distance.
Great Circle Distance: The Great Circle Distance between a pair
of points is simply the direct distance between the two points,
measured over the surface of a curved earth. It is an "as the
crow flies" distance. All that is required is the longitude and
latitude of the two points.
Road Distance: The Road Distance developed in the ZIPFIP
package is technically a Great Circle Distance that has been
corrected for route circuity. The idea is that since distant points
are typically not connected by straight line roads, the traveler
must follow a more or less circuitous route to get from point A
to point B. The circuity factor is, therefore, a correction factor
that is applied to the Great Circle Distance in order to
approximate the road distance.
The circuity factor will differ for every conceivable pair of
locations. Lacking such complete information, ZIPFIP uses State-to-State averages. Thus, a single circuity factor is used for all
trips from any location in State A to any location in State B.
Finally, for each State there is an "intra-State" circuity factor.
Assuming that one has the longitude, latitude and State for a pair
of locations, the computation of road distance is a three-step
process:
(a) Compute the Great Circle Distance between the two
locations.
(b) Look up the State-to-State circuity factor for this pair.
(c) Multiply the Great Circle Distance by the circuity factor.
Although somewhat naive, the results of this procedure are
surprisingly accurate. Distances computed using ZIPFIP typically
differ from those published in road atlases by only a few
percentage points, especially for longer trips. Problems do
arise in shorter trips, especially in mountainous regions, where
specific journeys might necessitate a very indirect route.
Note: The average circuity factor is 1.15.
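The three-step procedure can be sketched in Python. The haversine formula and the Earth radius used here are assumptions chosen for illustration, since this manual does not state which great circle formula ZIPFIP uses. The sketch also applies the average circuity factor of 1.15 rather than a State-to-State value, so its road distance is only a rough approximation.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8   # assumed mean Earth radius
DEFAULT_CIRCUITY = 1.15       # the average factor quoted above

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine great circle distance between two points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

def road_miles(lat1, lon1, lat2, lon2, circuity=DEFAULT_CIRCUITY):
    """Step (a): great circle distance; steps (b) and (c): apply the factor."""
    return great_circle_miles(lat1, lon1, lat2, lon2) * circuity

# Ely, MN and Chicago, IL, using the coordinates reported in the quiz answers.
gc = great_circle_miles(47.903, 91.867, 41.850, 87.650)
road = road_miles(47.903, 91.867, 41.850, 87.650)
```

Both longitudes are entered as positive numbers, following the absolute value convention used for sites; the result is unaffected as long as both points use the same sign.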
Several commands (EDITSTAT and PRINTSTATS) ask the user to
enter a variable name. When a name is entered as text, ZIPFIP will
recognize any unique portion (substring) of a variable name. Thus,
a variable with the name "% WHITE" may be requested by "%
WHITE" (the exact name), "% W", and so forth. Note that
"%W" will not work since the space after the "%" has been
omitted.
ZIPFIP will first attempt to exactly match the string you entered to
one of the variable names. If no exact match is found, then a
"substring match" is attempted. In some cases, two variables will
have nearly the same name (for example, "PERCAP2" and
"PERCAPX"). The string, PERCAP, matches either name. In this
situation, the first PERCAP encountered (PERCAP2) will be the one
selected. To be sure of selecting PERCAPX, enter PERCAPX (the
exact name).
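The matching rule described above (exact match first, then the first substring match) can be sketched in Python. The function name is hypothetical, and the sketch assumes that matching is case-sensitive, which this manual does not state.

```python
def match_variable(entered, names):
    """Return the exact match if present, else the first name containing `entered`."""
    if entered in names:
        return entered
    for name in names:          # names are searched in menu order
        if entered in name:
            return name
    return None

names = ["% WHITE", "PERCAP2", "PERCAPX"]
first_hit = match_variable("PERCAP", names)
exact_hit = match_variable("PERCAPX", names)
no_hit = match_variable("%W", names)
```

Here "PERCAP" selects PERCAP2 because it is encountered first, "PERCAPX" selects itself through the exact match, and "%W" finds nothing because the space is missing.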
Another important use of variable names is to match ZIP and FIPS
variables for purposes of missing values replacement. The
inexperienced user need not worry about this, assuming that the
package has been correctly installed. The following discussion is
meant for the curious or adventurous.
The trick used for missing value replacement is to search the FIPS
database for a variable with the exact same name as the variable
afflicted with a missing value. If such a variable exists, then the
value of this variable is used instead of the missing value, such
value being drawn from the FIPS to which the desired ZIP belongs.
Example:  FIPS Name     ZIP Name     Result
          %MALE         %MALE        Match (FIPS value may be used)
          LAND AREA     SQ MI        No Match
          POPUL         POPUL        Inappropriate Match
The third example (POPUL) is important. Using the FIPS value of %MALE when the ZIP value is missing is probably justifiable. In contrast, substituting the FIPS POPULATION for a POPULATION missing from a ZIP is always incorrect. To avoid this, ZIPFIP will prompt you before substituting for POPULATION variables.
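The name-based replacement can be sketched in Python. The records, values, and function name below are invented for illustration. Where ZIPFIP prompts the user before substituting a population variable, this simplified sketch just declines to substitute.

```python
MISSING = None

def value_with_fallback(variable, zip_record, fips_record, count_variables=("POPUL",)):
    """Use the ZIP value; if missing, try the FIPS variable with EXACTLY the same name."""
    value = zip_record.get(variable, MISSING)
    if value is not MISSING:
        return value
    if variable in count_variables:
        # Raw counts must not be borrowed from the (larger) county.
        return MISSING
    return fips_record.get(variable, MISSING)

# Hypothetical records for one ZIP code and the FIPS that contains it.
zip_record = {"%MALE": MISSING, "SQ MI": MISSING, "POPUL": MISSING}
fips_record = {"%MALE": 48.9, "LAND AREA": 120.0, "POPUL": 50000}

male = value_with_fallback("%MALE", zip_record, fips_record)
area = value_with_fallback("SQ MI", zip_record, fips_record)
popul = value_with_fallback("POPUL", zip_record, fips_record)
```

Only %MALE is filled in: SQ MI has no identically named FIPS variable, and POPUL is a raw count.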
ZIPFIP often asks the user to supply data; such as a list of ZIP (or
FIPS) codes. Sometimes this may be done from the keyboard,
sometimes through ASCII data files, and sometimes the user is
given an explicit choice.
When using input files, each "observation" must be on one, and
only one line of the ASCII data file. Variables should be separated
by spaces or commas. Values may be in integer or real format
(with or without decimal point). Often, the user will be asked to
provide the "nth variable" (referring to the location in the line of
data) containing the ZIP code, FIPS code, or other identifying
information. A data file containing a list of FIPS codes is expected;
output consists of either a list of FIPS codes or a list of ZIP codes
(and FIPS codes).
Example: 46992 1 3 5 11.50
46996 1.2 2 6 20.00
In the order they are entered, these numbers represent the ZIP
code plus four items of data, such as numbers of groups, trips,
days, or dollars. Thus, for the second line of the example:
nth=1 -- indicates the value of the first item (for example,
the ZIP code), 46996;
nth=2 -- indicates the value of the second item, 1.2;
nth=3 -- indicates the value of the third item, 2; and so
forth.
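The "nth variable" convention can be sketched in Python using the two example lines above. The function names are hypothetical. Items are kept as text so that leading zeros in ZIP codes are preserved, and lines beginning with "@" or "*" are skipped as comments.

```python
def nth_value(line, n):
    """Return the nth (1-based) item of a line split on spaces or commas."""
    return line.replace(",", " ").split()[n - 1]

def read_column(lines, n):
    """Collect the nth item of every observation, skipping blanks and comments."""
    column = []
    for line in lines:
        text = line.strip()
        if not text or text[0] in "@*":
            continue
        column.append(nth_value(text, n))
    return column

data = [
    "@ ZIP code plus four items of data",
    "46992 1 3 5 11.50",
    "46996 1.2 2 6 20.00",
]
zip_codes = read_column(data, 1)
second_items = read_column(data, 2)
```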
A convenient method of entering a variable number is to use the
ALT-V option. When asked to enter a variable number, strike ALT-V. The first several lines of the file will be displayed. You can
then use the up and down arrow keys to select the desired
"variable location." You can also use the PgUp, PgDn, CTL-left-arrow, and CTL-right-arrow keys to move within the file (say, to
examine a block of text starting at the 50th row and 100th
column). In addition to the ALT-V key, you can also use the ALT-L
key to list the file (ALT-L is better for examining larger files, but is
not optimized for variable selection).
Output files are created by most ZIPFIP actions. When you
provide a name for an output file, if you do not specify a directory,
it will be created in the default output subdirectory; typically
/ZIPFIP/OUTPUT. A default output file is usually available (just hit
the ENTER key). If a file with the requested name exists, you are
given the option of overwriting the prior file, appending to it, or
trying a different name.
Notes:
(1) When asked to provide an input or output file, if you want to
enter the data from the keyboard, enter CON (short for
Console).
Careful: for input files, you might have to hit Ctrl-Z to signal
end of input; for output files, you might get an unreadable
stream of data.
(2) The "@" or "*" character, when placed at the beginning of a line in an input file, acts as a "comment" label. In other words, lines in an input file whose first character is an "@" or a "*" are ignored by ZIPFIP.
If the package has been set up correctly, and the user is running
ZIPFIP from the correct directory (such as \ZIPFIP), this appendix
should not be needed. However, if the user wants to save disk
space, or has plans for personalized optimization, then s/he should
read on.
Each ZIPFIP database is constructed from a variety of files.
Depending on one's uses of ZIPFIP, a number of these files can be
removed. For example, in the ZIPFIP-1990 database, the
consequences of removing the following files are:
File Consequences of Removal
CIRCUITY.UNF Great Circle Distances may be computed, but
exact road distance circuity factors will not be
available: a value of 1.15 will be used.
ZIP5INDX.90 Searches of the ZIP location and name databases
will be slow.
ZIP3INDX.90 ZIPFIP will (slowly) create an index at start up.
ZIP5NAME.90 FINDNAME will not be able to find town names.
FIPTOZIP.90 MARKET will not work with ZIP codes.
FINDNAME may not work properly.
FIPSLOC.90 Most of the FIPS code oriented commands will
not work.
FIPSTAT.90 PRINTSTATS and EDITSTAT will not edit FIPS
data.
FIPSNAME.90 FINDNAME will be unable to find county names.
ZIP5LOC.90 Most of the ZIP code oriented commands will not
work.
ZIP5STAT.90 PRINTSTATS and EDITSTAT cannot access ZIP
census data.
FIPSCALE.90 PRINTSTATS will be unable to "scale" variables.
Several of these files are large, especially ZIP5NAME.90 and ZIP5STAT.90. Please note the consequences listed above if the user is forced to remove these large files from their machine.
Scaling has two purposes: (1) to produce values for years other
than the base year; and (2) to correct for missing ZIP codes. In
ZIPFIP, two kinds of scales exist: User scales and County-specific
scales.
County-Specific Scales
A scale is a fraction that is multiplied by some "absolute value"
(perhaps a percentage) in the census database in order to produce
some new value. The form of most county-specific scales is:
SCALE = NewValue / OldValue.
OldValue is usually a value corresponding to the base year, while
NewValue is the value for some nonbase year. For example, if the
most recent census was taken in 1990, then OldValue would be
taken from the 1990 "census" database, and NewValue could be a
value from 1991. The net effect is that when SCALE is multiplied
by the appropriate OldValue, the NewValue will be produced. One
may ask why ZIPFIP does not just save the NewValue in the FIPS
census database; the answer lies in the conservation of disk
space, the ease of adding new scales and, most importantly, the
scaling of ZIP code variables (for a discussion of scaling ZIP code
variables, see the discussion of Scaling in the PRINTSTATS
command documentation).
Correcting for Missing ZIP Codes
A special County-Specific scale, known as POPFIX, exists for
precisely one purpose: to correct for missing ZIP codes. It
guarantees that the sum of "scaled" populations from all ZIP codes
in a FIPS will equal the population in the FIPS. In other words,
POPFIX is meant to be applied only to ZIP data.
A complete description of Missing ZIP codes, and the use of
scaling as a correction for missing ZIP codes, may be found in the
third chapter of this documentation: Problems With ZIP Code
Data.
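One simple scale with the property described above is the ratio of the FIPS population to the sum of the populations of the ZIP codes actually present in the database. The Python sketch below uses that ratio purely as an assumption. This manual does not give the formula ZIPFIP uses for POPFIX, and all figures are invented.

```python
def popfix(fips_population, zip_populations):
    """Assumed form: county population divided by the sum over known ZIP codes."""
    return fips_population / sum(zip_populations.values())

# Hypothetical county of 10,000 people in which one ZIP code is missing,
# so the known ZIP codes account for only 9,000.
fips_population = 10000
zip_populations = {"00001": 4000, "00002": 5000}

scale = popfix(fips_population, zip_populations)
scaled = {z: p * scale for z, p in zip_populations.items()}
total = sum(scaled.values())
```

Under this assumption each known ZIP code absorbs a proportional share of the missing population, and `total` equals the county population up to rounding error.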
User Scales
User scales are county-invariant scales, with values entered at run
time. For each User scale selected, one must specify a single
value. This single value will be applied to the selected variable, for
all observations, regardless of FIPS code. Though not as flexible
as the COUNTY-SPECIFIC scales, this feature does allow the user
to enter some national corrections.
In particular, one may enter national deflators for the income
variables. As a convenience to the user, a Consumer Price Index
(CPI) deflator is hardwired into PRINTSTATS. It is easy to extract
a value of the CPI to use as a USER scale; see PRINTSTATS on-line help for details.
Up to 25 User scales may be active at one time.
ZIPFIP contains a number of special features that can substantially
ease its use. Many of these features involve special keystrokes
that move within a command string, and move strings to and from
different "buffers."
There are two buffers: the results buffer and the input-history
buffer. The results buffer contains a temporary copy of recent
"results," that is, recent output of ZIPFIP. Since this buffer is only
20 lines long, ZIPFIP uses some judgment as to what to write to
the output buffer. Basically, long lists (such as market areas) are
summarized, while single answers (such as locations of ZIP codes)
are written completely to the buffer. The input history buffer
contains a "history" of the last 20 command strings entered into
ZIPFIP. Note that only command strings are retained; options
selected from a menu using cursor keys (or the mouse) are not
retained.
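The behavior of a fixed-length buffer of this kind can be sketched in Python with a bounded deque. This only illustrates the "last 20 entries" behavior described above; it is not a description of how ZIPFIP stores its buffers.

```python
from collections import deque

BUFFER_LINES = 20   # both buffers are 20 lines long

# A bounded deque silently discards the oldest entry once it is full.
input_history = deque(maxlen=BUFFER_LINES)
for i in range(1, 26):
    input_history.append(f"command {i}")

oldest = input_history[0]
newest = input_history[-1]
```

After 25 entries only the most recent 20 remain, so the oldest retained entry is the sixth one entered.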
Another general feature of ZIPFIP is the ability to simultaneously
select a menu option and specify an argument to this option. For
example, input files are often listed as items on a menu. The
standard procedures of selection are to highlight the selection
(using the cursor keys or the mouse), hit the ENTER key, and then
supply the name of the file. This sequence may be shortened:
after highlighting the selection, enter the file name before hitting
the ENTER key. This short cut is available for most, but not all,
menu items.
The following lists important keystrokes that are used to
manipulate buffers, and speed up menu entries:
Key Stroke Description
CTRL-E Temporary exit to DOS.
ESC Erase current line. However, if there are NO
characters entered, ESC is interpreted as a "cancel."
INS Toggle insert mode. A flashing block signals insert
mode on.
LEFT and RIGHT arrows Move left and right in the string.
HOME Move to beginning of string, or to first option in a
menu.
END Move to end of string, or to last option in a menu.
F3 and F4 Move up and down in the input history buffer.
F1 Context Sensitive Help
ALT-D Scroll "results window" DOWN one line.
ALT-U Scroll "results window" UP one line.
ALT-G Get top line of results window.
ALT-P Put current text into results window.
ALT-R Refresh screen. Useful after a Ctrl-E exit to DOS.
ALT-ENTER Accept string as input to current menu option. This
overrides "sub-string match"; the currently
highlighted option is chosen with the string used as
input.
ALT-F When ZIPFIP asks for an input file, hitting ALT-F will
cause ZIPFIP to list the files in the default input
directory. You can then select a file from this list,
or change directories, using cursor keys.
ALT-V When ZIPFIP asks for the "nth variable" (in a line of
an input file), hit ALT-V to display the currently
selected input file.
ALT-L When ZIPFIP asks for the "nth variable" (in a line of
an input file), hitting ALT-L allows the user to list a
file (similar to the LIST command). In addition,
LIST will attempt to "parse" each line of the input
file, highlighting each separate variable.
Note that the input-history, and command line editing, features of
ZIPFIP are similar to the DOSKEY program of DOS 5.0.
This quiz uses the BWOBS.SMP file that is stored in your \OUTPUT
subdirectory. BWOBS.SMP consists of observations of permits
issued to visitor parties at the Boundary Waters Canoe Area. Each
observation contains information on the number of visitors per
party and the ZIP code of each party.
Quiz
You have several tasks to accomplish:
(1) Find the location of four U.S. cities:
Chicago, IL
Ely, MN
Fargo, ND
Minneapolis, MN
TIP: Write the latitude, longitude, ZIP and FIPS codes on paper for
future reference.
(2) Compute the distance from Ely, MN to Chicago, IL, from Ely,
MN to Fargo, ND, and from Ely, MN to Minneapolis, MN.
TIP: Since you need to compute only three distances for this
exercise, do this by entering the input information from the
keyboard. You may, however, specify files of starting and
ending locations if you have a lot of information to handle.
(3) Compile a list of counties within 500 miles of Ely, MN.
TIP: Store output in an Output file, for example ELYMKT.FIP.
(4) For each county on this list, create a data file of the following
from the 1990 U.S. Census: population, per capita income,
percentage of population with a college degree.
TIP: Use the file you created in (3) as an Input file.
Store output in an ASCII file, for example ELYMKT.CEN.
(5) For each county on the list, compute the minimum round-trip
mileage for a trip on which you visit each of several cities.
Do this for the following sets of cities:
a. Chicago, Ely, and Fargo.
b. Ely and Minneapolis.
c. Chicago, Ely, and Minneapolis.
d. Chicago, Ely, Fargo, and Minneapolis.
TIP: Use the ASCII file created in (4) as a Market file.
Store output in a file, for example ELYITIN.DIS.
(6) For each county on the list, compute the number of BWCA
permits issued to visiting parties, and compute the number of
individuals who visited the BWCA.
TIP: Use the file you created in (3), and BWOBS.SMP, as input
files.
Possible answers to Quiz
Exercise 1: To find the location of four U.S. cities:
Chicago, IL
Ely, MN
Fargo, ND
Minneapolis, MN
a. Get into ZIPFIP from the DOS prompt. You must first change
directories until you are in the ZIPFIP directory.
b. From the ZIPFIP Main Menu, select FINDNAME.
c. Select " NAME <enter>"; type CHICAGO <enter>.
d. Select " STATE <enter>"; type IL <enter>.
e. Select " ALL <enter>"; type NO <enter>.
f. Select " RUN <enter>".
NOTE: ZIPFIP will give you the location for Chicago, IL. Your
monitor should read:
#matches= 65, (first zip,fip @ 60600 17031
We suggest you record the (single) ZIP code returned by
ZIPFIP on a piece of paper for future reference.
g. Exit to the ZIPFIP Main Menu and select FINDZFIP. Then
select ZIP LOC, and enter this (single) ZIP code for Chicago.
You should receive latitude, longitude and FIPS code for
Chicago:
ZIP, FIP = 60600 17031; LAT,LONG = 41.850 87.650
h. Now repeat steps c through g for Ely, Fargo, and Minneapolis.
You should have recorded the following results:
Chicago: ZIP,FIP=60600 17031; Lat,Long = 41.850 87.650
Ely: ZIP,FIP = 55731 27137; Lat,Long = 47.903 91.867
Fargo: ZIP,FIP=58102 38017;Lat,Long=46.877 96.789
Minneapolis: ZIP,FIP=55400 27053;Lat,Long=44.98 93.26
Exercise 2: To compute the distance from Ely, MN to
Chicago, IL, from Ely to Fargo, ND, and from Ely to
Minneapolis, MN.
a. From the ZIPFIP Main Menu, select COMPDIST.
b. Select START-LOC, then hit <enter> to select "default =
Input from keyboard."
Then, select ZIP as the "type."
Note: You need to compute only three distances for this
exercise, so you may do this by entering the input information
from the keyboard. You may, however, specify files of
starting and ending locations if you have a lot of information
to handle.
d. Select OUTPUT to specify a name for your output file, such
as DISTANCE.OUT. This file will store all the information
you generate within COMPDIST.
e. Select RUN. COMPDIST will ask for a list of ZIP codes (the
distance between these zip codes will be computed).
First, enter the ZIP code for Ely, MN, and then the ZIP-code
for Chicago, IL. Hit <esc> to indicate the end of this
itinerary.
f. Without exiting from COMPDIST, repeat step e for Ely to
Fargo and Ely to Minneapolis.
To view the results, which have been stored in the output file
DISTANCE.OUT, go back to the ZIPFIP Main Menu and select
LIST. When you specify DISTANCE.OUT, the file should read:
2 572.66
2 269.39
2 242.05
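As a rough cross-check on distances of this kind, the following is a minimal sketch of a straight-line (great-circle) distance computed from the coordinates recorded in Exercise 1. The haversine formula and the Earth radius constant are assumptions made for illustration; this section does not describe the method COMPDIST itself uses, so its figures need not match these.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius, not a ZIPFIP constant


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Coordinates as recorded in Exercise 1 (longitude in degrees west).
SITES = {
    "Ely": (47.903, 91.867),
    "Chicago": (41.850, 87.650),
    "Fargo": (46.877, 96.789),
    "Minneapolis": (44.98, 93.26),
}


def distance_from_ely(city):
    """Great-circle miles from Ely, MN to the named city."""
    return great_circle_miles(*SITES["Ely"], *SITES[city])


for city in ("Chicago", "Fargo", "Minneapolis"):
    print(f"Ely to {city}: {distance_from_ely(city):.1f} miles")
```

Because both longitudes are expressed in degrees west, only their difference matters, so the sign convention does not affect the result.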
Exercise 3: To compile a list of counties within 500
miles of Ely, MN.
a. From the ZIPFIP Main Menu, select MARKET.
b. From the MARKET OPTIONS menu, select CENTER(S);
then select "default = Input from keyboard," and enter
the latitude, longitude and State (MN) for Ely.
c. Select RANGE; type 0.0 <enter> for the minimum range;
type 500 <enter> for the maximum range.
d. Select OUTPUT; type ELYMKT.FIP <enter>.
e. To process these commands, select RUN.
The first few lines of ELYMKT.FIP should read:
@ Center of FIPS Market area: 47.903 91.867 27 @
17085
17177
17201
19005
19011
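The range test behind a market area can be sketched as a filter on distance from the center. This is only an illustration: the great-circle distance, the inclusive range bounds, and the function names are assumptions, and the sample sites are the city locations from Exercise 1 rather than the county locations MARKET actually uses.

```python
import math


def miles_between(a, b):
    """Haversine distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))  # 3958.8 = assumed Earth radius


def market_area(center, sites, min_miles, max_miles):
    """Return the sorted keys of `sites` whose distance from `center` is in range."""
    return sorted(
        key for key, location in sites.items()
        if min_miles <= miles_between(center, location) <= max_miles
    )


ELY = (47.903, 91.867)
SAMPLE_SITES = {  # FIPS code -> (lat, lon), taken from Exercise 1
    "17031": (41.850, 87.650),  # Chicago
    "38017": (46.877, 96.789),  # Fargo
    "27053": (44.98, 93.26),    # Minneapolis
}

print(market_area(ELY, SAMPLE_SITES, 0.0, 500))
```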
Exercise 4: To create a data file, for each county on
this list, of the following from census
variables: 1985 population, 1985 per
capita income, and 1990 percentage of
population with a college degree.
a. From the ZIPFIP Main Menu, select PRINTSTATS.
b. From PRINTSTATS, select INPUT; type ELYMKT.FIP
<enter>, and for location, 1 <enter>.
c. Select OUTPUT, then ASCII for ascii output; then type
ELYMKT.CEN <enter>.
d. Select VARIABLES to choose from available FIPS-level
variables. From this list, highlight POPULATION, PERCAP,
and %COL DEGRE.
e. Select SCALE; then select YEAR. Type 85 <enter> to
select 1985 as your base year.
f. Select HEADER, <enter>, and YES.
g. Select GO to extract statistics.
The first few lines of ELYMKT.CEN should read:
@ Base_year = 1990 , and Current Year = 85 @
@ For Year: 85, data available for: @
@ POPULATION @
@ PERCAP @
@ For Year: 1990, data available for: @
@ % COL DGRE @
@ Order of variables is ... @
@ Generated variables @
@ FIPS (found), FIP STATUS, @
@ User selected variables: @
@ POPULATION, PERCAP , % COL DGRE, , , , @
@ NOTE: Missing values will be displayed as . @
17085. 0. 21340. 13271. .040
17177. 0. 49013. 13550. .055
17201. 0. 239255. 14385. .060
19005. 0. 13023. 11296. .030
19011. 0. 23393. 11623. .035
Exercise 5: For each county on the list, compute the
minimum round-trip mileage for a trip on
which you visit each of several cities. Do
this for the following sets of cities:
Chicago, Ely, and Fargo.
Ely and Minneapolis.
Chicago, Ely, and Minneapolis.
a. Find FIPS codes for Fargo, ND, Chicago, IL, and Minneapolis,
MN, using the FINDNAME option (see Exercise 1).
b. From ZIPFIP Main Menu, select TRIPDIST.
c. Select MARKET; type ELYMKT.FIP <enter>.
d. Select OUTPUT; type ELYITIN.DIS <enter>.
e. Select DISTANCE; select ROAD <enter>, then ROUND
<enter>.
f. Select ITINERARY; enter the itineraries, one at a time, by
ZIP or FIPS codes for each destination point.
For Ely, Fargo, and Chicago: ZIP=55731 <enter>,
ZIP=58102 <enter>, ZIP=60600 <enter-esc-esc>. Note
that the order of entry is not important, since the minimum
distance connecting these cities will be computed.
h. Select RUN. You can view your results by going back to
ZIPFIP Main Menu and selecting LIST.
The first few lines of ELYITIN.DIS should read:
@ Line ignored: @ Center of FIPS Market area: 47.903 91.867 27
17085 1523 1012 1252
17177 1518 1045 1246
17201 1514 1078 1243
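The idea that the order of entry does not matter can be illustrated by trying every visiting order and keeping the shortest round trip. The sketch below is a brute-force search under stated assumptions: the function names are hypothetical, the distance function is supplied by the caller, and the sample points are toy planar coordinates rather than ZIPFIP data. It is not a description of how TRIPDIST is implemented.

```python
from itertools import permutations
import math


def min_round_trip(origin, stops, dist):
    """Length of the shortest tour that leaves origin, visits each stop, and returns."""
    best = math.inf
    for order in permutations(stops):
        route = (origin, *order, origin)
        length = sum(dist(a, b) for a, b in zip(route, route[1:]))
        best = min(best, length)
    return best


def euclid(a, b):
    """Straight-line distance between two planar points (illustration only)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Toy example: the corners of a 3-by-4 rectangle, starting from one corner.
origin = (0.0, 0.0)
stops = [(3.0, 0.0), (3.0, 4.0), (0.0, 4.0)]
print(min_round_trip(origin, stops, euclid))
```

Trying all orders is practical only for short itineraries such as the ones in this exercise, since the number of orders grows factorially with the number of stops.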
Exercise 6: For each county on the list, compute the number of
BWCA permits issued to "visiting parties", and
compute the number of individuals who visited the
BWCA.
a. Select AGGREGATE from the ZIPFIP Main Menu.
b. Select OBSERV; type BWOBS.SMP <enter>, then 2.
c. Select MARKET; type ELYMKT.FIP <enter>, then 1.
d. Select OUTPUT; type ELYAGG.OUT.
e. Select AGG VARS; then select COUNT, then select SUM, then
3, then select DONE.
f. Run AGGREGATE.
The first few lines of ELYAGG.OUT should read:
@ Aggregating from ZIP to FIPS @
@ Line 932 No FIPS found for Zip= 0 @
@ Line 973 No FIPS found for Zip= 99999 @
17085 .0000 .0000
17177 .0000 .0000
17201 2.0000 17.0000
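The aggregation in this exercise amounts to mapping each observation's ZIP code to a county, then counting observations and summing a variable per county. A minimal sketch follows; the function name, the ZIP-to-FIPS table, and the party sizes are invented for illustration (chosen only so that one county shows 2 permits and 17 individuals), and the layout of BWOBS.SMP is not assumed.

```python
from collections import defaultdict


def aggregate(observations, zip_to_fips, market_fips):
    """Return ({fips: (count, total)}, unmatched ZIPs) for the market counties."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    unmatched = []
    for zip_code, party_size in observations:
        fips = zip_to_fips.get(zip_code)
        if fips is None:
            unmatched.append(zip_code)  # analogous to "No FIPS found for Zip"
            continue
        counts[fips] += 1               # COUNT: one permit per observation
        totals[fips] += party_size      # SUM: individuals in the party
    result = {fips: (counts[fips], totals[fips]) for fips in market_fips}
    return result, unmatched


# Hypothetical inputs, for illustration only.
ZIP_TO_FIPS = {61101: 17201, 61102: 17201}
OBSERVATIONS = [(61101, 8), (61102, 9), (0, 4), (99999, 2)]
MARKET = [17085, 17177, 17201]

table, missing = aggregate(OBSERVATIONS, ZIP_TO_FIPS, MARKET)
for fips in MARKET:
    count, total = table[fips]
    print(fips, f"{count:.4f}", f"{total:.4f}")
print("unmatched ZIPs:", missing)
```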
Changes in the geographical areas served by a ZIP code are
separate from actual population changes. Thus, even if a region's
population remained fixed, the ZIP codes themselves could
change; new ones may be added, some of the old ones may be
dropped, and portions of towns may be transferred from one ZIP
to another. Population changes and boundary changes will, however,
be correlated; otherwise, the post office would have little reason
to redraw its maps.
The proper way to deal with this is to map census tracts (or block
statistics) into ZIP codes. With counts from these tracts, and
knowledge of the geographic boundaries of tracts and ZIP codes, it
is conceptually easy to see how one could determine population
counts for a ZIP code -- just add all tracts that fall within a ZIP
code. This sort of methodology is adopted by various marketing
firms.
Again, the change in ZIP code boundaries is not necessarily related
to actual changes in population. Thus, the population in a given
ZIP code, if an actual count were made in different years, could
change even if there were no change in the on-the-ground
demographics -- no births, no deaths, no moves, and so forth.
This demographic stability will never exactly hold, and in certain
areas will be quite incorrect. Therefore, even if ZIP code
boundaries were fixed, there will be changes in demographic
statistics over time. Since complete census enumerations occur
on a decennial basis, demographic measures at the ZIP code level,
in mid-decade years, must be estimated with the aid of ancillary
data. Examples include:
(a) county-level annual population estimates;
(b) county-level per capita income estimates; and
(c) yearly counts of "deliverable" addresses, available by ZIP
code from the post office. These are a proxy for number
of households per ZIP code.
In summary, given a ZIP code in a nondecennial year, the census
estimates will be incorrect to the extent that the ZIP code's
boundaries have changed, and to the extent that the actual
demographics of the region have changed.
As an example of the type of problem arising when ZIP code data
is used, consider the ZIPFIP80 database: numerous ZIP codes were
created after the 1980 ZIP code census database was compiled.
For these ZIPs, there is no entry in the census database.
Such new ZIPS are likely to be found in rapidly growing regions,
such as Florida. Therefore, measures by ZIP code in such regions,
taken after the decennial census, will face two problems:
accounting for actual changes in the population, and correcting for
changes in ZIP code boundaries, with the extreme case being the
creation of a new ZIP code.
ZIP Data Sources and the Use of "Closest" ZIP
For the ZIPFIP-1980 database, the raw data for ZIP codes were
extracted from two sources:
(a) A location file, circa 1986, containing all residential ZIP
codes, with FIPS code, town name, and longitude and
latitude. This file was used to create ZIP5LOC.UNF and
ZIP5NAME.UNF.
(b) A set of census tapes (STF3B), circa 1980, containing
numerous census variables on a ZIP code level, from
which approximately 40 variables were extracted. These
variables were then compressed and stored in
ZIP5STAT.80.
There is not a perfect correspondence between the ZIP codes on the
census tapes and in the location file. For example, there are
approximately 1,000 ZIP codes read from the 1980 census tapes that
were not in the location file (which is from 1986). Conversely,
approximately 3,500 ZIP codes in the location file were not in the
census tapes. The ZIP codes missing from the location file were
probably dropped between 1980 and 1986. Conversely, those missing
from the census tapes were most likely added between 1980 and 1986.
A question that may arise is:
What does one do when a ZIP code cannot be found in the
database being searched, be it the location or the census
databases?
ZIPFIP resolves this problem by finding and using the value that is
numerically the closest to the missing ZIP code. For example:
IF 7036 exists,
and if 7037 does not exist,
and if 7038 does not exist
THEN if the user requests 7037, and this "close" option is
selected, 7036 will be returned;
BUT if the close option is not selected, then there will be
NO MATCH.
The notion is that, in whichever database is being searched, the
Nation is completely divided by ZIP codes. Therefore, a ZIP code
that is not in the database represents the results of a redrawing of
ZIP code boundary lines. Since geographically contiguous ZIP
codes have values that are numerically close, it is likely that the
resident of this missing ZIP code lives inside the boundaries of the
ZIP code (in the target database) that is also numerically close.
For example:
IF 07891 and 07894 divide the town of FOOBAR,MI;
circa 1986,
BUT, 07891, 07892 and 07894 divide the town in 1988,
THEN 07892 contains portions of the old 07891 and 07894.
Assuming that a residence in 07892 of the 1988 database was
formerly assigned to 07891 in the 1986 database, then 07891
will frequently be correct. Note that 07891, which is closer to
07892 than 07894 (in numeric value), will be selected.
Note: The U.S. Postal Service tries to keep a new ZIP code
number near the numbers of the old ZIP codes from which
it was created. This, however, is not always possible.
Three-digit ZIP codes are, in essence, well-defined geographical
areas. Thus, all searches for closest ZIP codes will occur in the
same three-digit ZIP code as the requested ZIP code. For example,
to find ZIP code 17891, the range from 17800 to 17899 will be
searched. If no match is found in that range (if that three-digit ZIP
does not exist), then there will be a message, "NO MATCH,"
returned.
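The closest-ZIP rule described above can be sketched as follows. This is an illustration rather than ZIPFIP's own code: the function name and the `close` flag are hypothetical, ZIP codes are held as integers, and the tie-breaking rule (prefer the lower ZIP code when two are equally close) is an assumption, since the manual does not say how ties are resolved.

```python
def closest_zip(requested, database, close=True):
    """Return the exact or numerically closest ZIP code, or None for NO MATCH."""
    if requested in database:
        return requested
    if not close:
        return None  # close option not selected: exact matches only
    prefix = requested // 100  # three-digit ZIP of a five-digit code
    candidates = [z for z in database if z // 100 == prefix]
    if not candidates:
        return None  # nothing in the same three-digit ZIP
    # Assumption: ties go to the lower ZIP code.
    return min(candidates, key=lambda z: (abs(z - requested), z))


print(closest_zip(7037, {7036}))               # 7036 exists, 7037 does not
print(closest_zip(7037, {7036}, close=False))  # close option off
print(closest_zip(7892, {7891, 7894}))         # the 07891 / 07894 example
print(closest_zip(17891, {17750, 17900}))      # nothing in 17800 to 17899
```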
The use of this "closest" ZIP code feature is fairly straightforward
in most of the commands, except for PRINTSTATS. Since
PRINTSTATS uses both the locational and census databases,
several types of searches are possible. These are:
EXACT
Both the location and census databases are checked for the
requested ZIP code. Each database must contain the ZIP
code; otherwise, no match is found.
APPROXIMATE (Close ZIP w/census data)
If an exact match cannot be found, a ZIP code that has a
numerically close value will be returned (provided that both
census and location data are available). Note that the search
is limited to the desired ZIP code's three-digit ZIP code.
Furthermore, only ZIP codes from the same county as the
"closest" location ZIP code are allowed (note that census data
must exist for the ZIP code that is used).
SEPARATE & APPROXIMATE
Both databases are searched for closest ZIP code. The
(numerically) closest ZIP in the location database is used for
locational information. The (numerically) closest ZIP in the
census database is used for census information. It is,
therefore, possible for locational information and census
information to come from different ZIP codes. The search is
limited to the desired ZIP code's three-digit ZIP code.
SEPARATE & APPROXIMATE, WITH CHECK
Similar to the option above, but this search is limited to ZIP
codes for which some locational information is available.
Note that information from two different ZIP codes may still
be returned, but there will be location information available
for both of them. If these two ZIP codes are different, it
must be the case that there is no census information available
for the ZIP code selected from the location database, but it is
(numerically) closer to the desired ZIP than the ZIP code
selected from the census database.
This method will limit the search of census ZIP codes to one
FIPS code. Specifically, the census ZIP code must be from
the same FIPS code as the location ZIP code.
Notes:
(1) In the ZIPFIP-1980 database, the approximately 3,500 ZIP
codes with no census data (but with location data) will not be
available in the EXACT and Close ZIP w/census data options.
The only option that accesses the approximately 1,000 ZIP
codes with no location data is the first SEPARATE option.
(2) The ZIPFIP-1990 database contains a more thorough (and
current) list of ZIP-codes; however, the basic limitations of
ZIP-code data will still be present. Also note that, at this
writing, census information for 1990 ZIP-codes is limited.
Example:
ZIP code requested: 27021
ZIPS in locational database: 27001 27015 27018 27019
27022 27026 27030
ZIPS in census database: 27001 27015 27017 27018
27026 27030,
where 27001, 27015, 27019, 27022, 27026 and 27030 are from
FIPS=1003, while 27018 is from FIPS=1004, and 27017's FIPS
code is unknown.
Option                     Result
EXACT                      No match.
Close ZIP w/census data    27026 is returned.
SEPARATE Search            For the location database, 27022 will
                           be returned (latitude and longitude
                           from 27022 will be used). For the
                           census database, 27018 will be
                           returned (census data from 27018 will
                           be used).
SEPARATE Search,           For the location database, 27022 will
with check                 be returned (latitude and longitude
                           from 27022 will be used). For the
                           census database, 27026 will be
                           returned (census data from 27026 will
                           be used).
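One way to read the four search options is sketched below, using the ZIP codes from this example. The function and option names are hypothetical, and the code is an interpretation of the descriptions above, not ZIPFIP's implementation. It assumes that 27018 belongs to FIPS=1004 only, that the remaining location ZIP codes belong to FIPS=1003, and that ties in numeric closeness go to the lower ZIP code.

```python
LOCATION = {27001, 27015, 27018, 27019, 27022, 27026, 27030}
CENSUS = {27001, 27015, 27017, 27018, 27026, 27030}
FIPS = {27001: 1003, 27015: 1003, 27019: 1003, 27022: 1003,
        27026: 1003, 27030: 1003, 27018: 1004}  # 27017: FIPS unknown


def nearest(requested, zips):
    """Numerically closest ZIP within the same three-digit ZIP, else None."""
    pool = [z for z in zips if z // 100 == requested // 100]
    return min(pool, key=lambda z: (abs(z - requested), z)) if pool else None


def same_county(loc_zip):
    """ZIPs with both census and location data in loc_zip's county."""
    return {z for z in CENSUS & LOCATION if FIPS.get(z) == FIPS.get(loc_zip)}


def search(requested, option):
    """Return (location ZIP, census ZIP); None marks a failed search."""
    if option == "exact":
        hit = requested if requested in LOCATION & CENSUS else None
        return hit, hit
    loc = nearest(requested, LOCATION)
    if option == "separate":
        return loc, nearest(requested, CENSUS)
    if loc is None:
        return None, None
    cen = nearest(requested, same_county(loc))
    if option == "close":           # Close ZIP w/census data
        return cen, cen
    if option == "separate_check":  # SEPARATE & APPROXIMATE, WITH CHECK
        return loc, cen
    raise ValueError(f"unknown option: {option}")


for option in ("exact", "close", "separate", "separate_check"):
    print(option, search(27021, option))
```

Under these assumptions the sketch reproduces the results listed for ZIP code 27021: no match, 27026, the pair 27022 and 27018, and the pair 27022 and 27026.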
The STATUS Variable
A STATUS variable is always included in the output of
PRINTSTATS. It reports how successful the search for the ZIP (or
FIPS) code was. The status variable is a two-digit number
(location, census).
The first digit reports the results of the location search, and the
second digit the results of the census search.
For example, with the number 12, the first digit, 1, refers to the
location, and the second digit, 2, refers to the census result.
Each digit may take one of five values:
Value Definition
0 Exact match
1 Close match
2 No match, but no search (for closest ZIP or FIPS)
3 No match, EVEN though a search was attempted
4 No attempt at matching, not even looking for an
exact match
Increasing values represent worse results, with the difference
between 2 and 3 being one of effort. A value of 00 means exact
match in location and census database, whereas a value of 33
means no match, in either database. The value, 4, signals to the
user that the location search failed to yield a match, resulting in
the census database being ignored.
Notes:
(1) If there is an exact match in location database, the
implied 0 will not be printed. For example, an exact
location, with close data, will be displayed as "1", not as
"01". Note that values of less than 10 mean that the
location match was exact.
(2) Possible values of the STATUS variable will vary according to
the type of search requested.
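For post-processing PRINTSTATS output, the STATUS value can be split back into its two digits. The helper below is a hypothetical sketch; its name and the wording of the meanings are paraphrased from the table above, and it treats a value below 10 as having an implied leading 0 for the location digit.

```python
MEANINGS = {
    0: "exact match",
    1: "close match",
    2: "no match, no search attempted",
    3: "no match, search attempted",
    4: "no attempt at matching",
}


def decode_status(status):
    """Split a STATUS value into (digit, meaning) pairs for location and census."""
    location, census = divmod(int(status), 10)
    if location not in MEANINGS or census not in MEANINGS:
        raise ValueError(f"unexpected STATUS value: {status}")
    return (location, MEANINGS[location]), (census, MEANINGS[census])


for value in (0, 1, 12, 33):
    print(value, decode_status(value))
```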
A Word to the Wise
Each entry in the "input file" may be associated with as many as
three ZIP codes. These include:
(1) The Location ZIP code. In the PRINTSTATS command, this is
always the first variable in the output file.
(2) The Census ZIP code. In PRINTSTATS, this will only be
output if the user requests it as one of her/his census
variables.
(3) The requested ZIP code from the input list. It is not written
to the output file. However, a one-to-one correspondence is
maintained between lines in the input and output files.
A census ZIP code not matching a location ZIP code will occur
only when one of the two SEPARATE options is requested. If the
returned STATUS equals 0, these three ZIPS will have the same
value.
Adjusting for Missing Observations through Scaling
One technique to account for the entire population is to scale or
weight ZIP codes by a correction factor. ZIPFIP uses this
technique to provide corrected measures of ZIP code population.
Information at grosser levels of aggregation (counties) is used to
adjust finer aggregates (ZIP codes) so that the finer aggregates
"add up" to the quantity known to exist at the grosser level.
Specifically, given accurate measures of FIPS populations, a scale
may be applied to all ZIPS in a FIPS, so that the sum of scaled
populations from all the ZIPS in a FIPS will equal the FIPS'
population.
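The arithmetic of this kind of scaling can be sketched as a single factor per county: the county population divided by the sum of its ZIP code populations. The function name and the sample numbers below are hypothetical, and the sketch is not a description of how ZIPFIP's own scales are computed.

```python
def scale_zip_populations(zip_pops, fips_population):
    """Scale ZIP populations so that they sum to the county (FIPS) total."""
    total = sum(zip_pops.values())
    if total == 0:
        raise ValueError("cannot scale: ZIP populations sum to zero")
    factor = fips_population / total
    scaled = {zip_code: pop * factor for zip_code, pop in zip_pops.items()}
    return scaled, factor


# Hypothetical county with two ZIP codes whose census counts fall short
# of the known county population.
zip_pops = {"ZIP A": 4000, "ZIP B": 6000}
scaled, factor = scale_zip_populations(zip_pops, fips_population=11000)

print(f"scale factor: {factor:.3f}")
for zip_code, pop in scaled.items():
    print(zip_code, round(pop))
print("sum of scaled populations:", round(sum(scaled.values())))
```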
Applying this technique to "missing observations" implies a
division of the nation into "quasi-ZIP" codes, where each
quasi-ZIP code represents a ZIP code for which both Census and
Location information is available (about 35,000 units in the 1980
database). The user then maps each and every individual into
these (35,000) quasi-ZIP codes.
By scaling the population in the appropriate quasi-ZIP code (as
extracted from the census tapes), we may get a close estimate of
the population that would result had we performed this "quasi-ZIP
code mapping" to the entire American population. At the very
least, some consistency is maintained between quasi-ZIP code and
FIPS aggregation.
The Close ZIP w/Census Data option was constructed with this
idea in mind. Specifically, for population counts from ZIP codes,
this option should be used in conjunction with the POPFIX scale.
It should be noted that, in the ZIPFIP package, scaling has a
variety of uses other than this missing observation correction. For
example, mid-decade county estimates of population are used to
scale mid-decade ZIP code populations. The same holds for other
variables, such as unemployment, where one only needs some
fraction to be applied to all ZIPs in a FIPS, with each FIPS having
its own value. See Appendix F on "Scaling" in the first chapter for
further discussion of scaling.
In summary, there are three options for dealing with missing
observations, none of which provides a perfect solution:
Ignore the problem: use EXACT ZIP;
Scale: use Close ZIP w/Census Data with the POPFIX scale; or
Search separately: search the Location and Census databases
separately.
A Final Note On Searching For FIPS Codes
Since county boundaries rarely change, the problem of matching
FIPS from observations to FIPS in the database should not
frequently arise. For those special cases, however, exact analogues
of the EXACT and CLOSE ZIP options are available to match FIPS
codes.
Note: In the ZIPFIP-1980 database, counties with possible
problems are located in Virginia, Alaska, Arizona, and New
Mexico. These problems are partially corrected in the
ZIPFIP-1990 database.