Geocoding school and student’s home addresses: Zandbergen responds

Geocoding Addresses from a Large Population-based Study: Lessons Learned

Epidemiology, 2003

Background: Geographic information systems (GIS) and spatial statistics are useful for exploring the relation between geographic location and health. The ultimate usefulness of GIS depends on both completeness and accuracy of geocoding (the process of assigning study participants' residences latitude/longitude coordinates that closely approximate their true locations, also known as address matching). The goal of this project was to develop an iterative geocoding process that would achieve a high match rate in a large population-based health study. Methods: Data were from a study conducted in Wisconsin using mailing addresses of participants who were interviewed by telephone from 1988 to 1995. We standardized the addresses according to US Postal Service guidelines, used desktop GIS geocoding software and two versions of the Topologically Integrated Geographic Encoding and Referencing street maps, accessed Internet mapping engines for problematic addresses, and recontacted a small number of study participants' households. We also tabulated the project's cost, time commitment, software requirements, and brief notes for each step and their alternatives. Results: Of the 14,804 participants, 97% were ultimately assigned latitude/longitude coordinates corresponding to their respective residences. The remaining 3% were geocoded to their zip code centroid. Conclusion: The multiple methods described in this work provide practical information for investigators who are considering the use of GIS in their population health research.
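The iterative process in this abstract (try one reference source, pass the failures to the next, and assign a ZIP code centroid to whatever remains) can be sketched as a simple cascade. This is a minimal illustration, not the authors' implementation: the function name `geocode_with_fallback`, the stage names, and all coordinates below are hypothetical, and each "geocoder" is a plain dictionary lookup standing in for real software.

```python
from typing import Callable, Dict, List, Optional, Tuple

Coord = Tuple[float, float]  # (latitude, longitude)
Geocoder = Callable[[str], Optional[Coord]]


def geocode_with_fallback(
    address: str,
    zip_code: str,
    geocoders: List[Tuple[str, Geocoder]],
    zip_centroids: Dict[str, Coord],
) -> Tuple[Optional[Coord], str]:
    """Try each geocoder in order and report which stage produced the match.

    If every stage fails, fall back to the ZIP code centroid; if that is
    also unavailable, return (None, "unmatched").
    """
    for stage_name, geocode in geocoders:
        coord = geocode(address)
        if coord is not None:
            return coord, stage_name
    centroid = zip_centroids.get(zip_code)
    if centroid is not None:
        return centroid, "zip_centroid"
    return None, "unmatched"


# Hypothetical reference data: two street files and one ZIP centroid table.
street_file_a = {"123 MAIN ST": (43.07, -89.40)}
street_file_b = {"45 OAK AVE": (44.50, -88.00)}
stages = [
    ("street_file_a", street_file_a.get),
    ("street_file_b", street_file_b.get),
]
centroids = {"53703": (43.08, -89.38)}

print(geocode_with_fallback("45 OAK AVE", "54301", stages, centroids))
print(geocode_with_fallback("RR 2 BOX 17", "53703", stages, centroids))
```

Recording the stage that produced each match makes it possible to report match rates per method and to treat centroid-level matches differently in later analysis.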

Improving Geocoding Practices: Evaluation of Geocoding Tools

Journal of Medical Systems, 2004

This study examined the sources of error involved in geocoding, by systematically evaluating the strengths and weaknesses of three widely used tools for geocoding. We tested them against a random sample of addresses from a state administrative address master file and found considerable variation in identification of census block geocodes of addresses. This high variation was mainly attributable to differences in preprocessing of addresses before geocoding and the reference street data used for geocoding. Preprocessing includes not only parsing and standardizing, but also correcting addresses against the US Postal Service Zip+4 Database, the master mailing address database maintained and updated regularly by USPS.

Assessing the Certainty of Locations Produced by an Address Geocoding System

Geoinformatica, 2007

Addresses are the most common georeferencing resource people use to communicate to others a location within a city. Urban GIS applications that receive data directly from citizens, or from legacy information systems, need to be able to quickly and efficiently obtain a spatial location from addresses. In this paper we understand addresses in a broader perspective, in which not only the conventional elements of postal addresses are considered, but other kinds of direct or indirect references to places, such as building names, postal codes, or telephone area codes, which are also valuable as locators to urban places. This broader view on addresses allows us to work with two perspectives. First, in the ontological definition, modeling, and implementation of an addressing database that is flexible enough to accommodate the variety of concepts and address formats used worldwide, along with direct and indirect references to places. Second, in the definition of an indicator that is able to quantify the degree of certainty that could be reached when a user-given, semi-structured address is geocoded into a spatial position, as a function of the type and completeness of the available addressing data and of the geocoding method that has been employed. This indicator, which we call Geocoding Certainty Indicator (GCI), can be used as a threshold, beyond which the geocoded event should be left out of any statistical analysis, or as a weight that allows spatial analysis methods to reduce the influence of events that have been less reliably located. In order to support geocoding activities and the determination of the GCI, we propose a conceptual schema for addressing databases. The schema is flexible enough to accommodate a variety of addressing systems, at various levels of detail, and in different countries. Our intention is to depart from the usual geocoding strategy employed in commercial GIS products, which is usually limited to the average American or British address format. 
The schema also extends the notion of postal address to something broader, including popular names for places, building names, reference places, and other concepts. This approach extends the work of Simpson and Yu (Comput. Environ. Urban Syst., 27: 283–307, 2003) on postal codes to records of any kind, including place names and loosely formatted addresses.
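The two uses of the certainty indicator described above, as a cut-off and as a weight, can be illustrated with a short sketch. The abstract does not give the scale or formula of the GCI, so this example assumes a value between 0 and 1 where higher means more certain; the field name `gci`, the function names, and the sample events are all hypothetical.

```python
from typing import Dict, List, Tuple

Event = Dict[str, float]  # keys: "x", "y" (planar coordinates) and "gci"


def filter_by_gci(events: List[Event], min_gci: float) -> List[Event]:
    """Threshold use: keep only events located with at least min_gci certainty."""
    return [e for e in events if e["gci"] >= min_gci]


def weighted_mean_center(events: List[Event]) -> Tuple[float, float]:
    """Weight use: mean center in which each event counts in proportion to its GCI."""
    total = sum(e["gci"] for e in events)
    if total == 0:
        raise ValueError("all events have zero certainty")
    x = sum(e["x"] * e["gci"] for e in events) / total
    y = sum(e["y"] * e["gci"] for e in events) / total
    return x, y


# Hypothetical events: the third was located with very low certainty.
events = [
    {"x": 0.0, "y": 0.0, "gci": 1.0},
    {"x": 10.0, "y": 0.0, "gci": 0.5},
    {"x": 100.0, "y": 100.0, "gci": 0.1},
]

print(len(filter_by_gci(events, 0.5)))
print(weighted_mean_center(events))
```

Filtering discards poorly located events entirely, while weighting keeps them but limits how far they can pull a summary statistic.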

From text to geographic coordinates: The current state of geocoding

2007

Abstract: This article presents a survey of the state of the art in geocoding practices through a cross-disciplinary historical review of existing literature. We explore the evolving concept of geocoding and the fundamental components of the process. Frequently encountered sources of error and uncertainty are discussed as well as existing measures used to quantify them. An examination of common pitfalls and persistent challenges in the geocoding process is presented, and the traditional methods for overcoming them are described.

A comparison of address point, parcel and street geocoding techniques

Computers, Environment and Urban Systems, May 2008

The widespread availability of powerful geocoding tools in commercial GIS software and the interest in spatial analysis at the individual level have made address geocoding a widely employed technique in many different fields. The most commonly used approach to geocoding employs a street network data model, in which addresses are placed along a street segment based on a linear interpolation of the location of the street number within an address range. Several alternatives have emerged, including the use of address points and parcels, but these have not received widespread attention in the literature. This paper reviews the foundation of geocoding and presents a framework for evaluating geocoding quality based on completeness, positional accuracy and repeatability. Geocoding quality was compared using three address data models: address points, parcels and street networks. The empirical evaluation employed a variety of different address databases for three different Counties in Florida. Results indicate that address point geocoding produces geocoding match rates similar to those observed for street network geocoding. Parcel geocoding generally produces much lower match rates, in particular for commercial and multi-family residential addresses. Variability in geocoding match rates between address databases and between geographic areas is substantial, reinforcing the need to strengthen the development of standards for address reference data and improved address data entry validation procedures.
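The street network approach described above places an address along a segment in proportion to where its house number falls within the segment's address range. A minimal sketch, assuming a straight segment in planar coordinates: the function name and sample values are hypothetical, and refinements that production geocoders usually apply (odd/even side of the street, a perpendicular offset from the centerline, multi-vertex segments) are deliberately left out.

```python
from typing import Tuple

Point = Tuple[float, float]


def interpolate_address(
    house_number: int,
    from_number: int,
    to_number: int,
    start: Point,
    end: Point,
) -> Point:
    """Place a house number along a straight street segment.

    from_number and to_number are the address range at the start and end
    of the segment; start and end are the segment's endpoints.
    """
    if from_number == to_number:
        fraction = 0.5  # single-number range: use the midpoint
    else:
        fraction = (house_number - from_number) / (to_number - from_number)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("house number is outside the segment's address range")
    x = start[0] + fraction * (end[0] - start[0])
    y = start[1] + fraction * (end[1] - start[1])
    return x, y


# House number 150 on a segment covering 100-200 lands at the midpoint.
print(interpolate_address(150, 100, 200, (0.0, 0.0), (100.0, 0.0)))  # → (50.0, 0.0)
```

Because the method assumes house numbers are evenly spaced along the segment, its positional error grows wherever actual parcels are irregular, which is one motivation for the address point and parcel alternatives.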

Accuracy and Repeatability of Commercial Geocoding

American Journal of Epidemiology, 2004

The authors estimated accuracy and repeatability of commercial geocoding to guide vendor selection in the Life Course Socioeconomic Status, Social Context and Cardiovascular Disease study (2001–2002). They submitted 1,032 participant addresses (97% in Maryland, Minnesota, Mississippi, or North Carolina) to vendor A twice over 9 months and measured repeatability as agreement between levels of address matching, discordance (%) between statistical tabulation areas, and median distance (d, in meters) and bearing (θ, in degrees) between coordinates assigned on each occasion ($H_0\colon \sum_{i=1}^{n} \theta_i / n = 180^\circ$). They also submitted 75 addresses of nearby air pollution monitors (77% urban/suburban; 69% residential/commercial) to vendors A and B and then measured accuracy by comparing vendor- and US Environmental Protection Agency (EPA)-assigned geocodes using the above measures. Repeatability of geocodes assigned by vendor A was high (kappa = 0.90; census block group discordance = 5%; d < 1 m; θ = 177°). The match rate for EPA monitor addresses was higher for vendor B versus A (88% vs. 76%), but discordance at census block group, tract, and county levels also was, respectively, 1.4-, 1.9-, and 5.0-fold higher for vendor B. Moreover, coordinates assigned by vendor B were further from those assigned by the EPA (d = 212 m vs. 149 m; θ = 131° vs. 171°). These findings suggest that match rates, repeatability, and accuracy should be used to guide vendor selection.
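The distance d and bearing θ between two geocodes of the same address can be computed from their latitude/longitude pairs. The abstract does not say which formulas the authors used; the sketch below assumes the haversine great-circle distance on a spherical Earth of radius 6,371,000 m and the standard initial-bearing formula, and the function names are hypothetical.

```python
import math
from typing import Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius


def haversine_m(a: Coord, b: Coord) -> float:
    """Great-circle distance in meters between two coordinates."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def initial_bearing_deg(a: Coord, b: Coord) -> float:
    """Initial compass bearing from a to b, in degrees clockwise from north (0-360)."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlon = lon2 - lon1
    y = math.sin(dlon) * math.cos(lat2)
    x = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(dlon)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


first_pass = (0.0, 0.0)
second_pass = (0.0, 1.0)  # one degree of longitude east along the equator
print(round(haversine_m(first_pass, second_pass)))
print(round(initial_bearing_deg(first_pass, second_pass), 1))
```

Applying both functions to every pair of repeat geocodes and taking the median distance and mean bearing gives summary measures of the kind reported in the abstract.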

Geocoding to Create Survey Frames

Survey Practice, 2012

With the Delivery Sequence File (DSF), from the United States Post Office, surveys can cheaply and easily create address frames and samples. Many studies have examined the coverage of these frames (see for example Dohrmann et al. 2006; Iannacchione et al. 2003; O'Muircheartaigh et al. 2002). However, these studies do not discuss geocoding. Geocoding is a key step in turning the DSF into a survey frame. Survey researchers who use these frames should understand the role geocoding plays, whether they do this work themselves or buy already-geocoded frames or samples. Geocoding is necessary because there is a mismatch between the geographies on the DSF and those used in most surveys. The DSF contains only street address, city, state, zip code, and other fields related to mail delivery. Household samples, however, are often based on census geographies such as counties, tracts, and blocks. Geocoding translates the address data into census blocks. We have learned a lot about the geocoding process over the past ten years of work with the DSF. In this article, we share what we have learned. We explain what geocoding is and how it works. We also discuss what can go wrong. Geocoding is a two-step process. First an address is assigned a geographic coordinate (usually latitude and longitude). Then the coordinate is mapped to census geography. All addresses placed in tracts or blocks selected for the survey are part of the frame.

Step 1: coordinate assignment. To assign coordinates, the software compares each address to a database of street segments and house-number ranges. The database contains the location ...

Institutions: Institute for Employment Research; NORC at the University of Chicago

There are two common geocoding software programs: ArcGIS, from ESRI, and MapMarker Plus, from Pitney Bowes Business Insight (formerly MapInfo). This article focuses on MapMarker Plus, but the two programs work similarly.
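The second step of the process described above, mapping a coordinate to census geography, amounts to a point-in-polygon test against block boundaries. The sketch below uses the standard ray-casting test on simple polygons in planar coordinates. It is an illustration only and says nothing about how ArcGIS or MapMarker Plus work internally; the block identifiers and shapes are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order, without repeating the first vertex


def point_in_polygon(x: float, y: float, polygon: Polygon) -> bool:
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_block(x: float, y: float, blocks: Dict[str, Polygon]) -> Optional[str]:
    """Return the identifier of the first block containing the point, or None."""
    for block_id, polygon in blocks.items():
        if point_in_polygon(x, y, polygon):
            return block_id
    return None


# Two hypothetical adjacent square blocks.
blocks = {
    "block_A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "block_B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}

print(assign_block(0.5, 0.5, blocks))
print(assign_block(1.5, 0.5, blocks))
print(assign_block(3.0, 3.0, blocks))
```

Points that fall close to a shared boundary are the fragile cases: a small positional error in step 1 can move an address into a neighboring block and therefore in or out of the frame.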

Geocoding Quality and Implications for Spatial Analysis

Geography Compass, 2009

Many spatial analysis techniques rely on the ability to geocode individual locations based on addresses or other descriptive information. The quality of geocoding and its effect on spatial analysis have received some attention in the literature, in particular in the field of health. This article reviews the foundation of geocoding and presents a framework for evaluating geocoding quality. Errors introduced by street geocoding include incompleteness, positional error, and incorrect assignment to geographic units. A review of empirical studies suggests that these errors are neither small nor random in nature and that substantial bias may be introduced in spatial analysis that employs the results of geocoding. Several alternatives have also emerged, including the use of address points and parcels, and these are gradually becoming more widely used. Several areas for future research on geocoding have been identified: (i) refinements of address data models to incorporate complex addressing situations; (ii) development of error propagation techniques to determine the level of geocoding quality required for a particular analysis scenario; (iii) development of measures of reliability for geocoding results; (iv) comparative analysis of geocoding quality across different jurisdictions; and (v) validation of online geocoding services and volunteered geographic information.

Evaluation of the positional difference between two common geocoding methods

Geospatial health, 2011

Geocoding, the process of matching addresses to geographic coordinates, is a necessary first step when using geographical information systems (GIS) technology. However, different geocoding methodologies can result in different geographic coordinates. The objective of this study was to compare the positional (i.e. longitude/latitude) difference between two common geocoding methods, i.e. ArcGIS (Environmental Systems Research Institute, Redlands, CA, USA) and Batchgeo (freely available online at http://www.batchgeo.com). Address data came from the YMCA-Harvard After School Food and Fitness Project, an obesity prevention intervention involving children aged 5-11 years and their families participating in YMCA-administered, after-school programmes located in four geographically diverse metropolitan areas in the USA. Our analyses include baseline addresses (n = 748) collected from the parents of the children in the after-school sites. Addresses were first geocoded to the street level and assigned longitude and latitude coordinates with ArcGIS, version 9.3, then the same addresses were geocoded with Batchgeo. For this analysis, the ArcGIS minimum match score was 80. The resulting geocodes were projected into state plane coordinates, and the differences in longitude and latitude coordinates between the two methods were calculated in meters for all data points in each of the four metropolitan areas. We also quantified the descriptions of the geocoding accuracy provided by Batchgeo with the match scores from ArcGIS. We found a 94% match rate (n = 705), 2% (n = 18) were tied and 3% (n = 25) were unmatched using ArcGIS. Forty-eight addresses (6.4%) were not matched in ArcGIS with a match score ≥80 (therefore only 700 addresses were included in our positional difference analysis). Six hundred thirteen (87.6%) of these addresses had a match score of 100. Batchgeo yielded a 100% match rate for the addresses that ArcGIS geocoded.
The median difference in longitude and latitude coordinates for all the data was just over 25 m. Overall, the range for longitude was 0.04-12,911.8 m, and the range for latitude was 0.02-37,766.6 m. Comparisons show minimal differences in the median and minimum values, while there were slightly larger differences in the maximum values. The majority (>75%) of the geographic differences were within 50 m of each other; mostly <25 m from each other (about 49%). Only about 4% overall were ≥400 m apart. We also found geographic differences in the proportion of addresses that fell within certain meter ranges. The match-score range associated with the Batchgeo accuracy level "approximate" (least accurate) was 84-100 (mean = 92), while the "rooftop" Batchgeo accuracy level (most accurate) delivered a mean of 98.9 but the range was the same. Although future research should compare the positional difference of Batchgeo to criterion measures of longitude/latitude (e.g. with global positioning system measurement), this study suggests that Batchgeo is a good, free-of-charge option to geocode addresses.
