Chapter 12. PostGIS Extras (original) (raw)

This chapter documents features found in the extras folder of the PostGIS source tarballs and source repository. These are not always packaged with PostGIS binary releases, but are usually PL/pgSQL based or standard shell scripts that can be run as is.

12.1. Address Standardizer

This is a fork of the PAGC standardizer (original code for this portion was PAGC PostgreSQL Address Standardizer).

The address standardizer is a single line address parser that takes an input address and normalizes it based on a set of rules stored in a table and helper lex and gaz tables.

The code is built into a single PostgreSQL extension library called address_standardizer which can be installed with CREATE EXTENSION address_standardizer;. In addition to the address_standardizer extension, a sample data extension called address_standardizer_data_us extensions is built, which contains gaz, lex, and rules tables for US data. This extensions can be installed via: CREATE EXTENSION address_standardizer_data_us;

The code for this extension can be found in the PostGIS extensions/address_standardizer and is currently self-contained.

For installation instructions refer to: Section 2.3, “Installing and Using the address standardizer”.

12.1.1. How the Parser Works

The parser works from right to left looking first at the macro elements for postcode, state/province, city, and then looks micro elements to determine if we are dealing with a house number street or intersection or landmark. It currently does not look for a country code or name, but that could be introduced in the future.

Country code

Assumed to be US or CA based on: postcode as US or Canada state/province as US or Canada else US

Postcode/zipcode

These are recognized using Perl compatible regular expressions. These regexs are currently in the parseaddress-api.c and are relatively simple to make changes to if needed.

State/province

These are recognized using Perl compatible regular expressions. These regexs are currently in the parseaddress-api.c but could get moved into includes in the future for easier maintenance.

12.1.2. Address Standardizer Types

Abstract

This section lists the PostgreSQL data types installed by Address Standardizer extension. Note we describe the casting behavior of these which is very important especially when designing your own functions.

12.1.3. Address Standardizer Tables

Abstract

This section lists the PostgreSQL table formats used by the address_standardizer for normalizing addresses. Note that these tables do not need to be named the same as what is referenced here. You can have different lex, gaz, rules tables for each country for example or for your custom geocoder. The names of these tables get passed into the address standardizer functions.

The packaged extension address_standardizer_data_us contains data for standardizing US addresses.

12.1.4. Address Standardizer Functions

12.2. Tiger Geocoder

Abstract

A plpgsql based geocoder written to work with the TIGER (Topologically Integrated Geographic Encoding and Referencing system ) / Line and Master Address database export released by the US Census Bureau.

There are four components to the geocoder: the data loader functions, the address normalizer, the address geocoder, and the reverse geocoder.

Although it is designed specifically for the US, a lot of the concepts and functions are applicable and can be adapted to work with other country address and road networks.

The script builds a schema called tiger to house all the tiger related functions, reusable lookup data such as road type prefixes, suffixes, states, various control tables for managing data load, and skeleton base tables from which all the tiger loaded tables inherit from.

Another schema called tiger_data is also created which houses all the census data for each state that the loader downloads from Census site and loads into the database. In the current model, each set of state tables is prefixed with the state code e.g ma_addr, ma_edges etc with constraints to enforce only that state data. Each of these tables inherits from the tables addr, faces, edges, etc located in the tiger schema.

All the geocode functions only reference the base tables, so there is no requirement that the data schema be called tiger_data or that data can't be further partitioned into other schemas -- e.g a different schema for each state, as long as all the tables inherit from the tables in the tiger schema.

For instructions on how to enable the extension in your database and also to load data using it, refer to Section 2.4.1, “Tiger Geocoder Enabling your PostGIS database”.

| [Note] | | | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | | | If you are using tiger geocoder (tiger_2010), you can upgrade the scripts using the accompanying upgrade_geocoder.bat / .sh scripts in extras/tiger. One major change between tiger_2010 and tiger_2011+ is that the county and state tables are no longer broken out by state. If you have data from tiger_2010 and want to replace with tiger_2015, refer to Section 2.4.4, “Upgrading your Tiger Geocoder Install and Data” | |

| [Note] | | | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | | New in PostGIS 2.2.0 release is support for Tiger 2015 data and inclusion of Address Standardizer as part of PostGIS. New in PostGIS 2.1.0 release is ability to install tiger geocoder with PostgreSQL extension model if you are running PostgreSQL 9.1+. Refer to Section 2.4.1, “Tiger Geocoder Enabling your PostGIS database” for details. | |

The Pagc_Normalize_Address function as a drop in replacement for in-built Normalize_Address. Refer to Section 2.3, “Installing and Using the address standardizer” for compile and installation instructions.

Design:

The goal of this project is to build a fully functional geocoder that can process an arbitrary United States address string and using normalized TIGER census data, produce a point geometry and rating reflecting the location of the given address and likeliness of the location. The higher the rating number the worse the result.

The reverse_geocode function, introduced in PostGIS 2.0.0 is useful for deriving the street address and cross streets of a GPS location.

The geocoder should be simple for anyone familiar with PostGIS to install and use, and should be easily installable and usable on all platforms supported by PostGIS.

It should be robust enough to function properly despite formatting and spelling errors.

It should be extensible enough to be used with future data updates, or alternate data sources with a minimum of coding changes.

| [Note] | | | ---------------------------------------------------------------------------------------------- | | | The tiger schema must be added to the database search path for the functions to work properly. | |

There are a couple other open source geocoders for PostGIS, that unlike tiger geocoder have the advantage of multi-country geocoding support