Customizing the Parser with Your Own Configuration (Nameparser 1.1.3 documentation)

Recognition of titles, prefixes, suffixes and conjunctions is handled by matching the lower case characters of a name piece against predefined sets of strings located in nameparser.config. You can adjust these predefined sets to fine-tune the parser for your dataset.

Changing the Parser Constants

There are a few ways to adjust the parser configuration depending on your needs. The config is available in two places.

The first is the module-level CONSTANTS instance, which you import from nameparser.config.

    >>> from nameparser.config import CONSTANTS
    >>> CONSTANTS
    <Constants() instance>

The other is the C attribute of a HumanName instance, e.g. hn.C.

    >>> from nameparser import HumanName
    >>> hn = HumanName("Dean Robert Johns")
    >>> hn.C
    <Constants() instance>

Both usually refer to the same shared module-level CONSTANTS instance, depending on how you instantiate the HumanName class (see below).

Editable attributes of nameparser.config.CONSTANTS

Each set of constants comes with add() and remove() methods for tuning the constants for your project. These methods automatically lowercase the strings and remove punctuation to normalize them for comparison.
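The idea behind this normalization can be illustrated without the library. The sketch below is a stand-in, not nameparser's implementation: normalize and NormalizedSet are hypothetical names, and the exact punctuation rule (lowercasing and dropping periods) is an assumption made for the example.

```python
def normalize(value: str) -> str:
    """Lowercase a string and drop periods (assumed rule for this sketch)."""
    return value.lower().replace(".", "")


class NormalizedSet:
    """Hypothetical stand-in for a set of constants with add() and remove()."""

    def __init__(self, elements=()):
        self.elements = {normalize(e) for e in elements}

    def add(self, *strings):
        # Every string is normalized before it is stored.
        for s in strings:
            self.elements.add(normalize(s))
        return self  # returning self allows chained calls

    def remove(self, *strings):
        # Strings are normalized the same way before lookup and removal.
        for s in strings:
            self.elements.discard(normalize(s))
        return self

    def __contains__(self, value):
        return normalize(value) in self.elements


titles = NormalizedSet(["dr", "hon"])
titles.add("Prof.", "DEAN")
titles.remove("Hon.")
```

Because both the stored values and the lookups go through the same normalize step, "Prof.", "prof" and "PROF" all refer to the same entry.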

Other editable attributes

Parser Customization Examples

Removing a Title

Take a look at the nameparser.config documentation to see what’s in the constants. Here’s a quick walkthrough of some examples where you might want to adjust them.

“Hon” is a common abbreviation for “Honorable”, a title used when addressing judges, and is included in the default titles constant. This means it will never be considered a first name, because titles are the pieces that come before first names.

But “Hon” is also sometimes a first name. If your dataset contains more “Hon”s than “Honorable”s, you may wish to remove it from the titles constant so that “Hon” can be parsed as a first name.

    >>> from nameparser import HumanName
    >>> hn = HumanName("Hon Solo")
    >>> hn
    <HumanName : [
        title: 'Hon'
        first: ''
        middle: ''
        last: 'Solo'
        suffix: ''
        nickname: ''
    ]>
    >>> from nameparser.config import CONSTANTS
    >>> CONSTANTS.titles.remove('hon')
    SetManager({'right', ..., 'tax'})
    >>> hn = HumanName("Hon Solo")
    >>> hn
    <HumanName : [
        title: ''
        first: 'Hon'
        middle: ''
        last: 'Solo'
        suffix: ''
        nickname: ''
    ]>

If you don’t want to detect any titles at all, you can remove all of them:

    >>> CONSTANTS.titles.remove(*CONSTANTS.titles)

Adding a Title

You can also pass a Constants instance to HumanName on instantiation.

“Dean” is a common first name so it is not included in the default titles constant. But in some contexts it is more common as a title. If you would like “Dean” to be parsed as a title, simply add it to the titles constant.

You can pass multiple strings to both the add() and remove() methods, and each string will be added or removed. Both methods automatically normalize the strings for the parser’s comparison method by making them lower case and removing periods.

    >>> from nameparser import HumanName
    >>> from nameparser.config import Constants
    >>> constants = Constants()
    >>> constants.titles.add('dean', 'Chemistry')
    SetManager({'right', ..., 'tax'})
    >>> hn = HumanName("Assoc Dean of Chemistry Robert Johns", constants=constants)
    >>> hn
    <HumanName : [
        title: 'Assoc Dean of Chemistry'
        first: 'Robert'
        middle: ''
        last: 'Johns'
        suffix: ''
        nickname: ''
    ]>