ENH: Allow regex matching in fullmatch mode · Issue #32806 · pandas-dev/pandas

Problem description

Series.str provides methods for all of the regular expression matching modes in the re package except re.fullmatch(). re.fullmatch() succeeds only when the pattern matches the entire input string, unlike re.match(), which also succeeds when a match starts at the beginning of the string but stops short of its end.
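To illustrate the distinction with the standard library alone (the sample string is arbitrary), compare re.match and re.fullmatch on the same input. The last line is an aside: anchoring the pattern with `\Z` inside a non-capturing group makes re.match behave like re.fullmatch for simple patterns such as this one.

```python
import re

text = "foobar"

# re.match succeeds if the pattern matches at the *start* of the string...
starts_with_foo = re.match("foo", text) is not None
# ...whereas re.fullmatch succeeds only if the pattern covers the whole string.
is_exactly_foo = re.fullmatch("foo", text) is not None
# Anchoring the end with \Z gives re.match fullmatch-like behavior here.
anchored = re.match(r"(?:foo)\Z", text) is not None

print(starts_with_foo, is_exactly_foo, anchored)  # True False False
```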

One can work around the missing fullmatch by converting the Series to a numpy array, applying re.fullmatch element-wise with np.vectorize, and wrapping the result back in a Series, e.g.:

```python
import re

import numpy as np
import pandas as pd

s = pd.Series(["foo", "bar", "foobar"])
my_regex = "foo"

compiled_regex = re.compile(my_regex)
# Vectorize fullmatch over the underlying array; True only for whole-string matches
regex_f = np.vectorize(lambda value: compiled_regex.fullmatch(value) is not None)
matches_array = regex_f(s.values)
matches_series = pd.Series(matches_array)
matches_series
```

```
0     True
1    False
2    False
dtype: bool
```

but it would be more convenient for users if fullmatch were built in.
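For comparison, here are two other workarounds that avoid the numpy round trip, assuming only the re module and the existing Series.map and Series.str.match methods. The second relies on str.match already anchoring the start of the pattern (it uses re.match), so only the end needs anchoring; `\Z` is used instead of `$` because `$` also matches just before a trailing newline. Wrapping the pattern in a non-capturing group keeps alternations like `foo|bar` intact, though patterns that begin with inline global flags such as `(?i)` would not survive the wrapping.

```python
import re

import pandas as pd

s = pd.Series(["foo", "bar", "foobar"])
my_regex = "foo"

# Option 1: apply re.fullmatch element-wise with Series.map
compiled_regex = re.compile(my_regex)
via_map = s.map(lambda value: compiled_regex.fullmatch(value) is not None)

# Option 2: reuse Series.str.match (start-anchored) and anchor the end with \Z
via_match = s.str.match(rf"(?:{my_regex})\Z")

print(via_map.tolist())    # [True, False, False]
print(via_match.tolist())  # [True, False, False]
```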

The fullmatch function was added to the re module in Python 3.4. I suspect it was left out of earlier versions of pandas because older Python versions (such as 2.7) lack re.fullmatch. As of pandas 1.0, every supported Python version has it.

I have a pull request ready that adds this functionality. With my changes, the Series.str accessor gains a new method, fullmatch, that applies re.fullmatch to each element of the Series. For example:

```python
s = pd.Series(["foo", "bar", "foobar"])
s.str.fullmatch("foo")
```

```
0     True
1    False
2    False
dtype: bool
```
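To make the proposed element-wise behavior concrete, here is a minimal standalone sketch that also handles missing values. The helper name `fullmatch_series` and its `case`, `flags` and `na` parameters are hypothetical, modeled loosely on the options of Series.str.match; this is not the code in the pull request.

```python
import re

import numpy as np
import pandas as pd


def fullmatch_series(s, pat, case=True, flags=0, na=np.nan):
    """Hypothetical helper: apply re.fullmatch to each element of a Series.

    Missing values (None/NaN) are mapped to `na` instead of being passed to
    re.fullmatch, which would raise a TypeError on them.
    """
    if not case:
        flags |= re.IGNORECASE
    regex = re.compile(pat, flags=flags)

    def check(value):
        if pd.isna(value):
            return na
        return regex.fullmatch(value) is not None

    return s.map(check)


s = pd.Series(["foo", "FOO", None, "foobar"])
print(fullmatch_series(s, "foo", case=False, na=False).tolist())
# [True, True, False, False]
```

Compiling the pattern once outside the per-element function avoids re-parsing it for every row, and letting the caller choose `na` keeps the result all-boolean when missing values should simply count as non-matches.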

[Edit: Simplified the workaround]