ENH: Allow regex matching in fullmatch mode · Issue #32806 · pandas-dev/pandas
Problem description
Series.str contains methods for all the regular expression matching modes in the re package except for re.fullmatch(). fullmatch only returns matches that cover the entire input string, unlike match, which also returns matches that start at the beginning of the string but do not cover the complete string.
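To make the difference concrete, here is a small sketch using only the standard library re module. The variable names (pattern, prefix_hit, full_hit) are illustrative and not part of the proposal.

```python
import re

pattern = re.compile("foo")

for text in ["foo", "bar", "foobar"]:
    # match() succeeds if the pattern matches at the start of the string.
    prefix_hit = pattern.match(text) is not None
    # fullmatch() succeeds only if the pattern covers the entire string.
    full_hit = pattern.fullmatch(text) is not None
    print(f"{text!r}: match={prefix_hit}, fullmatch={full_hit}")
```

For "foobar", match succeeds because the string starts with "foo", while fullmatch fails because "bar" is left over.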
One can work around the lack of fullmatch by round-tripping to/from numpy arrays and using np.vectorize, i.e.
    import re
    import numpy as np
    import pandas as pd

    s = pd.Series(["foo", "bar", "foobar"])
    my_regex = "foo"
    compiled_regex = re.compile(my_regex)
    regex_f = np.vectorize(lambda s: compiled_regex.fullmatch(s) is not None)
    matches_array = regex_f(s.values)
    matches_series = pd.Series(matches_array)
    matches_series

    0     True
    1    False
    2    False
    dtype: bool
but it would be more convenient for users if fullmatch was built in.
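One detail of the numpy round trip is that wrapping the result in a fresh pd.Series discards the original index. A possible variant, sketched here under the assumption that every element is a string (a missing value would make re.fullmatch raise a TypeError), uses Series.map instead. The custom index labels are only there to show that the index survives.

```python
import re
import pandas as pd

# Hypothetical non-default index, used only to show that it is preserved.
s = pd.Series(["foo", "bar", "foobar"], index=["a", "b", "c"])
compiled_regex = re.compile("foo")

# Series.map applies the function element-wise and keeps the original index.
matches_series = s.map(lambda value: compiled_regex.fullmatch(value) is not None)
print(matches_series)
```

This is still a Python-level loop over the elements, so it is a convenience rather than a performance improvement over the np.vectorize version.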
The fullmatch method was added to the re package in Python 3.4. I think that the reason this method wasn't in previous versions of Pandas was that older versions of Python don't have re.fullmatch. As of Pandas 1.0, all the supported versions of Python now have fullmatch.
I have a pull request ready that adds this functionality. After my changes, the Series.str namespace gets a new method fullmatch that evaluates re.fullmatch over the series. For example:
    s = pd.Series(["foo", "bar", "foobar"])
    s.str.fullmatch("foo")

    0     True
    1    False
    2    False
    dtype: bool
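For comparison with the existing match method, the following sketch puts both results side by side. It assumes a pandas release that ships Series.str.fullmatch (1.1.0 or later). The column names in the comparison frame are illustrative only.

```python
import pandas as pd

s = pd.Series(["foo", "bar", "foobar"])

# match: the pattern only has to match at the start of each string.
prefix_matches = s.str.match("foo")
# fullmatch: the pattern has to cover each string completely.
full_matches = s.str.fullmatch("foo")

comparison = pd.DataFrame(
    {"value": s, "match": prefix_matches, "fullmatch": full_matches}
)
print(comparison)
```

The two methods disagree only on "foobar", which starts with "foo" but is not entirely matched by it.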
[Edit: Simplified the workaround]