BUG: .extractall() throws AssertionError if capture group length > 1 · Issue #13382 · pandas-dev/pandas

Code to replicate error:

import pandas as pd
s = pd.Series(["a13a23", "b13", "c13"], index=["A", "B", "C"])
s.str.extractall(r"[ab](\d\d)")

Note that the regex [ab](\d) from the documentation page works, whereas [ab](\d\d) above raises the AssertionError. It seems that any capture group that matches more than one character causes this error.
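For comparison, here is a minimal sketch of the matches I would expect for each element, computed with the standard library's re module instead of pandas. The `subjects` dict and the `expected_matches` helper are hypothetical names used only for this illustration; they are not part of pandas.

```python
import re

# Same data as the Series in the report, keyed by index label.
subjects = {"A": "a13a23", "B": "b13", "C": "c13"}
pattern = re.compile(r"[ab](\d\d)")

def expected_matches(text):
    """Return the group-1 capture of every non-overlapping match in text."""
    return [m.group(1) for m in pattern.finditer(text)]

expected = {label: expected_matches(text) for label, text in subjects.items()}
print(expected)  # {'A': ['13', '23'], 'B': ['13'], 'C': []}
```

So I would expect .extractall() to return two rows for "A", one row for "B", and nothing for "C", rather than raising.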

Playing with this a bit more, though, the following regexes all seem to work correctly without error:

([ab])(\d\d)
()[ab](\d+)
(a13)(\d\d)

I've reproduced the issue in both versions 0.18.0 and 0.18.1. I'll admit I've not checked against the master branch though.

Note: I posted this to the mailing list but haven't had any responses, so I assume this is a bug.
I'm unsure what the underlying cause is here (maybe it doesn't like the first regex character not being within a capture group?).
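One thing the working and failing patterns differ in is the number of capture groups, which changes what re.findall returns. The sketch below only demonstrates that standard-library behaviour. Whether pandas relies on re.findall internally, and whether this difference is the actual cause of the AssertionError, is an assumption on my part and not something I have confirmed.

```python
import re

text = "a13a23"

# One capture group: findall returns a list of plain strings.
one_group = re.findall(r"[ab](\d\d)", text)
print(one_group)  # ['13', '23']

# Two capture groups: findall returns a list of tuples.
two_groups = re.findall(r"([ab])(\d\d)", text)
print(two_groups)  # [('a', '13'), ('a', '23')]

# The single-group pattern from the docs yields one-character strings.
docs_pattern = re.findall(r"[ab](\d)", text)
print(docs_pattern)  # ['1', '2']

# len() of a two-character string is 2, the same as len() of a 2-tuple,
# so code that treats every match as a sequence of groups could miscount.
print([len(m) for m in one_group])     # [2, 2]
print([len(m) for m in docs_pattern])  # [1, 1]
```

If the match is treated as a sequence of groups in both cases, a one-character capture would look like one group and a two-character capture would look like two, which would be consistent with the patterns that fail and the ones that work.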