API: Deprecate regex=True default in Series.str.replace by dsaxton · Pull Request #36695 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation35 Commits45 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
From some old discussion it looks like there was interest in changing the default value of regex from True to False within Series.str.replace. I think this makes sense since it would align this method with others within pandas along with the standard library, and would also make fixing #24804 slightly less disruptive. I'm assuming this would warrant a deprecation note in the next whatsnew?
I think this part of the docs will need to be updated: https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html#splitting-and-replacing-strings
There are many tests that are using this (and are now raising a warning) that would need updating.
While I agree that a default of regex=False
would probably be better ideally, warning about this in all cases also seems quite annoying ..
Would it somehow be possible to detect in advance if the pattern is regex-like or rather a plain string (in which case replacing it with regex=False would not give a different result). Probably not that straightforward given the complexity of regexes .. (but maybe some simple cases, like only characters, could be detected)
There are many tests that are using this (and are now raising a warning) that would need updating.
While I agree that a default of
regex=False
would probably be better ideally, warning about this in all cases also seems quite annoying ..
Would it somehow be possible to detect in advance if the pattern is regex-like or rather a plain string (in which case replacing it with regex=False would not give a different result). Probably not that straightforward given the complexity of regexes .. (but maybe some simple cases, like only characters, could be detected)
Yeah this is a good point, I think looking for regex special characters and only then warning could make the most sense
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a whatsnew note?
Can you also add a whatsnew note?
Would this be an update for 1.1.4 or 1.2?
yes, you are excluding them from the warning now though; can we provide a more specific warning message for single char regexes?
Added an additional warning for single character regexes:
[ins] In [3]: s.str.replace(".", "")
<ipython-input-3-ee1a585145f8>:1: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.
s.str.replace(".", "")
Out[3]:
0 a
1 b
dtype: object
[ins] In [4]: s.str.replace("^.", "")
<ipython-input-4-3940e91d28a6>:1: FutureWarning: The default value of regex will change from True to False in a future version.
s.str.replace("^.", "")
Out[4]:
0
1
dtype: object
looks good. doc comment & pls add a deprecation note in the whatsnew (also add to the deprecation master issue) #30228
Updated the OP in that issue
Some caution must be taken to keep regular expressions in mind! For example, the |
---|
following code will cause trouble because of the regular expression meaning of |
``$``: |
Some caution must be taken when dealing with regular expressions! The current behavior |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do this
Some caution must be taken to keep regular expressions in mind! For example, the |
---|
following code will cause trouble because of the regular expression meaning of |
``$``: |
Some caution must be taken when dealing with regular expressions! The current behavior |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also add a versionchanged tag
def test_str_replace_regex_default_raises_warning(self): |
# https://github.com/pandas-dev/pandas/pull/24809 |
s = pd.Series(["a", "b", "c"]) |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you check the messages on this warning
if isinstance(pat, str) and any(c in pat for c in ".+*|^$?[](){}\\"): |
---|
# warn only in cases where regex behavior would differ from literal |
msg = ( |
"The default value of regex will change from True to False " |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an you make sure that we are testing both of these warnings (as i see you only added a single test)
Some caution must be taken to keep regular expressions in mind! For example, the |
---|
following code will cause trouble because of the regular expression meaning of |
``$``: |
Some caution must be taken when dealing with regular expressions! The current behavior |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no i mean a ..note::
a sphinx-note (which puts a highlite box around this)
Some caution must be taken to keep regular expressions in mind! For example, the |
---|
following code will cause trouble because of the regular expression meaning of |
``$``: |
Some caution must be taken when dealing with regular expressions! The current behavior |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could also do a ..warning::
which is a red-box and more prominent, but either way want to call out this
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice
# This does what you'd naively expect: |
dollars.str.replace("$", "") |
.. warning:: |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great
merge on most builds green (some are giving issues now).
dsaxton deleted the replace-regex-default branch
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request