API: Deprecate regex=True default in Series.str.replace by dsaxton · Pull Request #36695 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation35 Commits45 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

dsaxton

From some old discussion it looks like there was interest in changing the default value of regex from True to False within Series.str.replace. I think this makes sense since it would align this method with others within pandas along with the standard library, and would also make fixing #24804 slightly less disruptive. I'm assuming this would warrant a deprecation note in the next whatsnew?

I think this part of the docs will need to be updated: https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html#splitting-and-replacing-strings

#24809

@jorisvandenbossche

There are many tests that are using this (and are now raising a warning) that would need updating.

While I agree that a default of regex=False would probably be better ideally, warning about this in all cases also seems quite annoying ..
Would it somehow be possible to detect in advance if the pattern is regex-like or rather a plain string (in which case replacing it with regex=False would not give a different result). Probably not that straightforward given the complexity of regexes .. (but maybe some simple cases, like only characters, could be detected)

@dsaxton

There are many tests that are using this (and are now raising a warning) that would need updating.

While I agree that a default of regex=False would probably be better ideally, warning about this in all cases also seems quite annoying ..
Would it somehow be possible to detect in advance if the pattern is regex-like or rather a plain string (in which case replacing it with regex=False would not give a different result). Probably not that straightforward given the complexity of regexes .. (but maybe some simple cases, like only characters, could be detected)

Yeah this is a good point, I think looking for regex special characters and only then warning could make the most sense

dsaxton

jorisvandenbossche

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a whatsnew note?

@dsaxton

@dsaxton

@dsaxton

Can you also add a whatsnew note?

Would this be an update for 1.1.4 or 1.2?

@dsaxton

@dsaxton

@dsaxton

@dsaxton

yes, you are excluding them from the warning now though; can we provide a more specific warning message for single char regexes?

Added an additional warning for single character regexes:

[ins] In [3]: s.str.replace(".", "")
<ipython-input-3-ee1a585145f8>:1: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.
  s.str.replace(".", "")
Out[3]: 
0    a
1    b
dtype: object

[ins] In [4]: s.str.replace("^.", "")
<ipython-input-4-3940e91d28a6>:1: FutureWarning: The default value of regex will change from True to False in a future version.
  s.str.replace("^.", "")
Out[4]: 
0    
1    
dtype: object

@dsaxton

looks good. doc comment & pls add a deprecation note in the whatsnew (also add to the deprecation master issue) #30228

Updated the OP in that issue

jreback

Some caution must be taken to keep regular expressions in mind! For example, the
following code will cause trouble because of the regular expression meaning of
``$``:
Some caution must be taken when dealing with regular expressions! The current behavior

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do this

Some caution must be taken to keep regular expressions in mind! For example, the
following code will cause trouble because of the regular expression meaning of
``$``:
Some caution must be taken when dealing with regular expressions! The current behavior

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add a versionchanged tag

def test_str_replace_regex_default_raises_warning(self):
# https://github.com/pandas-dev/pandas/pull/24809
s = pd.Series(["a", "b", "c"])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check the messages on this warning

if isinstance(pat, str) and any(c in pat for c in ".+*|^$?[](){}\\"):
# warn only in cases where regex behavior would differ from literal
msg = (
"The default value of regex will change from True to False "

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an you make sure that we are testing both of these warnings (as i see you only added a single test)

jreback

Some caution must be taken to keep regular expressions in mind! For example, the
following code will cause trouble because of the regular expression meaning of
``$``:
Some caution must be taken when dealing with regular expressions! The current behavior

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no i mean a ..note:: a sphinx-note (which puts a highlite box around this)

Some caution must be taken to keep regular expressions in mind! For example, the
following code will cause trouble because of the regular expression meaning of
``$``:
Some caution must be taken when dealing with regular expressions! The current behavior

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also do a ..warning:: which is a red-box and more prominent, but either way want to call out this

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice

# This does what you'd naively expect:
dollars.str.replace("$", "")
.. warning::

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great

@jreback

merge on most builds green (some are giving issues now).

@dsaxton dsaxton deleted the replace-regex-default branch

October 10, 2020 23:01

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request

Nov 2, 2020

@dsaxton