Series.str.replace() is not actually the same as str.replace() · Issue #16808 · pandas-dev/pandas

Code Sample

In [1]: import pandas as pd

In [2]: series = pd.Series(['a', '(b)'])

In [3]: series.str.replace('a', '[a]')
Out[3]:
0    [a]
1    (b)
dtype: object

In [4]: series.str.replace('(b)', '[b]')  # unexpected behavior
Out[4]:
0        a
1    ([b])
dtype: object

In [5]: series.str.replace(r'\(b\)', '[b]')  # need to escape
Out[5]:
0      a
1    [b]
dtype: object

In [6]: '(b)'.replace('(b)', '[b]')  # Python str.replace is different, uses literal string
Out[6]: '[b]'
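To see why In [4] produces `([b])`, the same replacement can be reproduced with Python's standard `re` module. This is only an illustration: it does not use pandas, and it assumes (as the output above suggests) that a string `pat` is compiled as a regular expression.

```python
import re

text = "(b)"

# As a regex, "(b)" is a capturing group around the single letter "b";
# the parentheses are regex syntax, not characters to match.
match = re.search("(b)", text)
print(match.group(0), match.span())  # b (1, 2)

# Only that "b" is replaced, so the literal parentheses survive,
# which mirrors Out[4].
print(re.sub("(b)", "[b]", text))  # ([b])

# Python's str.replace matches the whole literal substring, as in Out[6].
print(text.replace("(b)", "[b]"))  # [b]
```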

Problem description

The documentation for Series.str.replace says that it takes a "string or compiled regex" ... "String can be a character sequence or regular expression." ... "When repl is a string, every pat is replaced as with str.replace()"

However, that's not what happens: the string is interpreted as a regex, so characters like parentheses need to be escaped.
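A minimal sketch of getting literal matching from a regex engine, using only the standard library. It escapes the pattern automatically, which is what In [5] does by hand. `literal_sub` is a hypothetical helper, not a pandas function. Passing the replacement through a function also keeps backslashes in `repl` literal, which a plain string `repl` given to `re.sub` would not do.

```python
import re


def literal_sub(pat: str, repl: str, s: str) -> str:
    """Replace every occurrence of pat in s with repl, treating both literally.

    Hypothetical helper for illustration; not part of pandas.
    """
    # re.escape neutralises metacharacters such as ( ) [ ] . * + ?
    escaped = re.escape(pat)
    # Returning repl from a function stops re.sub from interpreting
    # backslash escapes or group references (e.g. \1) inside it.
    return re.sub(escaped, lambda _match: repl, s)


print(literal_sub("(b)", "[b]", "(b)"))  # [b]
print(literal_sub(".", "!", "a.b"))      # a!b

# The helper agrees with str.replace, including a backslash in repl.
for pat, repl, s in [("(b)", "[b]", "(b)"), (".", "!", "a.b"), ("a", r"\1", "cat")]:
    assert literal_sub(pat, repl, s) == s.replace(pat, repl)
```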

Expected Output

I would expect plain strings to behave like Python's str.replace(), i.e. to be matched literally rather than as regexes.
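The expected behavior amounts to applying Python's str.replace() to each element. The following is a minimal pure-Python sketch of those semantics, with a plain list standing in for the Series. `elementwise_replace` is a hypothetical name, and passing non-string values (such as None) through unchanged is a choice made for this sketch, not documented pandas behavior. For reference, later pandas releases added a `regex` keyword to `Series.str.replace`, and `regex=False` requests this literal matching.

```python
from typing import Any, Iterable, List


def elementwise_replace(values: Iterable[Any], old: str, new: str) -> List[Any]:
    """Apply str.replace(old, new) to each string element (hypothetical helper)."""
    out: List[Any] = []
    for v in values:
        if isinstance(v, str):
            # Literal substring replacement: no regex metacharacters involved.
            out.append(v.replace(old, new))
        else:
            # Non-string values (e.g. None) are passed through unchanged
            # in this sketch.
            out.append(v)
    return out


result = elementwise_replace(["a", "(b)", None], "(b)", "[b]")
print(result)  # ['a', '[b]', None]
```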

Alternatively the documentation could be updated, but I think the Python str.replace() behavior is what most users would expect.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-83-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.20.2
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.0
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None