Series.str.replace() is not actually the same as str.replace() · Issue #16808 · pandas-dev/pandas
Code Sample
In [1]: import pandas as pd

In [2]: series = pd.Series(['a', '(b)'])

In [3]: series.str.replace('a', '[a]')
Out[3]:
0    [a]
1    (b)
dtype: object

In [4]: series.str.replace('(b)', '[b]')  # unexpected behavior
Out[4]:
0        a
1    ([b])
dtype: object

In [5]: series.str.replace(r'\(b\)', '[b]')  # need to escape
Out[5]:
0      a
1    [b]
dtype: object

In [6]: '(b)'.replace('(b)', '[b]')  # Python str.replace is different, uses literal string
Out[6]: '[b]'
Problem description
The documentation for Series.str.replace says that it takes a "string or compiled regex" ... "String can be a character sequence or regular expression." ... "When repl is a string, every pat is replaced as with str.replace()"
However, that is not what happens: a plain string pat appears to be interpreted as a regex, so characters such as parentheses have to be escaped to match literally.
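The difference can be reproduced without pandas. The following is a minimal sketch using only the standard library; it assumes (as the outputs above suggest, but the report does not confirm) that the string pattern ends up being handled like a pattern passed to re.sub. The variable names are illustrative.

```python
import re

text = "(b)"
pattern = "(b)"

# As a regex, the parentheses form a capture group, so only the inner "b" matches.
as_regex = re.sub(pattern, "[b]", text)

# re.escape backslash-escapes the metacharacters so the pattern matches literally.
as_escaped = re.sub(re.escape(pattern), "[b]", text)

# The built-in str.replace never interprets its argument as a regex.
as_literal = text.replace(pattern, "[b]")

print(as_regex)    # ([b])
print(as_escaped)  # [b]
print(as_literal)  # [b]
```

Escaping with re.escape avoids hand-writing backslashes when the search string comes from user input or data.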
Expected Output
I would expect that for vanilla strings, it works like regular Python str.replace() - using literal strings instead of regexes.
Alternatively the documentation could be updated, but I think the Python str.replace() behavior is what most users would expect.
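Until the behavior or the documentation changes, one possible workaround is to apply the built-in str.replace to each element. This is only a sketch: replace_literal is a hypothetical helper, not part of pandas, and it assumes pandas is installed and that every element of the Series is a string (missing values would need separate handling).

```python
import pandas as pd

series = pd.Series(["a", "(b)"])


def replace_literal(s, old, new):
    """Hypothetical helper: literal (non-regex) replacement on each element."""
    # Series.map calls the function once per element; str.replace treats
    # `old` as a plain substring, so no escaping is needed.
    return s.map(lambda value: value.replace(old, new))


result = replace_literal(series, "(b)", "[b]")
print(result.tolist())  # ['a', '[b]']
```

This gives up the vectorized string accessor, so it may be slower on large Series, but it matches the behavior described under Expected Output.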
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-83-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
pandas: 0.20.2
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.13.0
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None