CLN: Refactor string special methods to common decorator + safe unicode everywhere · Issue #4090 · pandas-dev/pandas
I was implementing some new objects for another PR and noticed that the string special methods are duplicated throughout (particularly on objects that don't inherit from each other). I wrote this up on this branch: https://github.com/jtratner/pandas/tree/refactor_string_magic_methods
If you think this is worthwhile, I'll split it up a little, add better documentation, and submit it.
This pattern is used multiple times:
    def __str__(self):
        """
        Return a string representation for a particular object.
        Invoked by str(obj) in both py2/py3.
        Yields Bytestring in Py2, Unicode String in py3.
        """
        if py3compat.PY3:
            return self.__unicode__()
        return self.__bytes__()

    def __bytes__(self):
        """
        Return a string representation for a particular object.
        Invoked by bytes(obj) in py3 only.
        Yields a bytestring in both py2/py3.
        """
        from pandas.core.config import get_option
        encoding = get_option("display.encoding")
        return self.__unicode__().encode(encoding, 'replace')

    def __repr__(self):
        """
        Return a string representation for a particular object.
        Yields Bytestring in Py2, Unicode String in py3.
        """
        return str(self)
The `__unicode__` implementation tends to vary, but it often looks like this:
    def __unicode__(self):
        """
        Return a string representation for a particular object.
        Invoked by unicode(obj) in py2 only. Yields a Unicode String in both
        py2/py3.
        """
        prepr = pprint_thing(self, escape_chars=('\t', '\r', '\n'), quote_strings=True)
        return '%s(%s)' % (type(self).__name__, prepr)
Additionally, a number of objects aren't using a unicode-safe representation of themselves, so this would resolve that as well. Would this be useful to include?
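For illustration, here is a minimal sketch of what a common decorator along these lines might look like. It is not the code on the branch: the name `add_string_methods`, the `DISPLAY_ENCODING` constant (standing in for `get_option("display.encoding")` so the example runs without pandas), and the `Point` class are all assumptions made for the example.

```python
import sys

PY3 = sys.version_info[0] >= 3
# Assumption: stand-in for pandas' get_option("display.encoding"),
# hard-coded so this sketch runs without pandas installed.
DISPLAY_ENCODING = "utf-8"


def add_string_methods(cls):
    """Hypothetical class decorator: derive __str__, __bytes__ and
    __repr__ from the __unicode__ method the class already defines."""
    if not hasattr(cls, "__unicode__"):
        raise TypeError("%s must define __unicode__" % cls.__name__)

    def __bytes__(self):
        # Bytestring in both py2/py3; unencodable characters become '?'.
        return self.__unicode__().encode(DISPLAY_ENCODING, "replace")

    def __str__(self):
        # Unicode string in py3, bytestring in py2.
        if PY3:
            return self.__unicode__()
        return self.__bytes__()

    def __repr__(self):
        return str(self)

    cls.__bytes__ = __bytes__
    cls.__str__ = __str__
    cls.__repr__ = __repr__
    return cls


@add_string_methods
class Point(object):
    """Hypothetical example class; only __unicode__ is written by hand."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __unicode__(self):
        return u"%s(%s, %s)" % (type(self).__name__, self.x, self.y)


p = Point(1, 2)
print(str(p))
print(repr(p))
```

With something like this, each object would only need to supply `__unicode__`, and unrelated classes could share the py2/py3 handling without a common base class.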