ENH/DOC: reimplement Series delegates/accessors using descriptors by shoyer · Pull Request #9322 · pandas-dev/pandas
OK, a few things we could do:
- Do checks to take `str` out of `__dir__` for invalid types. This would eliminate the auto-complete issue, but I think `s.str?` would give the same message you showed above (object `s.str` not found).
- Return a standard `StringMethods` object, but add some sort of hook that checks that the type is valid before every method lookup. You could still auto-complete `str` methods, though, and this is more complex for `.dt`, because it can create several sub-types of accessors.
- Make `s.str` for invalid types some sort of "deferred error" object that raises `TypeError` when any attribute is accessed but with a copied docstring from `StringMethods`. I tossed together an implementation, which gives us functionality like the following:
In [15]: s = pd.Series([1])
In [16]: s.str.<tab>
In [17]: s.str
Out[17]: <pandas.core.series.InvalidStringMethods at 0x107a32fd0>
In [18]: s.str?
Type: InvalidStringMethods
String form: <pandas.core.series.InvalidStringMethods object at 0x107a8e150>
File: /Users/shoyer/dev/pandas/pandas/core/series.py
Docstring:
Vectorized string functions for Series. NAs stay NA unless handled
otherwise by a particular method. Patterned after Python's string methods,
with some inspiration from R's stringr package.
Examples
--------
>>> s.str.split('_')
>>> s.str.replace('_', '')
In [19]: s.str.cat
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-19-e75e3f77c883> in <module>()
----> 1 s.str.cat
/Users/shoyer/dev/pandas/pandas/core/series.py in __getattr__(self, name)
2552
2553 def __getattr__(self, name):
-> 2554 raise self._error
2555
2556
TypeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
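For readers who want to see the shape of the "deferred error" idea without the pandas internals, here is a minimal sketch. It is not the code from this PR: `StringMethods` is reduced to a toy class, `make_str_accessor` is a hypothetical helper, and the string check is a simple `isinstance` test rather than a dtype check.

```python
class StringMethods:
    """Vectorized string functions for Series (stand-in docstring)."""

    def __init__(self, values):
        self._values = values

    def upper(self):
        return [v.upper() for v in self._values]


class InvalidStringMethods:
    """Placeholder; the docstring is replaced just below."""

    def __init__(self, error):
        # Store the exception instead of raising it right away.
        self._error = error

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails, so _error and
        # __doc__ stay reachable while every other attribute raises.
        raise self._error


# Copy the docstring so introspection on the object still shows the real help text.
InvalidStringMethods.__doc__ = StringMethods.__doc__


def make_str_accessor(values):
    """Hypothetical factory: return the real accessor or the deferred error."""
    if all(isinstance(v, str) for v in values):
        return StringMethods(values)
    return InvalidStringMethods(
        TypeError("Can only use .str accessor with string values")
    )
```

With this sketch, `make_str_accessor([1])` returns an object without raising, its `__doc__` matches `StringMethods.__doc__`, and touching any other attribute (for example `.cat`) raises the stored `TypeError`.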
Unfortunately, it is not possible (AFAICT) to make an object on which `repr` raises a `TypeError` but for which `__doc__` is well defined.
I'm -0 on these options. They add complexity and I don't think they're that much more usable -- if `s.str?` says not found, the first thing I'm going to try to do is see what `s.str` is, which will raise the `TypeError`. I also don't think there are that many who search through the Series namespace for methods -- there are simply too many methods/properties for that to be very usable.
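For comparison, the first option (dropping `str` from `__dir__` for invalid types) might look roughly like the sketch below. `ToySeries` and `_holds_strings` are hypothetical names used only for illustration, and the property returns a placeholder rather than a real accessor.

```python
class ToySeries:
    """Hypothetical stand-in for Series; not pandas code."""

    def __init__(self, values):
        self.values = list(values)

    def _holds_strings(self):
        # Simplified validity check; pandas would inspect the dtype instead.
        return all(isinstance(v, str) for v in self.values)

    @property
    def str(self):
        if not self._holds_strings():
            raise TypeError("Can only use .str accessor with string values")
        return self.values  # placeholder for a real accessor object

    def __dir__(self):
        # Hide the accessor from tab completion when it would only raise.
        names = set(super().__dir__())
        if not self._holds_strings():
            names.discard("str")
        return sorted(names)
```

Here `dir(ToySeries([1]))` no longer lists `str`, so it would not be offered for completion, but accessing `.str` still raises immediately, which is why introspecting it cannot show a docstring.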