ENH/DOC: reimplement Series delegates/accessors using descriptors by shoyer · Pull Request #9322 · pandas-dev/pandas
OK, a few things we could do:
- Do checks to take `str` out of `__dir__` for invalid types. This would eliminate the auto-complete issue, but I think `s.str?` would give the same message you showed above (object `s.str` not found).
- Return a standard `StringMethods` object, but add some sort of hook that checks that the type is valid before every method lookup. You could still auto-complete `str` methods, though, and this is more complex for `.dt`, because it can create several sub-types of accessors.
- Make `s.str` for invalid types some sort of "deferred error" object that raises `TypeError` when any attribute is accessed but with a copied docstring from `StringMethods`. I tossed together an implementation, which gives us functionality like the following:
In [15]: s = pd.Series([1])
In [16]: s.str.<tab>
In [17]: s.str
Out[17]: <pandas.core.series.InvalidStringMethods at 0x107a32fd0>
In [18]: s.str?
Type: InvalidStringMethods
String form: <pandas.core.series.InvalidStringMethods object at 0x107a8e150>
File: /Users/shoyer/dev/pandas/pandas/core/series.py
Docstring:
Vectorized string functions for Series. NAs stay NA unless handled
otherwise by a particular method. Patterned after Python's string methods,
with some inspiration from R's stringr package.
Examples
--------
>>> s.str.split('_')
>>> s.str.replace('_', '')
In [19]: s.str.cat
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-19-e75e3f77c883> in <module>()
----> 1 s.str.cat
/Users/shoyer/dev/pandas/pandas/core/series.py in __getattr__(self, name)
2552
2553 def __getattr__(self, name):
-> 2554 raise self._error
2555
2556
TypeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
Unfortunately, it is not possible (AFAICT) to make an object on which repr raises a TypeError but for which __doc__ is well defined.
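For reference, the "deferred error" idea boils down to something like the sketch below. This is a simplified stand-in, not the code from my branch: `StringMethods` here is a toy class, and `make_str_accessor` is a hypothetical helper that only illustrates where the validity check would live.

```python
class StringMethods:
    """Vectorized string functions for Series (toy stand-in for the real class)."""

    def __init__(self, values):
        self._values = values

    def upper(self):
        return [v.upper() for v in self._values]


class InvalidStringMethods:
    # Copy the docstring so that `s.str?` still shows the accessor documentation.
    __doc__ = StringMethods.__doc__

    def __init__(self, error):
        self._error = error

    def __getattr__(self, name):
        # Only invoked when normal lookup fails, so `_error`, `__doc__`,
        # and the default repr keep working; everything else raises.
        raise self._error


def make_str_accessor(values):
    """Hypothetical factory: return a real or a deferred-error accessor."""
    if all(isinstance(v, str) for v in values):
        return StringMethods(values)
    return InvalidStringMethods(
        TypeError("Can only use .str accessor with string values")
    )
```

Because `__getattr__` is only a fallback, the object itself can be created, printed, and inspected; the `TypeError` only shows up once a method such as `cat` is looked up.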
I'm -0 on these options. They add complexity, and I don't think they're that much more usable: if `s.str?` says not found, the first thing I'm going to do is check what `s.str` is, which will raise the TypeError. I also don't think many people search through the Series namespace for methods; there are simply too many methods/properties for that to be very usable.
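To make the first option concrete, hiding `str` from `__dir__` could look roughly like the sketch below. `MiniSeries` and `_holds_strings` are made-up names for illustration, not pandas internals, and the property body is a placeholder for the real accessor.

```python
class MiniSeries:
    """Toy container used only to illustrate the __dir__ filtering idea."""

    def __init__(self, values):
        self.values = list(values)

    def _holds_strings(self):
        # Hypothetical validity check; real dtype inspection is more involved.
        return all(isinstance(v, str) for v in self.values)

    @property
    def str(self):
        if not self._holds_strings():
            raise TypeError("Can only use .str accessor with string values")
        return list(self.values)  # placeholder for a StringMethods object

    def __dir__(self):
        names = set(super().__dir__())
        if not self._holds_strings():
            # Hide the accessor from tab completion for invalid types.
            names.discard("str")
        return sorted(names)
```

With this, `str` no longer shows up in `dir()` for a non-string series, but accessing it directly still raises the `TypeError`, which is the behavior described above.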