ENH: Implement DataFrame.astype('category') by jschendel · Pull Request #18099 · pandas-dev/pandas
Updated to address the comments:
- Modified the `DataFrame` constructor to be consistent with `DataFrame.astype`
- Modified `categorical.rst` to reflect this
- Added this to the breaking API changes section of the whatsnew
- Modified
- Moved logic to the `_get_categorical_dtype_2d` helper function in `categorical.py`
- Created `unique2d` and `_ensure_arraylike2d` functions in `algorithms.py`
- Added tests for the three items above
Regarding `_ensure_arraylike2d`: I created it because I was running into an issue with `_ensure_arraylike`, where a list of lists gets returned as a 1d numpy array of lists instead of a 2d numpy array:
In [22]: values = [['a', 'b', 'c', 'a'], ['b', np.nan, 'd', 'd']]

In [23]: _ensure_arraylike(values)
Out[23]: array([list(['a', 'b', 'c', 'a']), list(['b', nan, 'd', 'd'])], dtype=object)
whereas I'd like to get output along the lines of:
In [24]: np.array([_ensure_arraylike(x) for x in values])
Out[24]: array([['a', 'b', 'c', 'a'], ['b', nan, 'd', 'd']], dtype=object)
Should I just try patching `_ensure_arraylike` instead of creating `_ensure_arraylike2d`? Or is what I did fine?
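For reference, the difference between the two outputs above can be reproduced with plain NumPy. This is only a minimal sketch: `to_object_array_2d` is a hypothetical name used for illustration, not the actual `_ensure_arraylike2d` from this PR, and the 1d construction merely imitates the shape of what `_ensure_arraylike` returned in `Out[23]`.

```python
import numpy as np

values = [['a', 'b', 'c', 'a'], ['b', np.nan, 'd', 'd']]

# Imitate the 1d result: an object array whose elements are the inner lists.
flat = np.empty(len(values), dtype=object)
for i, row in enumerate(values):
    flat[i] = row


def to_object_array_2d(values):
    """Hypothetical helper: build a 2d object array from equal-length rows."""
    rows = [np.asarray(row, dtype=object) for row in values]
    lengths = {len(row) for row in rows}
    if len(lengths) > 1:
        raise ValueError("all rows must have the same length")
    ncols = lengths.pop() if lengths else 0
    out = np.empty((len(rows), ncols), dtype=object)
    for i, row in enumerate(rows):
        out[i, :] = row
    return out


arr2d = to_object_array_2d(values)
print(flat.shape)   # 1d: one element per inner list
print(arr2d.shape)  # 2d: rows x columns
```

Raising on ragged input is an assumption of this sketch; whether the real helper should reject or otherwise handle rows of unequal length is a separate design question.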