Construction of Categorical from array with pd.NA failing · Issue #31927 · pandas-dev/pandas
In [10]: pd.Categorical(np.array(["a", "b", pd.NA], dtype=object))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~/scipy/pandas/pandas/core/arrays/categorical.py in __init__(self, values, categories, ordered, dtype, fastpath)
354 try:
--> 355 codes, categories = factorize(values, sort=True)
356 except TypeError:
~/scipy/pandas/pandas/core/algorithms.py in factorize(values, sort, na_sentinel, size_hint)
635 codes, uniques = _factorize_array(
--> 636 values, na_sentinel=na_sentinel, size_hint=size_hint, na_value=na_value
637 )
~/scipy/pandas/pandas/core/algorithms.py in _factorize_array(values, na_sentinel, size_hint, na_value)
483 table = hash_klass(size_hint or len(values))
--> 484 uniques, codes = table.factorize(values, na_sentinel=na_sentinel, na_value=na_value)
485
~/scipy/pandas/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.factorize()
~/scipy/pandas/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable._unique()
~/scipy/pandas/pandas/_libs/missing.pyx in pandas._libs.missing.NAType.__bool__()
TypeError: boolean value of NA is ambiguous
Passing the same values as a plain list works:
In [11]: pd.Categorical(["a", "b", pd.NA])
Out[11]:
[a, b, NaN]
Categories (2, object): [a, b]
This also means that creating a Categorical from a StringArray with missing values won't work (with the same error as above).
I'm not sure it should work, though, at least not until a "string" dtype index can serve as the categories of the created Categorical.
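For anyone who wants to check this on their own install, here is a minimal sketch that tries the three inputs discussed above (list, object ndarray, StringArray) and records whether each one constructs or raises. The helper `try_categorical` and the `outcomes` dict are illustrative names, not pandas API, and the result will depend on the pandas version in use. The last part swaps `pd.NA` for `np.nan` before constructing; that is only a possible workaround I am assuming is acceptable for object data, not something proposed in this issue.

```python
import numpy as np
import pandas as pd


def try_categorical(values):
    """Hypothetical helper: return the Categorical, or the TypeError raised."""
    try:
        return pd.Categorical(values)
    except TypeError as exc:
        return exc


raw = ["a", "b", pd.NA]
inputs = {
    "list": raw,
    "object ndarray": np.array(raw, dtype=object),
    "StringArray": pd.array(raw, dtype="string"),
}

# Record what happens for each input type on this pandas version.
outcomes = {label: try_categorical(values) for label, values in inputs.items()}
for label, result in outcomes.items():
    status = "raised" if isinstance(result, TypeError) else "ok"
    print(f"{label}: {status} ({result!r})")

# Possible workaround (an assumption, not from the issue): replace pd.NA
# with np.nan so the object hashtable never has to evaluate bool(pd.NA).
obj_arr = inputs["object ndarray"]
cleaned = np.where(pd.isna(obj_arr), np.nan, obj_arr)
workaround = pd.Categorical(cleaned)
print(workaround)
```

The workaround loses the distinction between `pd.NA` and `np.nan` in the input, which is harmless here because Categorical stores missing values as code `-1` either way.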