Construction of Categorical from array with pd.NA failing · Issue #31927 · pandas-dev/pandas

In [10]: pd.Categorical(np.array(["a", "b", pd.NA], dtype=object))  
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/scipy/pandas/pandas/core/arrays/categorical.py in __init__(self, values, categories, ordered, dtype, fastpath)
    354             try:
--> 355                 codes, categories = factorize(values, sort=True)
    356             except TypeError:

~/scipy/pandas/pandas/core/algorithms.py in factorize(values, sort, na_sentinel, size_hint)
    635         codes, uniques = _factorize_array(
--> 636             values, na_sentinel=na_sentinel, size_hint=size_hint, na_value=na_value
    637         )

~/scipy/pandas/pandas/core/algorithms.py in _factorize_array(values, na_sentinel, size_hint, na_value)
    483     table = hash_klass(size_hint or len(values))
--> 484     uniques, codes = table.factorize(values, na_sentinel=na_sentinel, na_value=na_value)
    485 

~/scipy/pandas/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.factorize()

~/scipy/pandas/pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable._unique()

~/scipy/pandas/pandas/_libs/missing.pyx in pandas._libs.missing.NAType.__bool__()

TypeError: boolean value of NA is ambiguous

In [11]: pd.Categorical(["a", "b", pd.NA])
Out[11]: 
[a, b, NaN]
Categories (2, object): [a, b]

Because the object-dtype array path fails, creating a Categorical from a StringArray with missing values won't work either (it raises the same error as above).
I'm just not sure whether it should work, at least until we can have a "string" dtype Index as the categories of the created Categorical.
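
For anyone reading the traceback above: it ends in `NAType.__bool__`, which suggests the object hash table evaluates the truthiness of an equality comparison involving `pd.NA`. Below is a minimal pure-Python sketch of that mechanism. `AmbiguousNA`, `naive_unique`, and `na_aware_unique` are hypothetical names made up for illustration, not pandas code, and the real logic lives in Cython, so treat this as an approximation of the failure mode rather than the actual implementation.

```python
class AmbiguousNA:
    """Hypothetical stand-in for pd.NA: comparisons propagate NA,
    and asking for its truth value raises."""

    def __eq__(self, other):
        return self  # NA == anything evaluates to NA, not True/False

    def __hash__(self):
        return 0  # arbitrary constant, only here so the object stays hashable

    def __bool__(self):
        raise TypeError("boolean value of NA is ambiguous")

    def __repr__(self):
        return "<NA>"


NA = AmbiguousNA()


def naive_unique(values):
    """Collect unique values by equality, roughly as a hash-table probe would."""
    uniques = []
    for val in values:
        # any() calls bool(val == seen); for NA that raises TypeError
        if not any(val == seen for seen in uniques):
            uniques.append(val)
    return uniques


def na_aware_unique(values):
    """Same loop, but skip NA by identity before any equality check."""
    uniques = []
    for val in values:
        if val is NA:
            continue  # treat NA as missing instead of comparing it
        if not any(val == seen for seen in uniques):
            uniques.append(val)
    return uniques
```

Under these assumptions, `naive_unique(["a", "b", NA])` raises the same `TypeError` message as in the traceback, while `na_aware_unique` drops the missing value before comparing. Whether the fix in pandas should take this shape (an identity check for `pd.NA` in the hash table) is a guess on my part.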