BUG: Series(nullable).tolist() -> numpy scalars by lukemanley · Pull Request #49890 · pandas-dev/pandas
- Tests added and passed if fixing a bug or adding a new feature
- All code checks passed.
- Added type annotations to new arguments/methods/functions.
- Added an entry in the latest doc/source/whatsnew/v2.0.0.rst file if fixing a bug or adding a new feature.
The Series.tolist documentation states:
These are each a scalar type, which is a Python scalar (for str, int, float) or a pandas scalar (for Timestamp/Timedelta/Interval/Period)
Nullable dtype behavior seems to be inconsistent with the docstring and other dtypes:
main:
import pandas as pd
type(pd.Series([1], dtype="Int64").tolist()[0]) # numpy.int64
type(pd.Series([1], dtype="int64").tolist()[0]) # int
type(pd.Series([1], dtype="int64[pyarrow]").tolist()[0]) # int
PR:
import pandas as pd
type(pd.Series([1], dtype="Int64").tolist()[0]) # int
type(pd.Series([1], dtype="int64").tolist()[0]) # int
type(pd.Series([1], dtype="int64[pyarrow]").tolist()[0]) # int
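The description above does not show how the change is implemented, so the following is only an illustration of general NumPy behavior that can produce this kind of difference, not the PR's code. Iterating over an ndarray yields NumPy scalars, while ndarray.tolist() converts each element to the nearest Python scalar. The array and variable names are made up for this sketch.

```python
import numpy as np

# Illustrative array; not taken from the PR.
arr = np.array([1, 2, 3], dtype="int64")

# Iterating an ndarray yields NumPy scalar objects.
via_iteration = list(arr)

# ndarray.tolist() converts each element to a Python scalar.
via_tolist = arr.tolist()

print(type(via_iteration[0]))  # a NumPy integer type (numpy.int64)
print(type(via_tolist[0]))     # the built-in int
```

A list built the first way would contradict the Series.tolist docstring quoted above, while one built the second way would match it.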
The fix also improves performance. In the benchmark below, tolist() on a one-million-element Int64 Series drops from about 45.5 ms to 28.5 ms per call:
import pandas as pd
import numpy as np
ser = pd.Series(np.arange(10**6), dtype="Int64")
%timeit ser.tolist()
# 45.5 ms ± 980 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) <- main
# 28.5 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) <- PR
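%timeit is an IPython magic, so the snippet above only runs in IPython or Jupyter. A rough equivalent for a plain Python script, assuming the standard-library timeit module is an acceptable substitute, might look like the sketch below. The names n_runs and per_call are illustrative, and the numbers it prints will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same Series as in the benchmark above.
ser = pd.Series(np.arange(10**6), dtype="Int64")

# Call tolist() n_runs times per measurement, take the best of 3
# measurements, and report the average time of a single call.
n_runs = 10
per_call = min(timeit.repeat(ser.tolist, number=n_runs, repeat=3)) / n_runs

print(f"{per_call * 1e3:.1f} ms per call")
```

Taking the minimum of several repeats reduces noise from other processes; %timeit instead reports a mean and standard deviation, so the two figures are not directly comparable.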