BUG: Series(nullable).tolist() -> numpy scalars by lukemanley · Pull Request #49890 · pandas-dev/pandas

Series.tolist documentation states:

These are each a scalar type, which is a Python scalar (for str, int, float) or a pandas scalar (for Timestamp/Timedelta/Interval/Period)

Nullable dtypes (e.g. Int64) are inconsistent with both the docstring and the other dtypes: tolist() returns numpy scalars instead of Python scalars:

main:

import pandas as pd

type(pd.Series([1], dtype="Int64").tolist()[0])             # numpy.int64
type(pd.Series([1], dtype="int64").tolist()[0])             # int
type(pd.Series([1], dtype="int64[pyarrow]").tolist()[0])    # int

PR:

import pandas as pd

type(pd.Series([1], dtype="Int64").tolist()[0])             # int
type(pd.Series([1], dtype="int64").tolist()[0])             # int
type(pd.Series([1], dtype="int64[pyarrow]").tolist()[0])    # int
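
The comparison above can be repeated on whatever pandas version is installed with a small sketch like the following. The helper name `element_type_names` is hypothetical (not part of pandas), and the pyarrow-backed dtype is left out so that the snippet runs without pyarrow installed. The result for "Int64" depends on whether the installed version includes this fix.

```python
import pandas as pd


def element_type_names(dtypes, value=1):
    """Return {dtype: type name of the first element of Series.tolist()}.

    Hypothetical helper for illustration only.
    """
    return {
        dt: type(pd.Series([value], dtype=dt).tolist()[0]).__name__
        for dt in dtypes
    }


report = element_type_names(["int64", "Int64"])
for dtype_name, type_name in report.items():
    # "int" means a Python scalar; "int64" means a numpy scalar leaked through.
    print(f"{dtype_name}: {type_name}")
```
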

The fix also improves performance:

import pandas as pd
import numpy as np

ser = pd.Series(np.arange(10**6), dtype="Int64")

%timeit ser.tolist()

# 45.5 ms ± 980 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  <- main
# 28.5 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  <- PR
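
The %timeit magic above only works in IPython/Jupyter. A rough equivalent using the standard-library timeit module might look like the sketch below. The function name `time_tolist` and its defaults are assumptions chosen to mirror the run above, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def time_tolist(n=10**6, repeat=7, number=10):
    """Return the best observed seconds per call of Series.tolist().

    Hypothetical helper; builds an Int64 (nullable) Series of length n.
    """
    ser = pd.Series(np.arange(n), dtype="Int64")
    # Each entry in `runs` is the total time for `number` calls.
    runs = timeit.repeat(ser.tolist, repeat=repeat, number=number)
    return min(runs) / number


if __name__ == "__main__":
    print(f"{time_tolist() * 1e3:.1f} ms per call (best of 7 runs, 10 loops each)")
```

Note that this reports the best run rather than the mean ± std. dev. that %timeit prints, so the figures are not directly comparable with those quoted above.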