
Tests added and passed if fixing a bug or adding a new feature.
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/v2.0.0.rst file if fixing a bug or adding a new feature.

Series.tolist documentation states:

These are each a scalar type, which is a Python scalar (for str, int, float) or a pandas scalar (for Timestamp/Timedelta/Interval/Period)

For nullable dtypes such as Int64, tolist currently returns NumPy scalars, which is inconsistent with the docstring and with other dtypes:

main:

import pandas as pd

type(pd.Series([1], dtype="Int64").tolist()[0])             # numpy.int64
type(pd.Series([1], dtype="int64").tolist()[0])             # int
type(pd.Series([1], dtype="int64[pyarrow]").tolist()[0])    # int

PR:

import pandas as pd

type(pd.Series([1], dtype="Int64").tolist()[0])             # int
type(pd.Series([1], dtype="int64").tolist()[0])             # int
type(pd.Series([1], dtype="int64[pyarrow]").tolist()[0])    # int
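
On pandas versions that do not include this change, a caller could normalize the result by hand. The helper below is a minimal sketch and not part of this PR: `to_python_list` is a hypothetical name, and it assumes every NumPy scalar in the list should be converted with `.item()` while other values (such as `pd.NA`) are left untouched.

```python
import numpy as np
import pandas as pd


def to_python_list(ser: pd.Series) -> list:
    """Return ser.tolist() with NumPy scalars converted to Python scalars.

    Hypothetical helper for illustration only; not part of pandas.
    """
    # np.generic is the base class of all NumPy scalar types (np.int64, ...).
    # Values that are not NumPy scalars, e.g. pd.NA, pass through unchanged.
    return [x.item() if isinstance(x, np.generic) else x for x in ser.tolist()]


values = to_python_list(pd.Series([1, 2, None], dtype="Int64"))
print([type(v).__name__ for v in values])
```

With the fix in this PR the conversion becomes a no-op, so the helper should be safe to leave in place across versions.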

The fix also improves performance, by roughly 1.6x on a one-million-element Int64 Series:

import pandas as pd
import numpy as np

ser = pd.Series(np.arange(10**6), dtype="Int64")

%timeit ser.tolist()

# 45.5 ms ± 980 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  <- main
# 28.5 ms ± 229 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)  <- PR
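
`%timeit` is an IPython magic, so the snippet above only runs in IPython or Jupyter. A rough equivalent for a plain Python script, using the standard-library `timeit` module, might look like the sketch below. The function name `time_tolist` and its parameters are illustrative assumptions, and it reports the fastest run rather than the mean ± std. dev. that `%timeit` prints, so the numbers are not directly comparable to those above.

```python
import timeit

import numpy as np
import pandas as pd


def time_tolist(n: int = 10**6, number: int = 10, repeat: int = 7) -> float:
    """Return the best per-call time, in seconds, of Series.tolist on Int64 data.

    Hypothetical benchmark helper for illustration only.
    """
    ser = pd.Series(np.arange(n), dtype="Int64")
    # Each entry in `runs` is the total time for `number` calls.
    runs = timeit.repeat(ser.tolist, number=number, repeat=repeat)
    # Report the fastest run, divided down to a single call.
    return min(runs) / number


print(f"{time_tolist() * 1e3:.1f} ms per call (best of 7 runs, 10 loops each)")
```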