BUG: Fix getitem dtype preservation with multiindexes by m-richards · Pull Request #51895 · pandas-dev/pandas
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([range(10), range(20)])
df = pd.DataFrame(np.random.randn(10000, len(columns)), columns=columns).copy()
slice = df.columns.get_loc(0)  # note: this name shadows the builtin `slice`
# gives slice(0, 20, None)
In [47]: %timeit df.iloc[:, slice]
96.2 µs ± 4.58 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [48]: %timeit pd.DataFrame(df.values[:, slice], index=df.index)
33.9 µs ± 1.24 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
So iloc is a bit slower because it has a bit more overhead (mostly in creating the resulting MultiIndex columns, which we then override). But this is mostly fixed overhead, regardless of the size of the data, and we are talking about microseconds.
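One way to check the "fixed overhead" claim is to repeat both timings at two different row counts. The following is only a rough sketch: the row counts, the `timings` dict, and the repeat settings are illustrative choices, not part of the original measurement, and the absolute numbers will differ per machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([range(10), range(20)])
loc = columns.get_loc(0)  # slice(0, 20, None)

timings = {}  # n_rows -> (seconds per iloc call, seconds per values-based call)
for n_rows in (100, 10_000):  # illustrative sizes
    df = pd.DataFrame(np.random.randn(n_rows, len(columns)), columns=columns)

    # Best of 3 runs of 200 calls each, converted to seconds per call.
    via_iloc = min(timeit.repeat(lambda: df.iloc[:, loc], number=200, repeat=3)) / 200
    via_values = min(
        timeit.repeat(
            lambda: pd.DataFrame(df.values[:, loc], index=df.index),
            number=200,
            repeat=3,
        )
    ) / 200

    timings[n_rows] = (via_iloc, via_values)
    print(f"{n_rows:>6} rows: iloc {via_iloc * 1e6:.1f} µs, values {via_values * 1e6:.1f} µs")
```

If the overhead is indeed fixed, the gap between the two columns should stay roughly the same as the row count grows.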
The actual subsetting still happens with a slice. For comparison, here is a non-slice case (where we do an actual "take") that selects the same subset:
In [49]: idx = np.arange(20)
In [50]: %timeit df.iloc[:, idx]
576 µs ± 19.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
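To illustrate why the slice path is cheaper than the "take" path, here is a minimal sketch under the same setup (with a smaller frame). It checks that both selections return the same data, and uses plain NumPy to show the underlying difference: basic slicing returns a view, while indexing with an integer array copies. The variable names (`by_slice`, `by_take`, `slice_is_view`, `take_is_view`) are illustrative only, and the memory check is done on the NumPy array rather than on pandas objects, whose view/copy behavior depends on the pandas version.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([range(10), range(20)])
df = pd.DataFrame(np.random.randn(100, len(columns)), columns=columns)

loc = df.columns.get_loc(0)  # a slice, because the first level is sorted
idx = np.arange(20)          # the same positions as an integer array

# Both selections pick the same 20 columns.
by_slice = df.iloc[:, loc]
by_take = df.iloc[:, idx]
same_result = by_slice.equals(by_take)

# At the NumPy level: a slice is a view, integer-array indexing is a copy.
values = df.to_numpy()
slice_is_view = np.shares_memory(values, values[:, loc])
take_is_view = np.shares_memory(values, values[:, idx])

print(loc, same_result, slice_is_view, take_is_view)
```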