PERF: improve iloc list indexing by jorisvandenbossche · Pull Request #15504 · pandas-dev/pandas
So the remaining test failure is related to the following:
In [1]: import pandas.util.testing as tm
In [3]: s = tm.makeObjectSeries()
In [4]: s
Out[4]:
MgrcA7IMuH 2000-01-03 00:00:00
QbbI4vogXZ 2000-01-04 00:00:00
f37URcolZJ 2000-01-05 00:00:00
...
aQWgxosz9u 2000-02-09 00:00:00
HnXFRuUGov 2000-02-10 00:00:00
2SA0hNdHwt 2000-02-11 00:00:00
dtype: object
In [5]: s.values
Out[5]:
array([datetime.datetime(2000, 1, 3, 0, 0),
datetime.datetime(2000, 1, 4, 0, 0),
datetime.datetime(2000, 1, 5, 0, 0),
...
datetime.datetime(2000, 2, 9, 0, 0),
datetime.datetime(2000, 2, 10, 0, 0),
datetime.datetime(2000, 2, 11, 0, 0)], dtype=object)
In [6]: s.take([1,3,8])
Out[6]:
QbbI4vogXZ 2000-01-04 00:00:00
zUVaKnBXUH 2000-01-06 00:00:00
TZT2OzuB7y 2000-01-13 00:00:00
dtype: object
In [7]: s.take([1,3,8]).values
Out[7]:
array([datetime.datetime(2000, 1, 4, 0, 0),
datetime.datetime(2000, 1, 6, 0, 0),
datetime.datetime(2000, 1, 13, 0, 0)], dtype=object)
So in the above (this is the behaviour of this PR), take preserves the object dtype with datetime.datetime values (I suppose due to using the fastpath in Series creation), while the test expects a datetime64 result.
Personally, I actually prefer the above behaviour, as simple positional indexing should not necessarily change the dtype. But of course we should agree that changing the test is OK here (pandas currently does/assumes such inference in many places, so this possibly has larger implications).
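To make the two possible outcomes concrete, here is a minimal sketch that does not depend on this PR or on `tm.makeObjectSeries()`. The data, the `positions` list, and the names `kept` and `inferred` are made up for illustration. Building the result with an explicit `dtype=object` stands in for a path that skips inference, while building it from a plain list of `datetime.datetime` objects lets the constructor infer a datetime64 dtype. What `Series.take` itself returns is printed rather than assumed, since that is exactly the point under discussion and may differ between pandas versions.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

# Object-dtype array of plain datetime.datetime values (illustrative data,
# consecutive days rather than the business days shown above).
values = np.array(
    [datetime(2000, 1, 3) + timedelta(days=i) for i in range(10)],
    dtype=object,
)
s = pd.Series(values, dtype=object)  # explicit dtype keeps object

positions = [1, 3, 8]
picked = values[positions]  # NumPy positional selection, still object dtype

# Outcome 1: no inference, the object dtype is carried over.
kept = pd.Series(picked, dtype=object)

# Outcome 2: the constructor inspects the values and infers datetime64.
inferred = pd.Series(list(picked))

print(kept.dtype)      # object
print(inferred.dtype)  # a datetime64 dtype (unit depends on the pandas version)

# What take() does here is version dependent, so it is only printed.
print(s.take(positions).dtype)
```

Both results hold the same timestamps; only the dtype, and therefore the type of the elements you get back, differs.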
I ran the indexing benchmarks:
    before     after       ratio
  [1f890607] [6d2705cd]
+    8.90μs    16.21μs      1.82  period.period_standard_indexing.time_shallow_copy
+  240.70μs   364.02μs      1.51  indexing.DataFrameIndexing.time_iloc_dups
-   62.26ms    51.08ms      0.82  indexing.Int64Indexing.time_getitem_list_like
-    9.44μs     5.02μs      0.53  indexing.DataFrameIndexing.time_get_value_ix
-   76.00μs    38.18μs      0.50  indexing.Int64Indexing.time_iloc_list_like
-    1.76ms   136.69μs      0.08  indexing.Int64Indexing.time_iloc_array
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
So the iloc list_like and array cases improve as expected (the array case even more than I expected). It is not clear to me why the period shallow copy is so much slower (I don't think I touched any code related to shallow_copy). But when I reran the benchmarks, the two tests that showed a slowdown were not consistently slower, so it is probably noise.
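For reading the table, the ratio column is the "after" timing divided by the "before" timing, so values below 1 are speedups. The short sketch below recomputes it from the numbers quoted above; the helper names `to_seconds` and `recomputed_ratios` are hypothetical and not part of asv. Note the mixed units in the `time_iloc_array` row (ms before, μs after).

```python
# Timings copied from the benchmark output above:
# (before, after, reported ratio, benchmark name)
ROWS = [
    ("8.90μs", "16.21μs", 1.82, "period.period_standard_indexing.time_shallow_copy"),
    ("240.70μs", "364.02μs", 1.51, "indexing.DataFrameIndexing.time_iloc_dups"),
    ("62.26ms", "51.08ms", 0.82, "indexing.Int64Indexing.time_getitem_list_like"),
    ("9.44μs", "5.02μs", 0.53, "indexing.DataFrameIndexing.time_get_value_ix"),
    ("76.00μs", "38.18μs", 0.50, "indexing.Int64Indexing.time_iloc_list_like"),
    ("1.76ms", "136.69μs", 0.08, "indexing.Int64Indexing.time_iloc_array"),
]

# Accept both the Greek mu and the micro sign for microseconds.
UNIT_TO_SECONDS = {"μs": 1e-6, "µs": 1e-6, "ms": 1e-3}


def to_seconds(text):
    """Convert a timing string such as '8.90μs' to seconds."""
    for unit, factor in UNIT_TO_SECONDS.items():
        if text.endswith(unit):
            return float(text[: -len(unit)]) * factor
    raise ValueError(f"unknown unit in {text!r}")


def recomputed_ratios(rows=ROWS):
    """Return (name, after / before rounded to 2 decimals, reported ratio)."""
    return [
        (name, round(to_seconds(after) / to_seconds(before), 2), reported)
        for before, after, reported, name in rows
    ]


for name, ratio, reported in recomputed_ratios():
    speedup = 1 / ratio
    print(f"{name}: ratio {ratio:.2f} (reported {reported:.2f}), {speedup:.1f}x")
```

Read this way, the `time_iloc_array` ratio of 0.08 corresponds to roughly a 12x to 13x speedup, and `time_iloc_list_like` to about 2x.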