PERF: improve iloc list indexing by jorisvandenbossche · Pull Request #15504 · pandas-dev/pandas

So the remaining test failure is related to the following:

In [1]: import pandas.util.testing as tm

In [3]: s = tm.makeObjectSeries()

In [4]: s
Out[4]: 
MgrcA7IMuH    2000-01-03 00:00:00
QbbI4vogXZ    2000-01-04 00:00:00
f37URcolZJ    2000-01-05 00:00:00
...
aQWgxosz9u    2000-02-09 00:00:00
HnXFRuUGov    2000-02-10 00:00:00
2SA0hNdHwt    2000-02-11 00:00:00
dtype: object

In [5]: s.values
Out[5]: 
array([datetime.datetime(2000, 1, 3, 0, 0),
       datetime.datetime(2000, 1, 4, 0, 0),
       datetime.datetime(2000, 1, 5, 0, 0),
...
       datetime.datetime(2000, 2, 9, 0, 0),
       datetime.datetime(2000, 2, 10, 0, 0),
       datetime.datetime(2000, 2, 11, 0, 0)], dtype=object)

In [6]: s.take([1,3,8])
Out[6]: 
QbbI4vogXZ    2000-01-04 00:00:00
zUVaKnBXUH    2000-01-06 00:00:00
TZT2OzuB7y    2000-01-13 00:00:00
dtype: object

In [7]: s.take([1,3,8]).values
Out[7]: 
array([datetime.datetime(2000, 1, 4, 0, 0),
       datetime.datetime(2000, 1, 6, 0, 0),
       datetime.datetime(2000, 1, 13, 0, 0)], dtype=object)

So in the above (which is the behaviour of this PR), take preserves the object dtype with datetime.datetime values (I suppose because it uses the fastpath in Series creation), while the test expects a datetime64 result.
Personally, I find the above behaviour preferable, as simple positional indexing should not necessarily change the dtype. But of course we should agree that changing the test is OK here (pandas currently does/assumes such inference in many places, so this possibly has larger implications).
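To make the two possible outcomes concrete, here is a minimal sketch. The Series below is a hypothetical stand-in for `tm.makeObjectSeries()`, built by hand with an explicit object dtype. It only contrasts positional selection on the underlying array (no inference) with an explicit `pd.to_datetime` conversion (the datetime64 result the test expects). Whether `Series.take` itself infers the dtype depends on the pandas version, so the sketch does not assert on that.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for tm.makeObjectSeries(): object dtype holding
# datetime.datetime values (labels and dates are made up for illustration).
start = datetime.datetime(2000, 1, 3)
values = [start + datetime.timedelta(days=i) for i in range(10)]
s = pd.Series(values, index=list("abcdefghij"), dtype=object)

positions = [1, 3, 8]

# Positional selection on the underlying ndarray: no inference is involved,
# so the result is still an object array.
taken = s.values.take(positions)

# Explicit conversion: this yields the datetime64 dtype the test expects.
converted = pd.to_datetime(s.take(positions))

print(taken.dtype)      # object
print(converted.dtype)  # a datetime64 dtype
```

If the behaviour of this PR is accepted, a caller who wants datetime64 after indexing would convert explicitly, as in the last step, rather than relying on inference.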

I ran the indexing benchmarks:

    before     after       ratio
  [1f890607] [6d2705cd]
+    8.90μs    16.21μs      1.82  period.period_standard_indexing.time_shallow_copy
+  240.70μs   364.02μs      1.51  indexing.DataFrameIndexing.time_iloc_dups
-   62.26ms    51.08ms      0.82  indexing.Int64Indexing.time_getitem_list_like
-    9.44μs     5.02μs      0.53  indexing.DataFrameIndexing.time_get_value_ix
-   76.00μs    38.18μs      0.50  indexing.Int64Indexing.time_iloc_list_like
-    1.76ms   136.69μs      0.08  indexing.Int64Indexing.time_iloc_array
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

So the iloc list_like and array cases improve as expected (the array case even more than I expected). The reason the period shallow copy is so much slower is not really clear to me (I don't think I touched code related to shallow_copy). But when I reran the benchmarks, the two tests with a slowdown were not consistently slower, so it is probably noise.
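One rough way to check whether a timing difference like this is noise is to repeat the measurement several times and compare the best runs. The sketch below uses the standard-library `timeit` module rather than asv. The Series size, the positions, and the helper name `best_of` are assumptions chosen for illustration, not the actual benchmark definitions.

```python
import timeit

import numpy as np
import pandas as pd

# Illustrative data; the sizes are arbitrary and not taken from the asv suite.
s = pd.Series(np.arange(100_000))
positions_list = list(range(0, 100_000, 10))
positions_array = np.array(positions_list)


def best_of(func, repeat=5, number=20):
    """Return the best per-call time in seconds over several repeats.

    The minimum is less sensitive to background noise than the mean.
    """
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


t_list = best_of(lambda: s.iloc[positions_list])
t_array = best_of(lambda: s.iloc[positions_array])

print(f"iloc with list:  {t_list * 1e6:.2f} us")
print(f"iloc with array: {t_array * 1e6:.2f} us")
```

Running this a few times and seeing whether the ordering of the two timings stays the same gives a quick impression of how stable the measurement is on a given machine.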