ENH: Allow where/mask/Indexers to accept callable · pandas-dev/pandas@7bbd031
`@@ -79,6 +79,10 @@ of multi-axis indexing.
`
79
79
``` - A slice object with labels 'a':'f'
, (note that contrary to usual python
`80`
`80`
` slices, **both** the start and the stop are included!)
`
`81`
`81`
` - A boolean array
`
``
`82`
``` +
- A ``callable`` function with one argument (the calling Series, DataFrame or Panel) and
``
83
`+
that returns valid output for indexing (one of the above)
`
``
84
+
``
85
`+
.. versionadded:: 0.18.1
`
82
86
``
83
87
`` See more at :ref:Selection by Label <indexing.label>
``
84
88
``
`@@ -93,6 +97,10 @@ of multi-axis indexing.
`
93
97
``` - A list or array of integers [4, 3, 0]
`94`
`98`
``` - A slice object with ints ``1:7``
95
99
` - A boolean array
`
``
100
- A ``callable`` function with one argument (the calling Series, DataFrame or Panel) and
``
101
`+
that returns valid output for indexing (one of the above)
`
``
102
+
``
103
`+
.. versionadded:: 0.18.1
`
96
104
``
97
105
`` See more at :ref:Selection by Position <indexing.integer>
``
98
106
``
`@@ -110,6 +118,8 @@ of multi-axis indexing.
`
110
118
`` See more at :ref:Advanced Indexing <advanced>
and :ref:`Advanced
``
111
119
`` Hierarchical <advanced.advanced_hierarchical>`.
``
112
120
``
``
121
- ``.loc``, ``.iloc``, ``.ix`` and also ``[]`` indexing can accept a ``callable`` as an indexer. See more at :ref:`Selection By Callable <indexing.callable>`.
``
122
+
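As a rough illustration of what "valid output for indexing" can mean, the sketch below passes callables that return a single label, a list of labels, a label slice, a boolean Series, and a list of integer positions. The frame `df` and its values are made up for this example, and `.ix` is left out because it is not available in later pandas releases.

```python
import pandas as pd

# Illustrative data only; not taken from the documentation being changed.
df = pd.DataFrame(
    {"A": [1, -2, 3], "B": [4, 5, -6]},
    index=list("abc"),
)

# Each callable receives the calling object and returns an ordinary indexer.
by_label = df.loc[lambda d: "b"]              # single label, returns row 'b' as a Series
by_list = df.loc[lambda d: ["a", "c"]]        # list of labels
by_slice = df.loc[lambda d: slice("a", "b")]  # label slice, both endpoints included
by_mask = df.loc[lambda d: d["A"] > 0]        # boolean Series
by_position = df.iloc[lambda d: [0, 2]]       # list of integer positions
```

In every case the result should match what the returned indexer would select if it were passed directly.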
113
123
`Getting values from an object with multi-axes selection uses the following
`
114
124
``` notation (using .loc
as an example, but applies to .iloc
and .ix
as
`115`
`125`
``` well). Any of the axes accessors may be the null slice ``:``. Axes left out of
``` @@ -317,6 +327,7 @@ The .loc
attribute is the primary access method. The following are valid inp
`317`
`327`
``` - A list or array of labels ``['a', 'b', 'c']``
318
328
``` - A slice object with labels 'a':'f'
(note that contrary to usual python slices, both the start and the stop are included!)
`319`
`329`
`- A boolean array
`
``
`330`
``` +
- A ``callable``, see :ref:`Selection By Callable <indexing.callable>`
320
331
``
321
332
`.. ipython:: python
`
322
333
``
`@@ -340,13 +351,13 @@ With a DataFrame
`
340
351
`index=list('abcdef'),
`
341
352
`columns=list('ABCD'))
`
342
353
` df1
`
343
``
`-
df1.loc[['a','b','d'],:]
`
``
354
`+
df1.loc[['a', 'b', 'd'], :]
`
344
355
``
345
356
`Accessing via label slices
`
346
357
``
347
358
`.. ipython:: python
`
348
359
``
349
``
`-
df1.loc['d':,'A':'C']
`
``
360
`+
df1.loc['d':, 'A':'C']
`
350
361
``
351
362
``` For getting a cross section using a label (equiv to df.xs('a')
)
`352`
`363`
``
`@@ -358,15 +369,15 @@ For getting values with a boolean array
`
`358`
`369`
``
`359`
`370`
`.. ipython:: python
`
`360`
`371`
``
`361`
``
`-
df1.loc['a']>0
`
`362`
``
`-
df1.loc[:,df1.loc['a']>0]
`
``
`372`
`+
df1.loc['a'] > 0
`
``
`373`
`+
df1.loc[:, df1.loc['a'] > 0]
`
`363`
`374`
``
`364`
`375`
``` For getting a value explicitly (equiv to deprecated ``df.get_value('a','A')``)
365
376
``
366
377
`.. ipython:: python
`
367
378
``
368
379
``` # this is also equivalent to df1.at['a','A']
`369`
``
`-
df1.loc['a','A']
`
``
`380`
`+
df1.loc['a', 'A']
`
`370`
`381`
``
`371`
`382`
`.. _indexing.integer:
`
`372`
`383`
``
``` @@ -387,6 +398,7 @@ The ``.iloc`` attribute is the primary access method. The following are valid in
387
398
``` - A list or array of integers [4, 3, 0]
`388`
`399`
``` - A slice object with ints ``1:7``
389
400
`- A boolean array
`
``
401
- A ``callable``, see :ref:`Selection By Callable <indexing.callable>`
390
402
``
391
403
`.. ipython:: python
`
392
404
``
`@@ -416,26 +428,26 @@ Select via integer slicing
`
416
428
`.. ipython:: python
`
417
429
``
418
430
` df1.iloc[:3]
`
419
``
`-
df1.iloc[1:5,2:4]
`
``
431
`+
df1.iloc[1:5, 2:4]
`
420
432
``
421
433
`Select via integer list
`
422
434
``
423
435
`.. ipython:: python
`
424
436
``
425
``
`-
df1.iloc[[1,3,5],[1,3]]
`
``
437
`+
df1.iloc[[1, 3, 5], [1, 3]]
`
426
438
``
427
439
`.. ipython:: python
`
428
440
``
429
``
`-
df1.iloc[1:3,:]
`
``
441
`+
df1.iloc[1:3, :]
`
430
442
``
431
443
`.. ipython:: python
`
432
444
``
433
``
`-
df1.iloc[:,1:3]
`
``
445
`+
df1.iloc[:, 1:3]
`
434
446
``
435
447
`.. ipython:: python
`
436
448
``
437
449
``` # this is also equivalent to df1.iat[1,1]
`438`
``
`-
df1.iloc[1,1]
`
``
`450`
`+
df1.iloc[1, 1]
`
`439`
`451`
``
`440`
`452`
``` For getting a cross section using an integer position (equiv to ``df.xs(1)``)
441
453
``
`@@ -471,8 +483,8 @@ returned)
`
471
483
``
472
484
` dfl = pd.DataFrame(np.random.randn(5,2), columns=list('AB'))
`
473
485
` dfl
`
474
``
`-
dfl.iloc[:,2:3]
`
475
``
`-
dfl.iloc[:,1:3]
`
``
486
`+
dfl.iloc[:, 2:3]
`
``
487
`+
dfl.iloc[:, 1:3]
`
476
488
` dfl.iloc[4:6]
`
477
489
``
478
490
``` A single indexer that is out of bounds will raise an IndexError
.
`@@ -481,12 +493,52 @@ A list of indexers where any element is out of bounds will raise an
`
`481`
`493`
``
`482`
`494`
`.. code-block:: python
`
`483`
`495`
``
`484`
``
`-
dfl.iloc[[4,5,6]]
`
``
`496`
`+
dfl.iloc[[4, 5, 6]]
`
`485`
`497`
`IndexError: positional indexers are out-of-bounds
`
`486`
`498`
``
`487`
``
`-
dfl.iloc[:,4]
`
``
`499`
`+
dfl.iloc[:, 4]
`
`488`
`500`
`IndexError: single positional indexer is out-of-bounds
`
`489`
`501`
``
``
`502`
`+
.. _indexing.callable:
`
``
`503`
`+`
``
`504`
`+
Selection By Callable
`
``
`505`
`+
---------------------
`
``
`506`
`+`
``
`507`
`+
.. versionadded:: 0.18.1
`
``
`508`
`+`
``
`509`
``` +
``.loc``, ``.iloc``, ``.ix`` and also ``[]`` indexing can accept a ``callable`` as an indexer.
``
510
The ``callable`` must be a function that takes one argument (the calling Series, DataFrame or Panel) and returns valid output for indexing.
``
511
+
``
512
`+
.. ipython:: python
`
``
513
+
``
514
`+
df1 = pd.DataFrame(np.random.randn(6, 4),
`
``
515
`+
index=list('abcdef'),
`
``
516
`+
columns=list('ABCD'))
`
``
517
`+
df1
`
``
518
+
``
519
`+
df1.loc[lambda df: df.A > 0, :]
`
``
520
`+
df1.loc[:, lambda df: ['A', 'B']]
`
``
521
+
``
522
`+
df1.iloc[:, lambda df: [0, 1]]
`
``
523
+
``
524
`+
df1[lambda df: df.columns[0]]
`
``
525
+
``
526
+
``
527
Callable indexing also works on a ``Series``.
``
528
+
``
529
`+
.. ipython:: python
`
``
530
+
``
531
`+
df1.A.loc[lambda s: s > 0]
`
``
532
+
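One way to read the rule above: indexing with a callable should give the same result as calling the function on the object yourself and indexing with what it returns. The sketch below checks that on a small made-up frame; `positive_a` is a hypothetical helper name introduced only for this example.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"A": [1, -2, 3], "B": [4, 5, -6]}, index=list("abc"))


def positive_a(frame):
    # Hypothetical helper: receives the calling DataFrame, returns a boolean Series.
    return frame["A"] > 0


via_callable = df.loc[positive_a, :]   # pandas calls positive_a(df) for us
via_mask = df.loc[positive_a(df), :]   # same selection, mask computed by hand

s = df["A"]
s_callable = s.loc[lambda x: x > 0]    # on a Series, the callable receives the Series
s_mask = s.loc[s > 0]
```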
``
533
`+
Using these methods / indexers, you can chain data selection operations
`
``
534
`+
without using a temporary variable.
`
``
535
+
``
536
`+
.. ipython:: python
`
``
537
+
``
538
`+
bb = pd.read_csv('data/baseball.csv', index_col='id')
`
``
539
`+
(bb.groupby(['year', 'team']).sum()
`
``
540
`+
.loc[lambda df: df.r > 100])
`
``
541
+
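Because ``data/baseball.csv`` is not available outside the pandas source tree, here is a minimal sketch of the same chaining idea on a made-up frame. The column names mirror the example above, but the values are invented; the point is only that the callable sees the grouped result, so no intermediate name is needed.

```python
import pandas as pd

# Toy stand-in for the baseball data; all values are invented.
bb = pd.DataFrame({
    "year": [2006, 2006, 2007, 2007],
    "team": ["X", "X", "Y", "Y"],
    "r": [60, 70, 30, 40],
})

# With a temporary variable: the mask has to refer to the intermediate by name.
totals = bb.groupby(["year", "team"]).sum()
with_temp = totals.loc[totals["r"] > 100]

# Chained: the callable receives the grouped frame directly.
chained = (
    bb.groupby(["year", "team"]).sum()
      .loc[lambda df: df["r"] > 100]
)
```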
490
542
`.. _indexing.basics.partial_setting:
`
491
543
``
492
544
`Selecting Random Samples
`
`@@ -848,6 +900,19 @@ This is equivalent (but faster than) the following.
`
848
900
` df2 = df.copy()
`
849
901
` df.apply(lambda x, y: x.where(x>0,y), y=df['A'])
`
850
902
``
``
903
`+
.. versionadded:: 0.18.1
`
``
904
+
``
905
``where`` can accept a callable as the condition and ``other`` arguments. The function must
``
906
`+
take one argument (the calling Series or DataFrame) and return valid output
`
``
907
for the condition or ``other`` argument.
``
908
+
``
909
`+
.. ipython:: python
`
``
910
+
``
911
`+
df3 = pd.DataFrame({'A': [1, 2, 3],
`
``
912
`+
'B': [4, 5, 6],
`
``
913
`+
'C': [7, 8, 9]})
`
``
914
`+
df3.where(lambda x: x > 4, lambda x: x + 10)
`
``
915
+
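A small sketch of how the callable form relates to the explicit form, reusing the ``df3`` frame from the example above: both callables are evaluated against the calling frame, so the result should match passing ``df3 > 4`` and ``df3 + 10`` directly.

```python
import pandas as pd

df3 = pd.DataFrame({"A": [1, 2, 3],
                    "B": [4, 5, 6],
                    "C": [7, 8, 9]})

# Callable form: each lambda receives df3 itself.
with_callables = df3.where(lambda x: x > 4, lambda x: x + 10)

# Explicit form: the condition and replacement are computed up front.
explicit = df3.where(df3 > 4, df3 + 10)
```

Values greater than 4 are kept, and every other value is replaced by itself plus 10.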
851
916
`mask
`
852
917
``
853
918
``` mask
is the inverse boolean operation of where
.
```