ENH: Allow where/mask/Indexers to accept callable · pandas-dev/pandas@7bbd031

@@ -79,6 +79,10 @@ of multi-axis indexing.
   - A slice object with labels ``'a':'f'``, (note that contrary to usual python
     slices, **both** the start and the stop are included!)
   - A boolean array
+  - A ``callable`` function with one argument (the calling Series, DataFrame or Panel) and
+    that returns valid output for indexing (one of the above)
+
+  .. versionadded:: 0.18.1
 
   See more at :ref:`Selection by Label <indexing.label>`
 

@@ -93,6 +97,10 @@ of multi-axis indexing.
   - A list or array of integers ``[4, 3, 0]``
   - A slice object with ints ``1:7``
   - A boolean array
+  - A ``callable`` function with one argument (the calling Series, DataFrame or Panel) and
+    that returns valid output for indexing (one of the above)
+
+  .. versionadded:: 0.18.1
 
   See more at :ref:`Selection by Position <indexing.integer>`
 
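
The two hunks above add a ``callable`` to the accepted inputs of ``.loc`` and ``.iloc``. A minimal sketch of that rule follows, assuming a recent pandas (so ``.ix`` and ``Panel`` are left out); the frame, its labels and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame; labels and values are illustrative only.
df = pd.DataFrame({"A": [1, -2, 3], "B": [4, 5, -6]}, index=list("xyz"))

# The callable receives the calling DataFrame and returns a boolean Series,
# which .loc then uses as if it had been passed directly.
by_label = df.loc[lambda d: d["A"] > 0]

# For .iloc the callable must return positional output, here a list of ints
# selecting the first and last rows.
by_position = df.iloc[lambda d: [0, len(d) - 1]]

print(by_label)
print(by_position)
```

In both cases the callable's return value is simply substituted for the indexer, so it has to be one of the input kinds already listed for that accessor.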

@@ -110,6 +118,8 @@ of multi-axis indexing.
   See more at :ref:`Advanced Indexing <advanced>` and :ref:`Advanced
   Hierarchical <advanced.advanced_hierarchical>`.
 
+- ``.loc``, ``.iloc``, ``.ix`` and also ``[]`` indexing can accept a ``callable`` as indexer. See more at :ref:`Selection By Callable <indexing.callable>`.
+
 Getting values from an object with multi-axes selection uses the following
 notation (using ``.loc`` as an example, but applies to ``.iloc`` and ``.ix`` as
 well). Any of the axes accessors may be the null slice ``:``. Axes left out of
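
The last context line is cut off by the hunk boundary. Assuming it goes on to say that axes omitted from the specification are treated as ``:``, the multi-axes notation can be sketched as below; the frame is hypothetical.

```python
import pandas as pd

# Hypothetical frame used only to illustrate the notation.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=list("abc"))

row_implicit = df.loc["a"]      # column axis left out
row_explicit = df.loc["a", :]   # column axis given as the null slice
col_a = df.loc[:, "A"]          # null slice on the row axis

print(row_implicit.equals(row_explicit))
print(col_a.tolist())
```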

@@ -317,6 +327,7 @@ The ``.loc`` attribute is the primary access method. The following are valid inp
 - A list or array of labels ``['a', 'b', 'c']``
 - A slice object with labels ``'a':'f'`` (note that contrary to usual python slices, both the start and the stop are included!)
 - A boolean array
+- A ``callable``, see :ref:`Selection By Callable <indexing.callable>`
 
 .. ipython:: python
 
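
The note about label slices including both endpoints is easy to miss next to ordinary Python slicing. A small sketch on a made-up Series contrasts the two behaviors:

```python
import pandas as pd

# Hypothetical Series with string labels.
s = pd.Series(range(6), index=list("abcdef"))

by_label = s.loc["b":"d"]    # labels b, c, d: the stop label is included
by_position = s.iloc[1:3]    # positions 1 and 2: the stop position is excluded

print(list(by_label.index))
print(list(by_position.index))
```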

@@ -340,13 +351,13 @@ With a DataFrame
                       index=list('abcdef'),
                       columns=list('ABCD'))
    df1
-   df1.loc[['a','b','d'],:]
+   df1.loc[['a', 'b', 'd'], :]
 
 Accessing via label slices
 
 .. ipython:: python
 
-   df1.loc['d':,'A':'C']
+   df1.loc['d':, 'A':'C']
 
 For getting a cross section using a label (equiv to ``df.xs('a')``)
 

@@ -358,15 +369,15 @@ For getting values with a boolean array
 
 .. ipython:: python
 
-   df1.loc['a']>0
-   df1.loc[:,df1.loc['a']>0]
+   df1.loc['a'] > 0
+   df1.loc[:, df1.loc['a'] > 0]
 
 For getting a value explicitly (equiv to deprecated ``df.get_value('a','A')``)
 
 .. ipython:: python
 
    # this is also equivalent to ``df1.at['a','A']``
-   df1.loc['a','A']
+   df1.loc['a', 'A']
 
 .. _indexing.integer:
 
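
The hunk above only changes whitespace, but the two idioms it touches are worth seeing with concrete values. A minimal sketch, using a small hypothetical frame instead of the random ``df1``:

```python
import pandas as pd

# Hypothetical frame with fixed values so the result is predictable.
df = pd.DataFrame({"A": [1, -1], "B": [-2, 2], "C": [3, 3]}, index=["a", "b"])

# Boolean Series indexed by column label: True where row 'a' is positive.
mask = df.loc["a"] > 0
positive_cols = df.loc[:, mask]   # keeps columns A and C

# Scalar access: .loc with two labels and .at return the same value.
scalar_loc = df.loc["a", "A"]
scalar_at = df.at["a", "A"]

print(list(positive_cols.columns))
print(scalar_loc, scalar_at)
```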

@@ -387,6 +398,7 @@ The ``.iloc`` attribute is the primary access method. The following are valid in
 - A list or array of integers ``[4, 3, 0]``
 - A slice object with ints ``1:7``
 - A boolean array
+- A ``callable``, see :ref:`Selection By Callable <indexing.callable>`
 
 .. ipython:: python
 

@@ -416,26 +428,26 @@ Select via integer slicing
 .. ipython:: python
 
    df1.iloc[:3]
-   df1.iloc[1:5,2:4]
+   df1.iloc[1:5, 2:4]
 
 Select via integer list
 
 .. ipython:: python
 
-   df1.iloc[[1,3,5],[1,3]]
+   df1.iloc[[1, 3, 5], [1, 3]]
 
 .. ipython:: python
 
-   df1.iloc[1:3,:]
+   df1.iloc[1:3, :]
 
 .. ipython:: python
 
-   df1.iloc[:,1:3]
+   df1.iloc[:, 1:3]
 
 .. ipython:: python
 
    # this is also equivalent to ``df1.iat[1,1]``
-   df1.iloc[1,1]
+   df1.iloc[1, 1]
 
 For getting a cross section using an integer position (equiv to ``df.xs(1)``)
 
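
The positional forms reformatted in this hunk can be checked on a frame whose values are known in advance. The sketch below assumes a hypothetical 6x4 frame filled with ``np.arange`` rather than the random data used in the docs.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row i holds the values 4*i .. 4*i + 3.
df = pd.DataFrame(np.arange(24).reshape(6, 4),
                  index=list("abcdef"), columns=list("ABCD"))

block = df.iloc[1:5, 2:4]             # rows 1-4 and columns 2-3 (stops excluded)
picked = df.iloc[[1, 3, 5], [1, 3]]   # explicit integer lists on both axes
cell = df.iloc[1, 1]                  # scalar; same value as df.iat[1, 1]
row = df.iloc[1]                      # cross section: the second row as a Series

print(block.shape, picked.shape, cell, row.name)
```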

@@ -471,8 +483,8 @@ returned)
 
    dfl = pd.DataFrame(np.random.randn(5,2), columns=list('AB'))
    dfl
-   dfl.iloc[:,2:3]
-   dfl.iloc[:,1:3]
+   dfl.iloc[:, 2:3]
+   dfl.iloc[:, 1:3]
    dfl.iloc[4:6]
 
 A single indexer that is out of bounds will raise an ``IndexError``.
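
The difference between out-of-bounds slices and out-of-bounds single or list indexers can be sketched as follows. The frame mirrors the shape of ``dfl`` but uses zeros, an assumption made only to keep the example deterministic.

```python
import numpy as np
import pandas as pd

# Same shape as dfl in the docs (5 rows, 2 columns), filled with zeros.
dfl = pd.DataFrame(np.zeros((5, 2)), columns=list("AB"))

# Slices that run past the end are truncated instead of raising.
empty_cols = dfl.iloc[:, 2:3]   # no column at position 2: empty result
one_col = dfl.iloc[:, 1:3]      # only column B survives
last_row = dfl.iloc[4:6]        # only row 4 survives

# A single out-of-bounds position raises IndexError.
try:
    dfl.iloc[:, 4]
    single_raised = False
except IndexError:
    single_raised = True

# So does a list containing any out-of-bounds position.
try:
    dfl.iloc[[4, 5, 6]]
    list_raised = False
except IndexError:
    list_raised = True

print(empty_cols.shape, one_col.shape, last_row.shape)
print(single_raised, list_raised)
```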


@@ -481,12 +493,52 @@ A list of indexers where any element is out of bounds will raise an
 
 .. code-block:: python
 
-   dfl.iloc[[4,5,6]]
+   dfl.iloc[[4, 5, 6]]
    IndexError: positional indexers are out-of-bounds
 
-   dfl.iloc[:,4]
+   dfl.iloc[:, 4]
    IndexError: single positional indexer is out-of-bounds
 
+.. _indexing.callable:
+
+Selection By Callable
+---------------------
+
+.. versionadded:: 0.18.1
+
+``.loc``, ``.iloc``, ``.ix`` and also ``[]`` indexing can accept a ``callable`` as indexer.
+The ``callable`` must be a function with one argument (the calling Series, DataFrame or Panel) and that returns valid output for indexing.
+
+.. ipython:: python
+
+   df1 = pd.DataFrame(np.random.randn(6, 4),
+                      index=list('abcdef'),
+                      columns=list('ABCD'))
+   df1
+
+   df1.loc[lambda df: df.A > 0, :]
+   df1.loc[:, lambda df: ['A', 'B']]
+
+   df1.iloc[:, lambda df: [0, 1]]
+
+   df1[lambda df: df.columns[0]]
+
+
+You can use callable indexing in ``Series``.
+
+.. ipython:: python
+
+   df1.A.loc[lambda s: s > 0]
+
+Using these methods / indexers, you can chain data selection operations
+without using temporary variable.
+
+.. ipython:: python
+
+   bb = pd.read_csv('data/baseball.csv', index_col='id')
+   (bb.groupby(['year', 'team']).sum()
+      .loc[lambda df: df.r > 100])
+
 .. _indexing.basics.partial_setting:
 
 Selecting Random Samples
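
The chaining example in the new section depends on ``data/baseball.csv``, which is not reproduced here. A self-contained sketch of the same pattern is given below; the rows are invented and only the column names ``year``, ``team`` and ``r`` are taken from the diff.

```python
import pandas as pd

# Made-up stand-in for data/baseball.csv; the values are not real statistics.
bb = pd.DataFrame({
    "year": [2006, 2006, 2007, 2007],
    "team": ["ATL", "ATL", "BOS", "NYN"],
    "r": [60, 70, 40, 120],
})

# The callable sees the grouped and summed frame, so no temporary variable
# is needed to refer to the intermediate result.
result = (bb.groupby(["year", "team"]).sum()
            .loc[lambda df: df["r"] > 100])

print(result)
```

Without the callable, the aggregated frame would have to be bound to a name first so that the boolean condition could refer to it.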

@@ -848,6 +900,19 @@ This is equivalent (but faster than) the following.
    df2 = df.copy()
    df.apply(lambda x, y: x.where(x>0,y), y=df['A'])
 
+.. versionadded:: 0.18.1
+
+Where can accept a callable as condition and ``other`` arguments. The function must
+be with one argument (the calling Series or DataFrame) and that returns valid output
+as condition and ``other`` argument.
+
+.. ipython:: python
+
+   df3 = pd.DataFrame({'A': [1, 2, 3],
+                       'B': [4, 5, 6],
+                       'C': [7, 8, 9]})
+   df3.where(lambda x: x > 4, lambda x: x + 10)
+
 mask
 
 ``mask`` is the inverse boolean operation of ``where``.
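
To make the callable form of ``where`` and its relation to ``mask`` concrete, here is a minimal sketch that reuses the ``df3`` frame from the diff. The inverted condition passed to ``mask`` is an illustration added here, not part of the commit.

```python
import pandas as pd

df3 = pd.DataFrame({"A": [1, 2, 3],
                    "B": [4, 5, 6],
                    "C": [7, 8, 9]})

# Both callables receive df3. Values where the condition is True are kept;
# the rest are replaced by the corresponding entry of x + 10.
kept = df3.where(lambda x: x > 4, lambda x: x + 10)

# mask replaces where the condition is True, so inverting the condition
# gives the same result as the where call above.
same = df3.mask(lambda x: x <= 4, lambda x: x + 10)

print(kept)
print(kept.equals(same))
```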