Spellcheck of docs, a few minor changes (#18973) · pandas-dev/pandas@8433562

@@ -24,9 +24,9 @@ See the :ref:`Indexing and Selecting Data <indexing>` for general indexing docum
    Whether a copy or a reference is returned for a setting operation, may
    depend on the context. This is sometimes called ``chained assignment`` and
    should be avoided. See :ref:`Returning a View versus Copy
-   <indexing.view_versus_copy>`
+   <indexing.view_versus_copy>`.

-See the :ref:`cookbook<cookbook.selection>` for some advanced strategies
+See the :ref:`cookbook<cookbook.selection>` for some advanced strategies.

 .. _advanced.hierarchical:

@@ -46,7 +46,7 @@ described above and in prior sections. Later, when discussing :ref:`group by
 non-trivial applications to illustrate how it aids in structuring data for
 analysis.

-See the :ref:`cookbook<cookbook.multi_index>` for some advanced strategies
+See the :ref:`cookbook<cookbook.multi_index>` for some advanced strategies.

 Creating a MultiIndex (hierarchical index) object
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -59,7 +59,7 @@ can think of ``MultiIndex`` as an array of tuples where each tuple is unique. A
 ``MultiIndex.from_tuples``), or a crossed set of iterables (using
 ``MultiIndex.from_product``). The ``Index`` constructor will attempt to return
 a ``MultiIndex`` when it is passed a list of tuples. The following examples
-demo different ways to initialize MultiIndexes.
+demonstrate different ways to initialize MultiIndexes.


 .. ipython:: python
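
The constructors named in the hunk above can be compared side by side. This is a minimal sketch assuming a recent pandas release; the labels and variable names are illustrative and do not come from the commit.

```python
import pandas as pd

# Illustrative labels (an assumption, not data from the commit).
arrays = [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]]
tuples = list(zip(*arrays))

# Three explicit constructors that describe the same four entries.
from_arrays = pd.MultiIndex.from_arrays(arrays, names=["first", "second"])
from_tuples = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])
from_product = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)

# The plain Index constructor returns a MultiIndex for a list of tuples.
from_index = pd.Index(tuples)

print(from_arrays.equals(from_tuples), from_tuples.equals(from_product))
print(type(from_index).__name__)
```

``from_product`` is convenient when every combination of labels is wanted; ``from_tuples`` suits cases where the pairs are already materialized.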

@@ -196,7 +196,8 @@ highly performant. If you want to see the actual used levels.
    # for a specific level
    df[['foo','qux']].columns.get_level_values(0)

-To reconstruct the ``MultiIndex`` with only the used levels
+To reconstruct the ``MultiIndex`` with only the used levels, the
+``remove_unused_levels`` method may be used.

 .. versionadded:: 0.20.0
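
A short sketch of what ``remove_unused_levels`` does after a column selection, assuming a recent pandas release. The frame below is a hypothetical stand-in for the ``df`` used in the documentation.

```python
import pandas as pd

# Hypothetical frame with two-level columns (not the doc's actual data).
columns = pd.MultiIndex.from_product([["bar", "foo", "qux"], ["one", "two"]])
df = pd.DataFrame([[0, 1, 2, 3, 4, 5]], columns=columns)

sub = df[["foo", "qux"]]

# The selection keeps the full set of defined levels ...
all_levels = list(sub.columns.levels[0])

# ... while the labels actually in use are only 'foo' and 'qux'.
in_use = list(sub.columns.get_level_values(0).unique())

# remove_unused_levels rebuilds the index with only the used levels.
used_levels = list(sub.columns.remove_unused_levels().levels[0])

print(all_levels, in_use, used_levels)
```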

@@ -216,7 +217,7 @@ tuples:
    s + s[:-2]
    s + s[::2]

-``reindex`` can be called with another ``MultiIndex`` or even a list or array
+``reindex`` can be called with another ``MultiIndex``, or even a list or array
 of tuples:

 .. ipython:: python

@@ -230,7 +231,7 @@ Advanced indexing with hierarchical index
 -----------------------------------------

 Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc`` is a
-bit challenging, but we've made every effort to do so. for example the
+bit challenging, but we've made every effort to do so. For example the
 following works as you would expect:

 .. ipython:: python

@@ -286,7 +287,7 @@ As usual, **both sides** of the slicers are included as this is label indexing.

       df.loc[(slice('A1','A3'),.....), :]

-   rather than this:
+   You should **not** do this:

    .. code-block:: python

@@ -315,7 +316,7 @@ Basic multi-index slicing using slices, lists, and labels.

    dfmi.loc[(slice('A1','A3'), slice(None), ['C1', 'C3']), :]

-You can use a ``pd.IndexSlice`` to have a more natural syntax using ``:`` rather than using ``slice(None)``
+You can use :class:`pandas.IndexSlice` to facilitate a more natural syntax using ``:``, rather than using ``slice(None)``.

 .. ipython:: python
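
The two spellings in this hunk select the same rows. A minimal sketch, assuming a recent pandas release and a small hypothetical ``dfmi`` built only for illustration:

```python
import pandas as pd

# Hypothetical three-level index; from_product on sorted inputs is already sorted.
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C0", "C1", "C2", "C3"]]
)
dfmi = pd.DataFrame({"value": range(len(index))}, index=index)

# Explicit form: slice(None) stands for "everything on this level".
explicit = dfmi.loc[(slice("A1", "A3"), slice(None), ["C1", "C3"]), :]

# IndexSlice form: the same selection written with ':'.
idx = pd.IndexSlice
natural = dfmi.loc[idx["A1":"A3", :, ["C1", "C3"]], :]

print(explicit.equals(natural), len(natural))
```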

@@ -344,7 +345,7 @@ slicers on a single axis.

    dfmi.loc(axis=0)[:, :, ['C1', 'C3']]

-Furthermore you can set the values using these methods
+Furthermore you can set the values using the following methods.

 .. ipython:: python

@@ -379,7 +380,7 @@ selecting data at a particular level of a MultiIndex easier.
    df.loc[(slice(None),'one'),:]

 You can also select on the columns with :meth:`~pandas.MultiIndex.xs`, by
-providing the axis argument
+providing the axis argument.

 .. ipython:: python

@@ -391,7 +392,7 @@ providing the axis argument
    # using the slicers
    df.loc[:,(slice(None),'one')]

-:meth:`~pandas.MultiIndex.xs` also allows selection with multiple keys
+:meth:`~pandas.MultiIndex.xs` also allows selection with multiple keys.

 .. ipython:: python

@@ -403,13 +404,13 @@ providing the axis argument
    df.loc[:,('bar','one')]

 You can pass ``drop_level=False`` to :meth:`~pandas.MultiIndex.xs` to retain
-the level that was selected
+the level that was selected.

 .. ipython:: python

    df.xs('one', level='second', axis=1, drop_level=False)

-versus the result with ``drop_level=True`` (the default value)
+Compare the above with the result using ``drop_level=True`` (the default value).

 .. ipython:: python
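
The effect of ``drop_level`` is easiest to see on the resulting column index. A minimal sketch, assuming a recent pandas release; the frame is hypothetical and only mirrors the level names used in the hunk.

```python
import pandas as pd

# Hypothetical frame whose columns carry the level names 'first' and 'second'.
columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=columns)

# drop_level=False keeps the selected level in the result.
kept = df.xs("one", level="second", axis=1, drop_level=False)

# The default (drop_level=True) removes it, leaving a flat column index.
dropped = df.xs("one", level="second", axis=1)

print(list(kept.columns))
print(list(dropped.columns))
```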

@@ -470,7 +471,7 @@ allowing you to permute the hierarchical index levels in one step:
 Sorting a :class:`~pandas.MultiIndex`
 -------------------------------------

-For MultiIndex-ed objects to be indexed & sliced effectively, they need
+For MultiIndex-ed objects to be indexed and sliced effectively, they need
 to be sorted. As with any index, you can use ``sort_index``.

 .. ipython:: python
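
A small sketch of why sorting matters for slicing, assuming a recent pandas release. The Series and its labels are made up for illustration.

```python
import pandas as pd

# Deliberately unsorted two-level index (illustrative labels).
mi = pd.MultiIndex.from_tuples(
    [("b", "two"), ("a", "one"), ("b", "one"), ("a", "two")]
)
s = pd.Series(range(4), index=mi)

# sort_index orders the entries lexicographically, level by level.
s_sorted = s.sort_index()

# Once sorted, a slice between two full keys is well defined;
# both endpoints are included because this is label-based slicing.
window = s_sorted.loc[("a", "two"):("b", "one")]

print(s.index.is_monotonic_increasing, s_sorted.index.is_monotonic_increasing)
print(list(window))
```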

@@ -623,7 +624,8 @@ Index Types
 -----------

 We have discussed ``MultiIndex`` in the previous sections pretty extensively. ``DatetimeIndex`` and ``PeriodIndex``
-are shown :ref:`here <timeseries.overview>`. ``TimedeltaIndex`` are :ref:`here <timedeltas.timedeltas>`.
+are shown :ref:`here <timeseries.overview>`, and information about
+``TimedeltaIndex`` is found :ref:`here <timedeltas.timedeltas>`.

 In the following sub-sections we will highlight some other index types.

@@ -647,44 +649,46 @@ and allows efficient indexing and storage of an index with a large number of dup
    df.dtypes
    df.B.cat.categories

-Setting the index, will create a ``CategoricalIndex``
+Setting the index will create a ``CategoricalIndex``.

 .. ipython:: python

    df2 = df.set_index('B')
    df2.index

 Indexing with ``__getitem__/.iloc/.loc`` works similarly to an ``Index`` with duplicates.
-The indexers MUST be in the category or the operation will raise.
+The indexers **must** be in the category or the operation will raise a ``KeyError``.

 .. ipython:: python

    df2.loc['a']

-These PRESERVE the ``CategoricalIndex``
+The ``CategoricalIndex`` is **preserved** after indexing:

 .. ipython:: python

    df2.loc['a'].index

-Sorting will order by the order of the categories
+Sorting the index will sort by the order of the categories (recall that we
+created the index with ``CategoricalDtype(list('cab'))``, so the sorted
+order is ``cab``).

 .. ipython:: python

    df2.sort_index()

-Groupby operations on the index will preserve the index nature as well
+Groupby operations on the index will preserve the index nature as well.

 .. ipython:: python

    df2.groupby(level=0).sum()
    df2.groupby(level=0).sum().index

-Reindexing operations, will return a resulting index based on the type of the passed
-indexer, meaning that passing a list will return a plain-old-``Index``; indexing with
+Reindexing operations will return a resulting index based on the type of the passed
+indexer. Passing a list will return a plain-old ``Index``; indexing with
 a ``Categorical`` will return a ``CategoricalIndex``, indexed according to the categories
-of the PASSED ``Categorical`` dtype. This allows one to arbitrarily index these even with
-values NOT in the categories, similarly to how you can reindex ANY pandas index.
+of the **passed** ``Categorical`` dtype. This allows one to arbitrarily index these even with
+values not in the categories, similarly to how you can reindex any pandas index.

 .. ipython :: python
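
The ``CategoricalIndex`` behaviours described in this hunk can be checked with a small frame. This is a sketch assuming a recent pandas release; the column values are an assumption chosen for illustration, and only the category order ``cab`` is taken from the text.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Illustrative data; the category order 'c', 'a', 'b' follows the text.
df = pd.DataFrame({"A": range(6), "B": list("aabbca")})
df["B"] = df["B"].astype(CategoricalDtype(list("cab")))
df2 = df.set_index("B")

# Setting the index yields a CategoricalIndex, and .loc preserves it.
index_type = type(df2.index).__name__
subset_type = type(df2.loc["a"].index).__name__

# Sorting follows the category order, not alphabetical order.
sorted_labels = list(df2.sort_index().index)

# A label outside the categories raises KeyError.
try:
    df2.loc["e"]
    raised = False
except KeyError:
    raised = True

print(index_type, subset_type, sorted_labels, raised)
```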

@@ -720,7 +724,8 @@ Int64Index and RangeIndex

    Indexing on an integer-based Index with floats has been clarified in 0.18.0, for a summary of the changes, see :ref:`here <whatsnew_0180.float_indexers>`.

-``Int64Index`` is a fundamental basic index in *pandas*. This is an Immutable array implementing an ordered, sliceable set.
+``Int64Index`` is a fundamental basic index in pandas.
+This is an Immutable array implementing an ordered, sliceable set.
 Prior to 0.18.0, the ``Int64Index`` would provide the default index for all ``NDFrame`` objects.

 ``RangeIndex`` is a sub-class of ``Int64Index`` added in version 0.18.0, now providing the default index for all ``NDFrame`` objects.

@@ -742,7 +747,7 @@ same.
    sf = pd.Series(range(5), index=indexf)
    sf

-Scalar selection for ``[],.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)
+Scalar selection for ``[],.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``).

 .. ipython:: python

@@ -751,30 +756,32 @@ Scalar selection for ``[],.loc`` will always be label based. An integer will mat
    sf.loc[3]
    sf.loc[3.0]

-The only positional indexing is via ``iloc``
+The only positional indexing is via ``iloc``.

 .. ipython:: python

    sf.iloc[3]

-A scalar index that is not found will raise ``KeyError``
+A scalar index that is not found will raise a ``KeyError``.

-Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS positional with ``iloc``
+Slicing is primarily on the values of the index when using ``[],ix,loc``, and
+**always** positional when using ``iloc``. The exception is when the slice is
+boolean, in which case it will always be positional.

 .. ipython:: python

    sf[2:4]
    sf.loc[2:4]
    sf.iloc[2:4]

-In float indexes, slicing using floats is allowed
+In float indexes, slicing using floats is allowed.

 .. ipython:: python

    sf[2.1:4.6]
    sf.loc[2.1:4.6]

-In non-float indexes, slicing using floats will raise a ``TypeError``
+In non-float indexes, slicing using floats will raise a ``TypeError``.

 .. code-block:: ipython
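
The label-based versus positional distinction on a float index can be sketched as follows, assuming a recent pandas release. The index values are an assumption made for illustration, and the sketch sticks to ``.loc`` and ``.iloc`` because the behaviour of ``[]`` with integer slices and the availability of ``.ix`` differ across pandas versions.

```python
import pandas as pd

# Illustrative float index (values chosen for this sketch).
sf = pd.Series(range(5), index=[1.5, 2, 3, 4.5, 5])

# Scalar .loc is label based: the integer 3 matches the float label 3.0.
by_int_label = sf.loc[3]
by_float_label = sf.loc[3.0]

# .iloc is positional: this is the fourth element, whose label is 4.5.
by_position = sf.iloc[3]

# .loc slices on label values, inclusive on both ends.
label_slice = list(sf.loc[2:4])

# .iloc slices on positions, excluding the stop position.
position_slice = list(sf.iloc[2:4])

# Float endpoints need not be present in the index.
float_slice = list(sf.loc[2.1:4.6])

print(by_int_label, by_float_label, by_position)
print(label_slice, position_slice, float_slice)
```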

@@ -786,7 +793,7 @@ In non-float indexes, slicing using floats will raise a ``TypeError``

 .. warning::

-   Using a scalar float indexer for ``.iloc`` has been removed in 0.18.0, so the following will raise a ``TypeError``
+   Using a scalar float indexer for ``.iloc`` has been removed in 0.18.0, so the following will raise a ``TypeError``:

    .. code-block:: ipython

@@ -816,13 +823,13 @@ Selection operations then will always work on a value basis, for all selection o
    dfir.loc[0:1001,'A']
    dfir.loc[1000.4]

-You could then easily pick out the first 1 second (1000 ms) of data then.
+You could retrieve the first 1 second (1000 ms) of data as such:

 .. ipython:: python

    dfir[0:1000]

-Of course if you need integer based selection, then use ``iloc``
+If you need integer based selection, you should use ``iloc``:

 .. ipython:: python

@@ -975,6 +982,7 @@ consider the following Series:
    s

 Suppose we wished to slice from ``c`` to ``e``, using integers this would be
+accomplished as such:

 .. ipython:: python
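
The integer slice from ``c`` to ``e`` mentioned above can be compared with the equivalent label slice. A minimal sketch, assuming a recent pandas release and a hypothetical Series indexed by the letters ``a`` to ``f``, since the hunk does not show how ``s`` was built:

```python
import pandas as pd

# Hypothetical Series; the actual definition of `s` is outside this hunk.
s = pd.Series(range(6), index=list("abcdef"))

# With integers the stop position must be one past 'e'.
by_position = s.iloc[2:5]

# With labels both endpoints are included.
by_label = s.loc["c":"e"]

print(by_position.equals(by_label), list(by_label.index))
```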