Spellcheck of docs, a few minor changes (#18973) · pandas-dev/pandas@8433562

`` @@ -24,9 +24,9 @@ See the :ref:Indexing and Selecting Data <indexing> for general indexing docum

24

Whether a copy or a reference is returned for a setting operation, may

25

depend on the context. This is sometimes called ``chained assignment`` and


`26`

`26`

``  should be avoided. See :ref:`Returning a View versus Copy

``

`27`

``

`` -

<indexing.view_versus_copy>`

``

``

`27`

`` +

<indexing.view_versus_copy>`.

``

`28`

`28`

``

`29`

``

`` -

See the :ref:`cookbook<cookbook.selection>` for some advanced strategies

``

``

`29`

`` +

See the :ref:`cookbook<cookbook.selection>` for some advanced strategies.

``

`30`

`30`

``

`31`

`31`

`.. _advanced.hierarchical:

`

`32`

`32`

``

`` @@ -46,7 +46,7 @@ described above and in prior sections. Later, when discussing :ref:`group by

``

`46`

`46`

`non-trivial applications to illustrate how it aids in structuring data for

`

`47`

`47`

`analysis.

`

`48`

`48`

``

`49`

``

`` -

See the :ref:`cookbook<cookbook.multi_index>` for some advanced strategies

``

``

`49`

`` +

See the :ref:`cookbook<cookbook.multi_index>` for some advanced strategies.

``

`50`

`50`

``

`51`

`51`

`Creating a MultiIndex (hierarchical index) object

`

`52`

`52`

`~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`

``` @@ -59,7 +59,7 @@ can think of ``MultiIndex`` as an array of tuples where each tuple is unique. A

59

``MultiIndex.from_tuples``), or a crossed set of iterables (using


`60`

`60`

``MultiIndex.from_product``). The ``Index`` constructor will attempt to return

61

a ``MultiIndex`` when it is passed a list of tuples. The following examples


`62`

``

`-

demo different ways to initialize MultiIndexes.

`

``

`62`

`+

demonstrate different ways to initialize MultiIndexes.

`
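As a minimal sketch of the constructors named above (the labels ``bar``/``baz`` and ``one``/``two`` and the level names are illustrative assumptions, not taken from the original example), the different entry points can build the same index:

```python
import pandas as pd

# Illustrative labels; any hashable values would do.
tuples = [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")]

# Explicit list of tuples.
from_tuples = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

# Crossed set of iterables (Cartesian product of the two lists).
from_product = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)

# The plain Index constructor returns a MultiIndex for a list of tuples.
via_index = pd.Index(tuples)

print(from_tuples.equals(from_product))
print(type(via_index).__name__)
```

``from_product`` is convenient when every combination of labels is wanted; ``from_tuples`` is the better fit when only some combinations exist.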

`63`

`63`

``

`64`

`64`

``

`65`

`65`

`.. ipython:: python

`

`@@ -196,7 +196,8 @@ highly performant. If you want to see the actual used levels.

`

`196`

`196`

`# for a specific level

`

`197`

`197`

` df[['foo','qux']].columns.get_level_values(0)

`

`198`

`198`

``

`199`

``

``` -

To reconstruct the ``MultiIndex`` with only the used levels

199


To reconstruct the ``MultiIndex`` with only the used levels, the

200


``remove_unused_levels`` method may be used.
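A short sketch of the difference, assuming a small frame whose column labels (``bar``, ``foo``, ``qux``) are chosen only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level column index.
columns = pd.MultiIndex.from_product([["bar", "foo", "qux"], ["one", "two"]])
df = pd.DataFrame(np.arange(12).reshape(2, 6), columns=columns)

# Selecting a subset of columns keeps the full set of levels around.
subset = df[["foo", "qux"]]
stale_levels = list(subset.columns.levels[0])

# remove_unused_levels returns a new MultiIndex with only the used levels.
trimmed = subset.columns.remove_unused_levels()
used_levels = list(trimmed.levels[0])

print(stale_levels)
print(used_levels)
```

The method returns a new index rather than modifying the existing one, so the result has to be assigned back (for example to ``subset.columns``) if the trimmed levels should persist.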

200

201

202

`.. versionadded:: 0.20.0

202

203

`@@ -216,7 +217,7 @@ tuples:

216

217

` s + s[:-2]

217

218

` s + s[::2]

218

219


``reindex`` can be called with another ``MultiIndex`` or even a list or array

220


``reindex`` can be called with another ``MultiIndex``, or even a list or array

220

221

`of tuples:

221

222

223

`.. ipython:: python

`@@ -230,7 +231,7 @@ Advanced indexing with hierarchical index

230

231

`-----------------------------------------

231

232

233

Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc`` is a


`233`

``

`-

bit challenging, but we've made every effort to do so. for example the

`

``

`234`

`+

bit challenging, but we've made every effort to do so. For example the

`

`234`

`235`

`following works as you would expect:

`

`235`

`236`

``

`236`

`237`

`.. ipython:: python

`

`@@ -286,7 +287,7 @@ As usual, **both sides** of the slicers are included as this is label indexing.

`

`286`

`287`

``

`287`

`288`

` df.loc[(slice('A1','A3'),.....), :]

`

`288`

`289`

``

`289`

``

`-

rather than this:

`

``

`290`

`+

You should **not** do this:

`

`290`

`291`

``

`291`

`292`

` .. code-block:: python

`

`292`

`293`

``

`@@ -315,7 +316,7 @@ Basic multi-index slicing using slices, lists, and labels.

`

`315`

`316`

``

`316`

`317`

` dfmi.loc[(slice('A1','A3'), slice(None), ['C1', 'C3']), :]

`

`317`

`318`

``

`318`

``

``` -

You can use a ``pd.IndexSlice`` to have a more natural syntax using ``:`` rather than using ``slice(None)``

319


You can use :class:`pandas.IndexSlice` to facilitate a more natural syntax using ``:``, rather than using ``slice(None)``.
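To make the equivalence concrete, here is a small sketch. The frame ``dfmi`` below is a simplified stand-in built for this example (three row levels, one assumed column named ``value``), not the frame used elsewhere in the document:

```python
import numpy as np
import pandas as pd

# Simplified stand-in for dfmi: a sorted three-level row index.
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C0", "C1", "C2", "C3"]]
)
dfmi = pd.DataFrame({"value": np.arange(len(index))}, index=index)

# Explicit slice objects.
with_slices = dfmi.loc[(slice("A1", "A3"), slice(None), ["C1", "C3"]), :]

# The same selection written with IndexSlice and ``:``.
idx = pd.IndexSlice
with_index_slice = dfmi.loc[idx["A1":"A3", :, ["C1", "C3"]], :]

print(with_slices.equals(with_index_slice))
print(len(with_index_slice))
```

Both forms select the same rows; ``IndexSlice`` only changes how the key is spelled.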

319

320

321

`.. ipython:: python

321

322

`@@ -344,7 +345,7 @@ slicers on a single axis.

344

345

346

` dfmi.loc(axis=0)[:, :, ['C1', 'C3']]

346

347

Furthermore you can set the values using these methods

348

Furthermore you can set the values using the following methods.

348

349

350

`.. ipython:: python

350

351

`@@ -379,7 +380,7 @@ selecting data at a particular level of a MultiIndex easier.

379

380

` df.loc[(slice(None),'one'),:]

380

381

382

You can also select on the columns with :meth:`~pandas.MultiIndex.xs`, by

382

providing the axis argument

383

providing the axis argument.

383

384

385

`.. ipython:: python

385

386

`@@ -391,7 +392,7 @@ providing the axis argument

391

392

`# using the slicers

392

393

` df.loc[:,(slice(None),'one')]

393

394

`` -

:meth:`~pandas.MultiIndex.xs` also allows selection with multiple keys

395

`` +

:meth:`~pandas.MultiIndex.xs` also allows selection with multiple keys.

395

396

397

`.. ipython:: python

397

398

`@@ -403,13 +404,13 @@ providing the axis argument

403

404

` df.loc[:,('bar','one')]

404

405

406

You can pass ``drop_level=False`` to :meth:`~pandas.MultiIndex.xs` to retain


`406`

``

`-

the level that was selected

`

``

`407`

`+

the level that was selected.

`

`407`

`408`

``

`408`

`409`

`.. ipython:: python

`

`409`

`410`

``

`410`

`411`

` df.xs('one', level='second', axis=1, drop_level=False)

`

`411`

`412`

``

`412`

``

``` -

versus the result with ``drop_level=True`` (the default value)

413


Compare the above with the result using ``drop_level=True`` (the default value).
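The effect of ``drop_level`` can be checked on a toy frame. The column labels and the level names ``first`` and ``second`` are assumptions made for this sketch:

```python
import numpy as np
import pandas as pd

# Toy frame with two column levels named 'first' and 'second'.
columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=columns)

# Keep the selected level in the result.
kept = df.xs("one", level="second", axis=1, drop_level=False)

# Default behaviour: the selected level is dropped.
dropped = df.xs("one", level="second", axis=1)

print(kept.columns.nlevels)
print(dropped.columns.nlevels)
```

Retaining the level is useful when the result is later concatenated or aligned with frames that still carry the full column hierarchy.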

413

414

415

`.. ipython:: python

415

416

`@@ -470,7 +471,7 @@ allowing you to permute the hierarchical index levels in one step:

470

471

Sorting a :class:`~pandas.MultiIndex`

471

472

-------------------------------------

472

473

For MultiIndex-ed objects to be indexed & sliced effectively, they need

474

For MultiIndex-ed objects to be indexed and sliced effectively, they need

474

475

to be sorted. As with any index, you can use ``sort_index``.
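A minimal sketch of why sorting matters, using made-up labels: the unsorted series is not monotonic, while the sorted one is and supports label slicing across the first level.

```python
import pandas as pd

# Deliberately unsorted two-level index (labels are illustrative).
tuples = [("qux", "two"), ("bar", "one"), ("foo", "two"), ("bar", "two")]
s = pd.Series(range(4), index=pd.MultiIndex.from_tuples(tuples))

sorted_s = s.sort_index()

print(s.index.is_monotonic_increasing)
print(sorted_s.index.is_monotonic_increasing)

# Label slicing on the first level works on the sorted series.
between = sorted_s.loc["bar":"foo"]
print(len(between))
```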


`475`

`476`

``

`476`

`477`

`.. ipython:: python

`

`@@ -623,7 +624,8 @@ Index Types

`

`623`

`624`

`-----------

`

`624`

`625`

``

`625`

`626`

We have discussed ``MultiIndex`` in the previous sections pretty extensively. ``DatetimeIndex`` and ``PeriodIndex``

626


are shown :ref:`here <timeseries.overview>`. ``TimedeltaIndex`` are :ref:`here <timedeltas.timedeltas>`.

627

`` +

are shown :ref:`here <timeseries.overview>`, and information about

628


``TimedeltaIndex`` is found :ref:`here <timedeltas.timedeltas>`.

627

629

628

630

`In the following sub-sections we will highlight some other index types.

629

631

`@@ -647,44 +649,46 @@ and allows efficient indexing and storage of an index with a large number of dup

647

649

` df.dtypes

648

650

` df.B.cat.categories

649

651

650


Setting the index, will create a ``CategoricalIndex``

652


Setting the index will create a ``CategoricalIndex``.

651

653

652

654

`.. ipython:: python

653

655

654

656

` df2 = df.set_index('B')

655

657

` df2.index

656

658

657

659

Indexing with ``__getitem__/.iloc/.loc`` works similarly to an ``Index`` with duplicates.


`658`

``

`-

The indexers MUST be in the category or the operation will raise.

`

``

`660`

``` +

The indexers **must** be in the category or the operation will raise a ``KeyError``.

659

661

660

662

`.. ipython:: python

661

663

662

664

` df2.loc['a']

663

665

664


These PRESERVE the ``CategoricalIndex``

666


The ``CategoricalIndex`` is **preserved** after indexing:

665

667

666

668

`.. ipython:: python

667

669

668

670

` df2.loc['a'].index

669

671

670

Sorting will order by the order of the categories

672

Sorting the index will sort by the order of the categories (Recall that we

673


created the index with ``CategoricalDtype(list('cab'))``, so the sorted

674


order is ``cab``).
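The points above (lookup by category, preservation of the index type, and category-ordered sorting) can be sketched together. The frame is a reconstruction under assumptions: column ``A`` holds arbitrary integers and column ``B`` uses the categories ``c``, ``a``, ``b`` in that order.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

# Assumed data: B is categorical with category order c, a, b.
df = pd.DataFrame({"A": np.arange(6), "B": list("aabbca")})
df["B"] = df["B"].astype(CategoricalDtype(list("cab")))
df2 = df.set_index("B")

# Indexing by a label that is a category keeps the CategoricalIndex.
selected = df2.loc["a"]
print(type(selected.index).__name__)

# Sorting follows the category order, not alphabetical order.
sorted_labels = df2.sort_index().index.tolist()
print(sorted_labels)

# A label outside the categories raises KeyError.
try:
    df2.loc["e"]
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```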

671

675

672

676

`.. ipython:: python

673

677

674

678

` df2.sort_index()

675

679

676

Groupby operations on the index will preserve the index nature as well

680

Groupby operations on the index will preserve the index nature as well.

677

681

678

682

`.. ipython:: python

679

683

680

684

` df2.groupby(level=0).sum()

681

685

` df2.groupby(level=0).sum().index

682

686

683

Reindexing operations, will return a resulting index based on the type of the passed

684


indexer, meaning that passing a list will return a plain-old-``Index``; indexing with

687

Reindexing operations will return a resulting index based on the type of the passed

688


indexer. Passing a list will return a plain-old ``Index``; indexing with

685

689

a ``Categorical`` will return a ``CategoricalIndex``, indexed according to the categories


`686`

``

``` -

of the PASSED ``Categorical`` dtype. This allows one to arbitrarily index these even with

687

values NOT in the categories, similarly to how you can reindex ANY pandas index.

690


of the **passed** ``Categorical`` dtype. This allows one to arbitrarily index these even with

691

values not in the categories, similarly to how you can reindex any pandas index.
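A brief sketch of the two cases, assuming a small frame ``df3`` with unique categorical labels (reindexing requires a unique index) and a label ``e`` that is not among the original categories:

```python
import numpy as np
import pandas as pd

# Assumed frame with a unique CategoricalIndex (labels a, b, c).
df3 = pd.DataFrame(
    {"A": np.arange(3), "B": pd.Series(list("abc")).astype("category")}
).set_index("B")

# Reindexing with a plain list: the result carries a plain Index.
plain = df3.reindex(["a", "e"])

# Reindexing with a Categorical: the result carries a CategoricalIndex
# whose categories come from the passed Categorical.
target = pd.Categorical(["a", "e"], categories=list("abe"))
categorical = df3.reindex(target)

print(type(plain.index).__name__)
print(type(categorical.index).__name__)
print(list(categorical.index.categories))
```

In both results the row for ``e`` is filled with missing values, since that label has no match in ``df3``.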

688

692

689

693

`.. ipython :: python

690

694

`@@ -720,7 +724,8 @@ Int64Index and RangeIndex

720

724

721

725

Indexing on an integer-based Index with floats has been clarified in 0.18.0, for a summary of the changes, see :ref:`here <whatsnew_0180.float_indexers>`.

722

726

723


``Int64Index`` is a fundamental basic index in *pandas*. This is an Immutable array implementing an ordered, sliceable set.

727


``Int64Index`` is a fundamental basic index in pandas.

728

This is an immutable array implementing an ordered, sliceable set.

724

729

Prior to 0.18.0, the ``Int64Index`` would provide the default index for all ``NDFrame`` objects.


`725`

`730`

``

`726`

`731`

``RangeIndex`` is a sub-class of ``Int64Index`` added in version 0.18.0, now providing the default index for all ``NDFrame`` objects.

`@@ -742,7 +747,7 @@ same.

742

747

`sf = pd.Series(range(5), index=indexf)

743

748

` sf

744

749

745


Scalar selection for ``[],.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)

750


Scalar selection for ``[],.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``).

746

751

747

752

`.. ipython:: python

748

753

``` @@ -751,30 +756,32 @@ Scalar selection for [],.loc will always be label based. An integer will mat


`751`

`756`

` sf.loc[3]

`

`752`

`757`

` sf.loc[3.0]

`

`753`

`758`

``

`754`

``

``` -

The only positional indexing is via ``iloc``

759


The only positional indexing is via ``iloc``.

755

760

756

761

`.. ipython:: python

757

762

758

763

` sf.iloc[3]

759

764

760


A scalar index that is not found will raise ``KeyError``

765


A scalar index that is not found will raise a ``KeyError``.
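The scalar rules can be checked with a small series. The index values below are an assumption for this sketch; the series is named ``sf`` only to match the surrounding text.

```python
import pandas as pd

# Assumed float index; the integers 2, 3 and 5 are stored as 2.0, 3.0, 5.0.
sf = pd.Series(range(5), index=[1.5, 2, 3, 4.5, 5])

# Label-based: the integer 3 matches the float label 3.0.
by_int_label = sf.loc[3]
by_float_label = sf.loc[3.0]

# Positional: the element at position 3 (label 4.5).
by_position = sf.iloc[3]

# A label that is not present raises KeyError.
try:
    sf.loc[3.5]
    missing_raises = False
except KeyError:
    missing_raises = True

print(by_int_label, by_float_label, by_position, missing_raises)
```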

761

766

762


Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS positional with ``iloc``

767


Slicing is primarily on the values of the index when using ``[],ix,loc``, and

768


**always** positional when using ``iloc``. The exception is when the slice is

769

boolean, in which case it will always be positional.

763

770

764

771

`.. ipython:: python

765

772

766

773

` sf[2:4]

767

774

` sf.loc[2:4]

768

775

` sf.iloc[2:4]

769

776

770

In float indexes, slicing using floats is allowed

777

In float indexes, slicing using floats is allowed.
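A sketch of label-based versus positional slicing on a float index, again with assumed index values. It sticks to ``.loc`` and ``.iloc`` so that the distinction is explicit in the code:

```python
import pandas as pd

# Assumed float index: 1.5, 2.0, 3.0, 4.5, 5.0
sf = pd.Series(range(5), index=[1.5, 2, 3, 4.5, 5])

# Label-based, both endpoints included: labels 2.0 and 3.0.
label_slice = sf.loc[2:4]

# Positional, end excluded: positions 2 and 3 (labels 3.0 and 4.5).
position_slice = sf.iloc[2:4]

# Float bounds that are not labels themselves: labels 3.0 and 4.5.
float_slice = sf.loc[2.1:4.6]

print(label_slice.tolist())
print(position_slice.tolist())
print(float_slice.tolist())
```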

771

778

772

779

`.. ipython:: python

773

780

774

781

` sf[2.1:4.6]

775

782

` sf.loc[2.1:4.6]

776

783

777


In non-float indexes, slicing using floats will raise a ``TypeError``

784


In non-float indexes, slicing using floats will raise a ``TypeError``.

778

785

779

786

`.. code-block:: ipython

780

787

``` @@ -786,7 +793,7 @@ In non-float indexes, slicing using floats will raise a TypeError


`786`

`793`

``

`787`

`794`

`.. warning::

`

`788`

`795`

``

`789`

``

``` -

Using a scalar float indexer for ``.iloc`` has been removed in 0.18.0, so the following will raise a ``TypeError``

796


 Using a scalar float indexer for ``.iloc`` has been removed in 0.18.0, so the following will raise a ``TypeError``:

790

797

791

798

` .. code-block:: ipython

792

799

`@@ -816,13 +823,13 @@ Selection operations then will always work on a value basis, for all selection o

816

823

` dfir.loc[0:1001,'A']

817

824

` dfir.loc[1000.4]

818

825

819

You could then easily pick out the first 1 second (1000 ms) of data then.

826

You could retrieve the first 1 second (1000 ms) of data as such:

820

827

821

828

`.. ipython:: python

822

829

823

830

` dfir[0:1000]

824

831

825


Of course if you need integer based selection, then use ``iloc``

832


If you need integer-based selection, you should use ``iloc``:
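As a rough sketch of the two selection styles on irregularly sampled data: the frame below is a simplified, deterministic stand-in for ``dfir`` (an assumed single column ``A`` and a float index in milliseconds), and the label-based slice is written with ``.loc`` to keep it unambiguous.

```python
import numpy as np
import pandas as pd

# Irregular millisecond timestamps: 0, 250, ..., 1000, then 1000.4, 1250.5, ...
index = np.concatenate([np.arange(5) * 250.0, np.arange(4, 10) * 250.1])
dfir = pd.DataFrame({"A": np.arange(len(index))}, index=index)

# Value-based: everything with a timestamp from 0 ms to 1000 ms inclusive.
first_second = dfir.loc[0:1000.0]

# Position-based: the first three rows, regardless of their timestamps.
first_three = dfir.iloc[0:3]

print(first_second.index.tolist())
print(first_three.index.tolist())
```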

826

833

827

834

`.. ipython:: python

828

835

`@@ -975,6 +982,7 @@ consider the following Series:

975

982

` s

976

983

977

984

Suppose we wished to slice from ``c`` to ``e``, using integers this would be

985

accomplished as such:

978

986

979

987

`.. ipython:: python

980

988