Spellcheck of docs, a few minor changes (#18973) · pandas-dev/pandas@8433562
`` @@ -24,9 +24,9 @@ See the :ref:`Indexing and Selecting Data <indexing>` for general indexing docum
``
24
24
` Whether a copy or a reference is returned for a setting operation may
`
25
25
depend on the context. This is sometimes called ``chained assignment`` and
`26`
`26`
`` should be avoided. See :ref:`Returning a View versus Copy
``
`27`
``
`` -
<indexing.view_versus_copy>`
``
``
`27`
`` +
<indexing.view_versus_copy>`.
``
`28`
`28`
``
`29`
``
`` -
See the :ref:`cookbook<cookbook.selection>` for some advanced strategies
``
``
`29`
`` +
See the :ref:`cookbook<cookbook.selection>` for some advanced strategies.
``
`30`
`30`
``
`31`
`31`
`.. _advanced.hierarchical:
`
`32`
`32`
``
`` @@ -46,7 +46,7 @@ described above and in prior sections. Later, when discussing :ref:`group by
``
`46`
`46`
`non-trivial applications to illustrate how it aids in structuring data for
`
`47`
`47`
`analysis.
`
`48`
`48`
``
`49`
``
`` -
See the :ref:`cookbook<cookbook.multi_index>` for some advanced strategies
``
``
`49`
`` +
See the :ref:`cookbook<cookbook.multi_index>` for some advanced strategies.
``
`50`
`50`
``
`51`
`51`
`Creating a MultiIndex (hierarchical index) object
`
`52`
`52`
`~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`
``` @@ -59,7 +59,7 @@ can think of ``MultiIndex`` as an array of tuples where each tuple is unique. A
59
59
``MultiIndex.from_tuples``), or a crossed set of iterables (using
`60`
`60`
``MultiIndex.from_product``). The ``Index`` constructor will attempt to return
61
61
a ``MultiIndex`` when it is passed a list of tuples. The following examples
`62`
``
`-
demo different ways to initialize MultiIndexes.
`
``
`62`
`+
demonstrate different ways to initialize MultiIndexes.
`
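As a minimal sketch of the three construction routes named above (the labels and level names are made up for illustration), the constructors can be compared directly:

```python
import pandas as pd

# Explicit list of tuples: one tuple per row label.
tuples = [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")]
from_tuples = pd.MultiIndex.from_tuples(tuples, names=["first", "second"])

# Crossed set of iterables: every pairing of the two lists.
from_product = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)

# The plain Index constructor also returns a MultiIndex for a list of tuples.
from_index = pd.Index(tuples)

print(from_tuples.equals(from_product))
print(type(from_index).__name__)
```

Here ``from_product`` yields the same four labels as the explicit tuple list because the two input lists are fully crossed.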
`63`
`63`
``
`64`
`64`
``
`65`
`65`
`.. ipython:: python
`
`@@ -196,7 +196,8 @@ highly performant. If you want to see the actual used levels.
`
`196`
`196`
`# for a specific level
`
`197`
`197`
` df[['foo','qux']].columns.get_level_values(0)
`
`198`
`198`
``
`199`
``
``` -
To reconstruct the ``MultiIndex`` with only the used levels
``
199
To reconstruct the ``MultiIndex`` with only the used levels, the
``
200
``remove_unused_levels`` method may be used.
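A small sketch of this behaviour, assuming a frame whose column labels are invented for the example: selecting a subset of columns keeps every original level value, and ``remove_unused_levels`` trims them.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["bar", "baz", "foo", "qux"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.zeros((3, 8)), columns=columns)

# Selecting two top-level groups does not shrink the stored levels.
sub = df[["foo", "qux"]]
all_levels = list(sub.columns.levels[0])

# Rebuild the MultiIndex so that only the levels actually in use remain.
trimmed = sub.columns.remove_unused_levels()
used_levels = list(trimmed.levels[0])

print(all_levels)
print(used_levels)
```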
200
201
``
201
202
`.. versionadded:: 0.20.0
`
202
203
``
`@@ -216,7 +217,7 @@ tuples:
`
216
217
` s + s[:-2]
`
217
218
` s + s[::2]
`
218
219
``
219
``
``reindex`` can be called with another ``MultiIndex`` or even a list or array
``
220
``reindex`` can be called with another ``MultiIndex``, or even a list or array
220
221
`of tuples:
`
221
222
``
222
223
`.. ipython:: python
`
`@@ -230,7 +231,7 @@ Advanced indexing with hierarchical index
`
230
231
`-----------------------------------------
`
231
232
``
232
233
Syntactically integrating ``MultiIndex`` in advanced indexing with ``.loc`` is a
`233`
``
`-
bit challenging, but we've made every effort to do so. for example the
`
``
`234`
`+
bit challenging, but we've made every effort to do so. For example the
`
`234`
`235`
`following works as you would expect:
`
`235`
`236`
``
`236`
`237`
`.. ipython:: python
`
`@@ -286,7 +287,7 @@ As usual, **both sides** of the slicers are included as this is label indexing.
`
`286`
`287`
``
`287`
`288`
` df.loc[(slice('A1','A3'),.....), :]
`
`288`
`289`
``
`289`
``
`-
rather than this:
`
``
`290`
`+
You should **not** do this:
`
`290`
`291`
``
`291`
`292`
` .. code-block:: python
`
`292`
`293`
``
`@@ -315,7 +316,7 @@ Basic multi-index slicing using slices, lists, and labels.
`
`315`
`316`
``
`316`
`317`
` dfmi.loc[(slice('A1','A3'), slice(None), ['C1', 'C3']), :]
`
`317`
`318`
``
`318`
``
``` -
You can use a ``pd.IndexSlice`` to have a more natural syntax using ``:`` rather than using ``slice(None)``
``
319
You can use :class:`pandas.IndexSlice` to facilitate a more natural syntax using ``:``, rather than using ``slice(None)``.
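The equivalence between the two spellings can be sketched as follows; the frame ``dfmi`` below is a made-up stand-in with three sorted index levels, not the one used elsewhere in the docs.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C0", "C1", "C2", "C3"]]
)
dfmi = pd.DataFrame(
    np.arange(len(index) * 2).reshape(len(index), 2),
    index=index,
    columns=["x", "y"],
)

# Verbose form: one slice object or list per index level.
verbose = dfmi.loc[(slice("A1", "A3"), slice(None), ["C1", "C3"]), :]

# IndexSlice form: the same selection written with ordinary slice syntax.
idx = pd.IndexSlice
natural = dfmi.loc[idx["A1":"A3", :, ["C1", "C3"]], :]

print(verbose.equals(natural))
```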
319
320
``
320
321
`.. ipython:: python
`
321
322
``
`@@ -344,7 +345,7 @@ slicers on a single axis.
`
344
345
``
345
346
` dfmi.loc(axis=0)[:, :, ['C1', 'C3']]
`
346
347
``
347
``
`-
Furthermore you can set the values using these methods
`
``
348
`+
Furthermore you can set the values using the following methods.
`
348
349
``
349
350
`.. ipython:: python
`
350
351
``
`@@ -379,7 +380,7 @@ selecting data at a particular level of a MultiIndex easier.
`
379
380
` df.loc[(slice(None),'one'),:]
`
380
381
``
381
382
`` You can also select on the columns with :meth:`~pandas.MultiIndex.xs`, by
``
382
``
`-
providing the axis argument
`
``
383
`+
providing the axis argument.
`
383
384
``
384
385
`.. ipython:: python
`
385
386
``
`@@ -391,7 +392,7 @@ providing the axis argument
`
391
392
`# using the slicers
`
392
393
` df.loc[:,(slice(None),'one')]
`
393
394
``
394
``
`` -
:meth:`~pandas.MultiIndex.xs` also allows selection with multiple keys
``
``
395
`` +
:meth:`~pandas.MultiIndex.xs` also allows selection with multiple keys.
``
395
396
``
396
397
`.. ipython:: python
`
397
398
``
`@@ -403,13 +404,13 @@ providing the axis argument
`
403
404
` df.loc[:,('bar','one')]
`
404
405
``
405
406
You can pass ``drop_level=False`` to :meth:`~pandas.MultiIndex.xs` to retain
`406`
``
`-
the level that was selected
`
``
`407`
`+
the level that was selected.
`
`407`
`408`
``
`408`
`409`
`.. ipython:: python
`
`409`
`410`
``
`410`
`411`
` df.xs('one', level='second', axis=1, drop_level=False)
`
`411`
`412`
``
`412`
``
``` -
versus the result with ``drop_level=True`` (the default value)
``
413
Compare the above with the result using ``drop_level=True`` (the default value).
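Both settings can be put side by side in a short sketch. The frame and its level names (``first``, ``second``) are assumptions chosen to mirror the surrounding examples.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["bar", "baz"], ["one", "two"]], names=["first", "second"]
)
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=columns)

# Keep the selected level in the resulting column index.
kept = df.xs("one", level="second", axis=1, drop_level=False)

# Default behaviour: the selected level is dropped from the result.
dropped = df.xs("one", level="second", axis=1)

print(kept.columns.nlevels, dropped.columns.nlevels)
```

The data are identical in both results; only the shape of the column index differs.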
413
414
``
414
415
`.. ipython:: python
`
415
416
``
`@@ -470,7 +471,7 @@ allowing you to permute the hierarchical index levels in one step:
`
470
471
`` Sorting a :class:`~pandas.MultiIndex`
``
471
472
`-------------------------------------
`
472
473
``
473
``
`-
For MultiIndex-ed objects to be indexed & sliced effectively, they need
`
``
474
`+
For MultiIndex-ed objects to be indexed and sliced effectively, they need
`
474
475
to be sorted. As with any index, you can use ``sort_index``.
`475`
`476`
``
`476`
`477`
`.. ipython:: python
`
`@@ -623,7 +624,8 @@ Index Types
`
`623`
`624`
`-----------
`
`624`
`625`
``
`625`
`626`
We have discussed ``MultiIndex`` in the previous sections pretty extensively. ``DatetimeIndex`` and ``PeriodIndex``
626
``
are shown :ref:`here <timeseries.overview>`. ``TimedeltaIndex`` are :ref:`here <timedeltas.timedeltas>`.
``
627
`` +
are shown :ref:`here <timeseries.overview>`, and information about
``
``
628
``TimedeltaIndex`` is found :ref:`here <timedeltas.timedeltas>`.
627
629
``
628
630
`In the following sub-sections we will highlight some other index types.
`
629
631
``
`@@ -647,44 +649,46 @@ and allows efficient indexing and storage of an index with a large number of dup
`
647
649
` df.dtypes
`
648
650
` df.B.cat.categories
`
649
651
``
650
``
Setting the index, will create a ``CategoricalIndex``
``
652
Setting the index will create a ``CategoricalIndex``.
651
653
``
652
654
`.. ipython:: python
`
653
655
``
654
656
` df2 = df.set_index('B')
`
655
657
` df2.index
`
656
658
``
657
659
Indexing with ``__getitem__/.iloc/.loc`` works similarly to an ``Index`` with duplicates.
`658`
``
`-
The indexers MUST be in the category or the operation will raise.
`
``
`660`
``` +
The indexers **must** be in the category or the operation will raise a ``KeyError``.
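A hedged sketch of that rule, using a small frame built here for illustration (the values and the category order ``cab`` are assumptions that echo the surrounding text):

```python
import pandas as pd

df = pd.DataFrame(
    {"A": range(6), "B": pd.Categorical(list("aabbca"), categories=list("cab"))}
)
df2 = df.set_index("B")

# A label that belongs to the categories selects every matching row.
present = df2.loc["a"]

# A label outside the categories cannot be looked up.
try:
    df2.loc["e"]
    raised = False
except KeyError:
    raised = True

print(len(present), raised)
```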
659
661
``
660
662
`.. ipython:: python
`
661
663
``
662
664
` df2.loc['a']
`
663
665
``
664
``
These PRESERVE the ``CategoricalIndex``
``
666
The ``CategoricalIndex`` is **preserved** after indexing:
665
667
``
666
668
`.. ipython:: python
`
667
669
``
668
670
` df2.loc['a'].index
`
669
671
``
670
``
`-
Sorting will order by the order of the categories
`
``
672
`+
Sorting the index will sort by the order of the categories (recall that we
`
``
673
created the index with ``CategoricalDtype(list('cab'))``, so the sorted
``
674
order is ``cab``).
671
675
``
672
676
`.. ipython:: python
`
673
677
``
674
678
` df2.sort_index()
`
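The category-driven ordering can be checked on a small stand-in frame; the data below are invented, and only the category order ``cab`` is taken from the text.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": range(6), "B": pd.Categorical(list("aabbca"), categories=list("cab"))}
)
df2 = df.set_index("B")

# Rows are ordered by category position (c, then a, then b), not alphabetically.
sorted_labels = list(df2.sort_index().index)
print(sorted_labels)
```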
675
679
``
676
``
`-
Groupby operations on the index will preserve the index nature as well
`
``
680
`+
Groupby operations on the index will preserve the index nature as well.
`
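A minimal sketch of this, on an invented frame. Passing ``observed=False`` explicitly is a choice made for this example so that the result does not depend on the default of the installed pandas version.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": range(6), "B": pd.Categorical(list("aabbca"), categories=list("cab"))}
)
df2 = df.set_index("B")

# Group on the categorical index level and aggregate.
totals = df2.groupby(level=0, observed=False).sum()

print(type(totals.index).__name__)
print(totals)
```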
677
681
``
678
682
`.. ipython:: python
`
679
683
``
680
684
` df2.groupby(level=0).sum()
`
681
685
` df2.groupby(level=0).sum().index
`
682
686
``
683
``
`-
Reindexing operations, will return a resulting index based on the type of the passed
`
684
``
indexer, meaning that passing a list will return a plain-old-``Index``; indexing with
``
687
`+
Reindexing operations will return a resulting index based on the type of the passed
`
``
688
indexer. Passing a list will return a plain-old ``Index``; indexing with
685
689
a ``Categorical`` will return a ``CategoricalIndex``, indexed according to the categories
`686`
``
``` -
of the PASSED ``Categorical`` dtype. This allows one to arbitrarily index these even with
687
``
`-
values NOT in the categories, similarly to how you can reindex ANY pandas index.
`
``
690
of the **passed** ``Categorical`` dtype. This allows one to arbitrarily index these even with
``
691
`+
values not in the categories, similarly to how you can reindex any pandas index.
`
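The difference between the two indexer types can be sketched as below. The frame ``df3`` is hypothetical and uses unique labels, since recent pandas versions refuse to reindex an axis that contains duplicates.

```python
import pandas as pd

df3 = pd.DataFrame({"A": range(3), "B": pd.Categorical(list("abc"))}).set_index("B")

# A plain list gives back a plain Index; the unknown label "e" becomes NaN.
by_list = df3.reindex(["a", "e"])

# A Categorical gives back a CategoricalIndex built from the passed categories.
by_cat = df3.reindex(pd.Categorical(["a", "e"], categories=list("abe")))

print(type(by_list.index).__name__)
print(type(by_cat.index).__name__)
```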
688
692
``
689
693
`.. ipython :: python
`
690
694
``
`@@ -720,7 +724,8 @@ Int64Index and RangeIndex
`
720
724
``
721
725
`` Indexing on an integer-based Index with floats has been clarified in 0.18.0. For a summary of the changes, see :ref:`here <whatsnew_0180.float_indexers>`.
``
722
726
``
723
``
``Int64Index`` is a fundamental basic index in *pandas*. This is an Immutable array implementing an ordered, sliceable set.
``
727
``Int64Index`` is a fundamental basic index in pandas.
``
728
`+
This is an immutable array implementing an ordered, sliceable set.
`
724
729
Prior to 0.18.0, the ``Int64Index`` would provide the default index for all ``NDFrame`` objects.
`725`
`730`
``
`726`
`731`
``RangeIndex`` is a sub-class of ``Int64Index`` added in version 0.18.0, now providing the default index for all ``NDFrame`` objects.
`@@ -742,7 +747,7 @@ same.
`
742
747
`sf = pd.Series(range(5), index=indexf)
`
743
748
` sf
`
744
749
``
745
``
Scalar selection for ``[],.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``)
``
750
Scalar selection for ``[],.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``).
746
751
``
747
752
`.. ipython:: python
`
748
753
``
@@ -751,30 +756,32 @@ Scalar selection for ``[],.loc`` will always be label based. An integer will mat
`751`
`756`
` sf.loc[3]
`
`752`
`757`
` sf.loc[3.0]
`
`753`
`758`
``
`754`
``
``` -
The only positional indexing is via ``iloc``
``
759
The only positional indexing is via ``iloc``.
755
760
``
756
761
`.. ipython:: python
`
757
762
``
758
763
` sf.iloc[3]
`
759
764
``
760
``
A scalar index that is not found will raise ``KeyError``
``
765
A scalar index that is not found will raise a ``KeyError``.
761
766
``
762
``
Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS positional with ``iloc``
``
767
Slicing is primarily on the values of the index when using ``[],ix,loc``, and
``
768
**always** positional when using ``iloc``. The exception is when the slice is
``
769
`+
boolean, in which case it will always be positional.
`
763
770
``
764
771
`.. ipython:: python
`
765
772
``
766
773
` sf[2:4]
`
767
774
` sf.loc[2:4]
`
768
775
` sf.iloc[2:4]
`
769
776
``
770
``
`-
In float indexes, slicing using floats is allowed
`
``
777
`+
In float indexes, slicing using floats is allowed.
`
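The rules for a float index can be gathered in one short sketch. The index values are an assumption modelled on the ``sf`` example, and ``.loc`` is used throughout to keep the label-based intent explicit.

```python
import pandas as pd

indexf = pd.Index([1.5, 2, 3, 4.5, 5])
sf = pd.Series(range(5), index=indexf)

# Scalar label lookup: the integer 3 matches the float label 3.0.
by_int = sf.loc[3]
by_float = sf.loc[3.0]

# Positional lookup is only available through iloc.
by_pos = sf.iloc[3]

# Float bounds are allowed when slicing by label; both ends are inclusive.
float_slice = sf.loc[2.1:4.6]

print(by_int, by_float, by_pos)
print(float_slice)
```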
771
778
``
772
779
`.. ipython:: python
`
773
780
``
774
781
` sf[2.1:4.6]
`
775
782
` sf.loc[2.1:4.6]
`
776
783
``
777
``
In non-float indexes, slicing using floats will raise a ``TypeError``
``
784
In non-float indexes, slicing using floats will raise a ``TypeError``.
778
785
``
779
786
`.. code-block:: ipython
`
780
787
``
@@ -786,7 +793,7 @@ In non-float indexes, slicing using floats will raise a ``TypeError``
`786`
`793`
``
`787`
`794`
`.. warning::
`
`788`
`795`
``
`789`
``
``` -
Using a scalar float indexer for ``.iloc`` has been removed in 0.18.0, so the following will raise a ``TypeError``
``
796
Using a scalar float indexer for ``.iloc`` has been removed in 0.18.0, so the following will raise a ``TypeError``:
790
797
``
791
798
` .. code-block:: ipython
`
792
799
``
`@@ -816,13 +823,13 @@ Selection operations then will always work on a value basis, for all selection o
`
816
823
` dfir.loc[0:1001,'A']
`
817
824
` dfir.loc[1000.4]
`
818
825
``
819
``
`-
You could then easily pick out the first 1 second (1000 ms) of data then.
`
``
826
`+
You could retrieve the first 1 second (1000 ms) of data as such:
`
820
827
``
821
828
`.. ipython:: python
`
822
829
``
823
830
` dfir[0:1000]
`
824
831
``
825
``
Of course if you need integer based selection, then use ``iloc``
``
832
If you need integer based selection, you should use ``iloc``:
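As a sketch of the contrast between value-based and integer-based selection, consider a made-up frame indexed by irregular millisecond offsets (the name ``dfir`` follows the text; its contents here are assumptions):

```python
import pandas as pd

# Hypothetical timestamps in milliseconds, stored as a float index.
idx = pd.Index([0.0, 250.5, 500.5, 1000.0, 1500.0])
dfir = pd.DataFrame({"A": range(5)}, index=idx)

# Value-based: every row whose label lies within the first 1000 ms, inclusive.
first_second = dfir.loc[0:1000]

# Integer-based: the first three rows, regardless of their labels.
first_three = dfir.iloc[0:3]

print(len(first_second), len(first_three))
```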
826
833
``
827
834
`.. ipython:: python
`
828
835
``
`@@ -975,6 +982,7 @@ consider the following Series:
`
975
982
` s
`
976
983
``
977
984
Suppose we wished to slice from ``c`` to ``e``, using integers this would be
``
985
`+
accomplished as such:
`
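A short sketch on an assumed six-element Series labelled ``a`` through ``f`` shows the positional slice next to the equivalent label slice, which includes its endpoint:

```python
import pandas as pd

s = pd.Series(range(6), index=list("abcdef"))

# Positional slice: the stop position 5 is excluded, so rows 2, 3 and 4.
by_position = s.iloc[2:5]

# Label slice: both "c" and "e" are included.
by_label = s.loc["c":"e"]

print(by_position.equals(by_label))
```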
978
986
``
979
987
`.. ipython:: python
`
980
988
``