DOC: add reshaping visuals to the docs (Reshaping and Pivot Tables) (… · pandas-dev/pandas@b3f07b2
@@ -60,6 +60,8 @@ To select out everything for variable A we could do:

   df[df['variable'] == 'A']

.. image:: _static/reshaping_pivot.png

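The sketch below shows both steps on a small long-format frame. It is hypothetical: the ``date``, ``variable`` and ``value`` column names and all values are assumptions, and the document's own ``df`` is built outside this excerpt. A boolean mask selects the rows for one variable. :meth:`DataFrame.pivot` then spreads the unique variables into columns indexed by date.

```python
import pandas as pd

# Hypothetical long ("stacked") data: one row per (date, variable) observation.
df = pd.DataFrame({
    'date': pd.to_datetime(['2000-01-03', '2000-01-04',
                            '2000-01-03', '2000-01-04']),
    'variable': ['A', 'A', 'B', 'B'],
    'value': [0.5, 1.5, -1.0, 2.0],
})

# Boolean mask: keep only the rows whose 'variable' entry equals 'A'.
only_a = df[df['variable'] == 'A']

# Wide layout: one column per unique variable, one row per date.
wide = df.pivot(index='date', columns='variable', values='value')
print(wide)
```

``pivot`` only relabels existing values, so each (``date``, ``variable``) pair must occur at most once. Duplicate pairs make it raise a ``ValueError``.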
But suppose we wish to do time series operations with the variables. A better
representation would be where the ``columns`` are the unique variables and an
``index`` of dates identifies individual observations. To reshape the data into
@@ -96,10 +98,12 @@ are homogeneously-typed.

Reshaping by stacking and unstacking
------------------------------------

.. image:: _static/reshaping_stack.png

Closely related to the :meth:`~DataFrame.pivot` method are the
:meth:`~DataFrame.stack` and :meth:`~DataFrame.unstack` methods available on
``Series`` and ``DataFrame``. These methods are designed to work together with
``MultiIndex`` objects (see the section on :ref:`hierarchical indexing
<advanced.hierarchical>`). Here is essentially what these methods do:

  - ``stack``: "pivot" a level of the (possibly hierarchical) column labels,
@@ -109,6 +113,8 @@ Closely related to the :meth:`~DataFrame.pivot` method are the related

    (possibly hierarchical) row index to the column axis, producing a reshaped
    ``DataFrame`` with a new inner-most level of column labels.

.. image:: _static/reshaping_unstack.png

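Here is a minimal sketch of the two operations on a hypothetical ``DataFrame`` with a two-level row index. The ``first``/``second`` level names and the values are assumptions, not the data set used below. ``stack`` turns the column labels into an extra inner-most row level and returns a ``Series``. ``unstack`` moves that level back to the columns.

```python
import pandas as pd

# Hypothetical frame with a two-level row index and two plain columns.
index = pd.MultiIndex.from_tuples(
    [('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two')],
    names=['first', 'second'],
)
df2 = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]}, index=index)

# stack: the column labels 'A'/'B' become a third, inner-most row level,
# so the result is a Series with a three-level index.
stacked = df2.stack()

# unstack: the inner-most row level moves back to the column axis.
restored = stacked.unstack()
print(restored)
```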
The clearest way to explain these methods is by example. Let's take a prior
example data set from the hierarchical indexing section:

@@ -149,13 +155,18 @@ unstacks the last level:

.. _reshaping.unstack_by_name:

.. image:: _static/reshaping_unstack_1.png

If the index levels have names, you can use the level names instead of
specifying the level numbers:

.. ipython:: python

   stacked.unstack('second')

.. image:: _static/reshaping_unstack_0.png

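The sketch below uses a small hypothetical ``Series``; its level names ``first``/``second`` and its values are assumptions. It checks that unstacking by level number and by level name give the same result. It also shows that unstacking the outer level instead moves that level's labels to the columns.

```python
import pandas as pd

# Hypothetical Series with a named two-level index.
index = pd.MultiIndex.from_tuples(
    [('bar', 'one'), ('bar', 'two'), ('baz', 'one'), ('baz', 'two')],
    names=['first', 'second'],
)
s = pd.Series([1, 2, 3, 4], index=index)

# Unstack the inner level by position (1) and by name ('second').
by_number = s.unstack(1)
by_name = s.unstack('second')

# Unstack the outer level: 'bar'/'baz' become the columns instead.
outer = s.unstack('first')
print(outer)
```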
Notice that the ``stack`` and ``unstack`` methods implicitly sort the index
levels involved. Hence a call to ``stack`` and then ``unstack``, or vice versa,
will result in a **sorted** copy of the original ``DataFrame`` or ``Series``:
@@ -266,11 +277,13 @@ the right thing:

Reshaping by Melt
-----------------

.. image:: _static/reshaping_melt.png

The top-level :func:`~pandas.melt` function and the corresponding :meth:`DataFrame.melt`
are useful to massage a ``DataFrame`` into a format where one or more columns
are identifier variables, while all other columns, considered *measured
variables*, are "unpivoted" to the row axis, leaving just two non-identifier
columns, "variable" and "value". The names of those columns can be customized
by supplying the ``var_name`` and ``value_name`` parameters.

For instance,
@@ -285,7 +298,7 @@ For instance,

   cheese.melt(id_vars=['first', 'last'])
   cheese.melt(id_vars=['first', 'last'], var_name='quantity')

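Here is a self-contained sketch of the same kind of calls. It assumes a hypothetical ``cheese`` frame with identifier columns ``first``/``last`` and measured columns ``height``/``weight``; the document's own ``cheese`` is defined outside this excerpt. It also sets ``value_name``, which the calls above leave at its default.

```python
import pandas as pd

# Hypothetical wide frame: 'first'/'last' identify a person,
# 'height'/'weight' are the measured variables.
cheese = pd.DataFrame({
    'first': ['John', 'Mary'],
    'last': ['Doe', 'Bo'],
    'height': [5.5, 6.0],
    'weight': [130, 150],
})

# Default names for the two non-identifier columns: 'variable' and 'value'.
melted = cheese.melt(id_vars=['first', 'last'])

# Custom names via var_name / value_name (top-level function form).
renamed = pd.melt(cheese, id_vars=['first', 'last'],
                  var_name='quantity', value_name='amount')
print(renamed)
```

Each measured column contributes one output row per input row, so two rows times two measured columns give four rows.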
Another way to transform is to use the :func:`~pandas.wide_to_long` panel data
convenience function. It is less flexible than :func:`~pandas.melt`, but more
user-friendly.

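Here is a minimal sketch of :func:`~pandas.wide_to_long`. It assumes a hypothetical wide frame whose column names join a stub (``A``, ``B``) to a year suffix, plus an ``id`` column that identifies each row. ``stubnames``, ``i`` and ``j`` are the function's parameters; the data and the ``year`` name are made up.

```python
import pandas as pd

# Hypothetical wide panel: stub 'A' or 'B' followed by a year suffix.
wide = pd.DataFrame({
    'A1970': ['a', 'b'],
    'A1980': ['d', 'e'],
    'B1970': [2.5, 1.2],
    'B1980': [3.2, 1.3],
    'id': [0, 1],
})

# One 'A' and one 'B' column; the suffix becomes a new 'year' index level,
# paired with the existing 'id' identifier.
long = pd.wide_to_long(wide, stubnames=['A', 'B'], i='id', j='year')
print(long)
```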
@@ -332,8 +345,8 @@ While :meth:`~DataFrame.pivot` provides general purpose pivoting with various
data types (strings, numerics, etc.), pandas also provides :func:`~pandas.pivot_table`
for pivoting with aggregation of numeric data.

The function :func:`~pandas.pivot_table` can be used to create spreadsheet-style
pivot tables. See the :ref:`cookbook<cookbook.pivot>` for some advanced
strategies.

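Here is a minimal sketch using hypothetical sales records in which the same (``region``, ``product``) pair occurs more than once. :meth:`~DataFrame.pivot` rejects such duplicates. :func:`~pandas.pivot_table` instead aggregates them with ``aggfunc``, which is a sum here.

```python
import pandas as pd

# Hypothetical records; ('west', 'x') appears twice.
sales = pd.DataFrame({
    'region': ['east', 'east', 'west', 'west', 'west'],
    'product': ['x', 'y', 'x', 'x', 'y'],
    'units': [3, 4, 1, 5, 2],
})

# Duplicate (index, column) pairs are combined by aggfunc.
table = pd.pivot_table(sales, values='units', index='region',
                       columns='product', aggfunc='sum')

# Plain pivot cannot combine duplicates and raises instead.
try:
    sales.pivot(index='region', columns='product', values='units')
    pivot_failed = False
except ValueError:
    pivot_failed = True

print(table)
```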
:func:`~pandas.pivot_table` takes a number of arguments:
@@ -485,7 +498,7 @@ using the ``normalize`` argument:

   pd.crosstab(df.A, df.B, normalize='columns')

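Here is a self-contained sketch of both variants. It assumes a hypothetical frame with grouping columns ``A``/``B`` and a numeric column ``C``; the document's own ``df`` is defined outside this excerpt. ``normalize='columns'`` turns counts into per-column proportions. Passing ``values`` plus ``aggfunc`` aggregates a third ``Series`` within each (``A``, ``B``) group instead of counting rows.

```python
import pandas as pd

# Hypothetical data: A and B are the grouping factors, C is numeric.
df = pd.DataFrame({
    'A': [1, 2, 2, 2, 2],
    'B': [3, 3, 4, 4, 4],
    'C': [1, 1, 1, 1, 1],
})

# Frequency table rescaled so that each column sums to 1.
col_share = pd.crosstab(df.A, df.B, normalize='columns')

# Third Series plus aggfunc: sum C within each (A, B) group;
# combinations with no rows (A=1, B=4) come out as NaN.
c_sums = pd.crosstab(df.A, df.B, values=df.C, aggfunc='sum')
print(col_share)
print(c_sums)
```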
``crosstab`` can also be passed a third ``Series`` (``values``) and an
aggregation function (``aggfunc``) that will be applied to the values of the
third ``Series`` within each group defined by the first two ``Series``:

.. ipython:: python
@@ -508,8 +521,8 @@ Finally, one can also add margins or normalize this output.

Tiling
------

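Here is a minimal sketch of tiling with :func:`~pandas.cut`, using hypothetical ages and bin edges. Each value is assigned to a right-closed interval and the result is categorical. The ``labels`` argument replaces the interval names with custom ones.

```python
import pandas as pd

# Hypothetical continuous values (ages).
ages = pd.Series([10, 15, 13, 12, 23, 25, 28, 59, 60])

# Three right-closed bins: (0, 18], (18, 35], (35, 70].
binned = pd.cut(ages, bins=[0, 18, 35, 70])

# Same bins, but with readable labels instead of interval notation.
labelled = pd.cut(ages, bins=[0, 18, 35, 70],
                  labels=['child', 'adult', 'senior'])
print(binned.value_counts(sort=False))
```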
The :func:`~pandas.cut` function computes groupings for the values of the input
array and is often used to transform continuous variables to discrete or
categorical variables:

.. ipython:: python
@@ -539,8 +552,8 @@ used to bin the passed data.::

Computing indicator / dummy variables
-------------------------------------

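Here is a minimal sketch of :func:`~pandas.get_dummies` on a hypothetical ``Series`` with ``k = 3`` distinct values: it returns one indicator column per distinct value. Recent pandas releases (2.0 and later) default to ``bool`` indicator columns, so the sketch passes ``dtype=int`` to get 1s and 0s explicitly.

```python
import pandas as pd

# Hypothetical categorical variable with k = 3 distinct values.
s = pd.Series(['a', 'b', 'c', 'a'])

# One indicator column per distinct value; dtype=int requests 1s and 0s
# explicitly (recent pandas versions default to bool columns).
dummies = pd.get_dummies(s, dtype=int)
print(dummies)
```

Each row has exactly one 1, in the column of its value.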
To convert a categorical variable into a "dummy" or "indicator" ``DataFrame``,
for example a column in a ``DataFrame`` (a ``Series``) which has ``k`` distinct
values, you can derive a ``DataFrame`` containing ``k`` columns of 1s and 0s
using :func:`~pandas.get_dummies`:

@@ -577,7 +590,7 @@ This function is often used along with discretization functions like ``cut``:

See also :func:`Series.str.get_dummies <pandas.Series.str.get_dummies>`.

:func:`get_dummies` also accepts a ``DataFrame``. By default all categorical
variables (categorical in the statistical sense, those with ``object`` or
``category`` dtype) are encoded as dummy variables.

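Here is a self-contained sketch using a hypothetical frame with two ``object`` columns and one integer column. By default only the ``object`` columns are encoded, with each dummy column prefixed by its source column name. The integer column passes through unchanged. The ``columns`` keyword restricts encoding to the listed columns.

```python
import pandas as pd

# Hypothetical frame: 'A' and 'B' hold strings, 'C' is numeric.
df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'c', 'b'],
                   'C': [1, 2, 3]})

# Default: every string column is encoded; 'C' is kept as-is.
encoded = pd.get_dummies(df, dtype=int)

# Only encode 'A'; 'B' and 'C' are passed through untouched.
only_a = pd.get_dummies(df, columns=['A'], dtype=int)
print(encoded)
```

The untouched columns come first in the output, followed by the dummy columns.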
@@ -587,7 +600,7 @@ variables (categorical in the statistical sense, those with `object` or

                      'C': [1, 2, 3]})
   pd.get_dummies(df)

All non-object columns are included untouched in the output. You can control
the columns that are encoded with the ``columns`` keyword.

.. ipython:: python
@@ -640,7 +653,7 @@ When a column contains only one level, it will be omitted in the result.

   pd.get_dummies(df, drop_first=True)

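Here is a minimal sketch of ``drop_first`` and ``dtype`` together, using a hypothetical ``Series`` with three levels. Dropping the first level leaves ``k - 1`` indicator columns. ``dtype`` picks the column type; ``float`` is chosen here only for illustration. A column with a single level produces no indicator columns.

```python
import pandas as pd

# Hypothetical categorical variable with levels 'a', 'b', 'c'.
s = pd.Series(['a', 'b', 'c', 'a'])

# drop_first removes the indicator for the first level ('a'),
# so a row of all zeros now means 'a'.
reduced = pd.get_dummies(s, drop_first=True, dtype=float)

# A column with a single level yields no indicator columns at all.
single = pd.get_dummies(pd.Series(['x', 'x']), drop_first=True)
print(reduced)
```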
By default new columns will have ``np.uint8`` dtype.
To choose another dtype, use the ``dtype`` argument:

.. ipython:: python