DOC: add reshaping visuals to the docs (Reshaping and Pivot Tables) (… · pandas-dev/pandas@b3f07b2

```diff
@@ -60,6 +60,8 @@ To select out everything for variable A we could do:
 
    df[df['variable'] == 'A']
 
+.. image:: _static/reshaping_pivot.png
+
 But suppose we wish to do time series operations with the variables. A better
 representation would be where the ``columns`` are the unique variables and an
 index of dates identifies individual observations. To reshape the data into
```

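For reference, a minimal sketch of the long-to-wide pivot this hunk's new image illustrates; the frame and column names below are illustrative, not part of the patch:

```python
import pandas as pd

# long format: one row per (date, variable) observation
long_df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"]),
    "variable": ["A", "B", "A", "B"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# select everything for variable A, as in the hunk above
long_df[long_df["variable"] == "A"]

# wide format: the unique variables become columns, dates identify the rows
wide_df = long_df.pivot(index="date", columns="variable", values="value")
```
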
```diff
@@ -96,10 +98,12 @@ are homogeneously-typed.
 Reshaping by stacking and unstacking
 ------------------------------------
 
-Closely related to the :meth:`~DataFrame.pivot` method are the related
-:meth:`~DataFrame.stack` and :meth:`~DataFrame.unstack` methods available on
-``Series`` and ``DataFrame``. These methods are designed to work together with
-``MultiIndex`` objects (see the section on :ref:`hierarchical indexing
+.. image:: _static/reshaping_stack.png
+
+Closely related to the :meth:`~DataFrame.pivot` method are the related
+:meth:`~DataFrame.stack` and :meth:`~DataFrame.unstack` methods available on
+``Series`` and ``DataFrame``. These methods are designed to work together with
+``MultiIndex`` objects (see the section on :ref:`hierarchical indexing
 <advanced.hierarchical>`). Here are essentially what these methods do:
 
 - stack: "pivot" a level of the (possibly hierarchical) column labels,
```


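The reworded paragraph above pairs ``stack``/``unstack`` with ``MultiIndex`` objects. A minimal, self-contained sketch of ``stack`` (the data and index names are illustrative, not from the patch):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")],
    names=["first", "second"],
)
df = pd.DataFrame(np.random.randn(4, 2), index=index, columns=["A", "B"])

# stack: the column labels A/B become the new inner-most index level
stacked = df.stack()
```
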
```diff
@@ -109,6 +113,8 @@ Closely related to the :meth:`~DataFrame.pivot` method are the related
   (possibly hierarchical) row index to the column axis, producing a reshaped
   ``DataFrame`` with a new inner-most level of column labels.
 
+.. image:: _static/reshaping_unstack.png
+
 The clearest way to explain is by example. Let's take a prior example data set
 from the hierarchical indexing section:
 
```

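``unstack`` is the inverse operation described by this hunk's text and image; a short sketch under the same illustrative setup as the previous note (not code from the patch):

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]],
                                 names=["first", "second"])
df = pd.DataFrame(np.random.randn(4, 2), index=idx, columns=["A", "B"])
stacked = df.stack()

# unstack: move the inner-most index level back to the column axis
stacked.unstack()

# an integer (or a level name) selects which level to unstack instead
stacked.unstack(0)
```
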
```diff
@@ -149,13 +155,18 @@ unstacks the last level:
 
 .. _reshaping.unstack_by_name:
 
+.. image:: _static/reshaping_unstack_1.png
+
 If the indexes have names, you can use the level names instead of specifying
 the level numbers:
 
 .. ipython:: python
 
    stacked.unstack('second')
 
+
+.. image:: _static/reshaping_unstack_0.png
+
 Notice that the stack and unstack methods implicitly sort the index
 levels involved. Hence a call to ``stack`` and then ``unstack``, or vice versa,
 will result in a **sorted** copy of the original DataFrame or Series:
```


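The two images bracket the behaviour spelled out in the closing paragraph: unstacking by level name, and the implicit sorting. A hedged, self-contained sketch (illustrative data, not from the patch):

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_product([["bar", "baz"], ["one", "two"]],
                                 names=["first", "second"])
df = pd.DataFrame(np.random.randn(4, 2), index=idx, columns=["A", "B"])
stacked = df.stack()

# unstacking by level name rather than by level number
stacked.unstack("second")

# stack then unstack (or vice versa) sorts the levels involved,
# so a round trip yields a sorted copy of the original frame
shuffled = df.iloc[[3, 1, 2, 0]]
shuffled.stack().unstack().equals(shuffled.sort_index())  # True
```
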
```diff
@@ -266,11 +277,13 @@ the right thing:
 Reshaping by Melt
 -----------------
 
+.. image:: _static/reshaping_melt.png
+
 The top-level :func:`~pandas.melt` function and the corresponding :meth:`DataFrame.melt`
-are useful to massage a ``DataFrame`` into a format where one or more columns
-are identifier variables, while all other columns, considered *measured
-variables*, are "unpivoted" to the row axis, leaving just two non-identifier
-columns, "variable" and "value". The names of those columns can be customized
+are useful to massage a ``DataFrame`` into a format where one or more columns
+are identifier variables, while all other columns, considered *measured
+variables*, are "unpivoted" to the row axis, leaving just two non-identifier
+columns, "variable" and "value". The names of those columns can be customized
 by supplying the ``var_name`` and ``value_name`` parameters.
 
 For instance,
```

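A minimal sketch of the melt behaviour this paragraph describes, mirroring the ``cheese`` frame the page uses; the exact values here are illustrative, not taken from the patch:

```python
import pandas as pd

cheese = pd.DataFrame({
    "first": ["John", "Mary"],
    "last": ["Doe", "Bo"],
    "height": [5.5, 6.0],
    "weight": [130, 150],
})

# 'first'/'last' stay as identifier columns; 'height'/'weight' are unpivoted
# into the two non-identifier columns "variable" and "value"
cheese.melt(id_vars=["first", "last"])

# the names of those two columns can be customized
cheese.melt(id_vars=["first", "last"], var_name="quantity", value_name="reading")
```
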
```diff
@@ -285,7 +298,7 @@ For instance,
    cheese.melt(id_vars=['first', 'last'])
    cheese.melt(id_vars=['first', 'last'], var_name='quantity')
 
-Another way to transform is to use the :func:`~pandas.wide_to_long` panel data
+Another way to transform is to use the :func:`~pandas.wide_to_long` panel data
 convenience function. It is less flexible than :func:`~pandas.melt`, but more
 user-friendly.
 
```

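For the ``wide_to_long`` convenience function mentioned in this hunk, a short sketch with illustrative data (column names and values are assumptions, not from the patch):

```python
import numpy as np
import pandas as pd

wide = pd.DataFrame({
    "A1970": ["a", "b", "c"], "A1980": ["d", "e", "f"],
    "B1970": [2.5, 1.2, 0.7], "B1980": [3.2, 1.3, 0.1],
    "X": np.random.randn(3),
})
wide["id"] = wide.index

# stubnames 'A' and 'B' are gathered into long format; the trailing year
# becomes the new 'year' index level alongside 'id'
pd.wide_to_long(wide, stubnames=["A", "B"], i="id", j="year")
```
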
```diff
@@ -332,8 +345,8 @@ While :meth:`~DataFrame.pivot` provides general purpose pivoting with various
 data types (strings, numerics, etc.), pandas also provides :func:`~pandas.pivot_table`
 for pivoting with aggregation of numeric data.
 
-The function :func:`~pandas.pivot_table` can be used to create spreadsheet-style
-pivot tables. See the :ref:`cookbook<cookbook.pivot>` for some advanced
+The function :func:`~pandas.pivot_table` can be used to create spreadsheet-style
+pivot tables. See the :ref:`cookbook<cookbook.pivot>` for some advanced
 strategies.
 
 It takes a number of arguments:
```

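A compact sketch of the spreadsheet-style aggregation described above (the frame and column names are illustrative assumptions):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["one", "one", "two", "two"] * 3,
    "B": ["x", "y", "z"] * 4,
    "C": ["foo", "foo", "bar", "bar"] * 3,
    "D": np.random.randn(12),
})

# rows keyed by A and B, columns keyed by C, cells are the mean of D per group
pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"], aggfunc="mean")
```
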
```diff
@@ -485,7 +498,7 @@ using the ``normalize`` argument:
    pd.crosstab(df.A, df.B, normalize='columns')
 
 ``crosstab`` can also be passed a third Series and an aggregation function
-(``aggfunc``) that will be applied to the values of the third ``Series`` within
+(``aggfunc``) that will be applied to the values of the third ``Series`` within
 each group defined by the first two ``Series``:
 
 .. ipython:: python
```

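A sketch of the three-``Series`` crosstab this hunk's text describes (illustrative data, not from the patch):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "bar", "bar", "foo"],
    "B": ["one", "two", "one", "two", "one"],
    "C": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# values of C are aggregated (here summed) within each (A, B) group
pd.crosstab(df.A, df.B, values=df.C, aggfunc="sum")
```
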
```diff
@@ -508,8 +521,8 @@ Finally, one can also add margins or normalize this output.
 Tiling
 ------
 
-The :func:`~pandas.cut` function computes groupings for the values of the input
-array and is often used to transform continuous variables to discrete or
+The :func:`~pandas.cut` function computes groupings for the values of the input
+array and is often used to transform continuous variables to discrete or
 categorical variables:
 
 .. ipython:: python
```

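A short sketch of ``cut`` binning a continuous variable, as described above (the values and bin edges are illustrative):

```python
import pandas as pd

ages = pd.Series([6, 25, 37, 54, 71])

# equal-width bins chosen automatically
pd.cut(ages, bins=3)

# or explicit bin edges, turning a continuous variable into a categorical one
pd.cut(ages, bins=[0, 18, 35, 70, 100])
```
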
```diff
@@ -539,8 +552,8 @@ used to bin the passed data.::
 Computing indicator / dummy variables
 -------------------------------------
 
-To convert a categorical variable into a "dummy" or "indicator" ``DataFrame``,
-for example a column in a ``DataFrame`` (a ``Series``) which has ``k`` distinct
+To convert a categorical variable into a "dummy" or "indicator" ``DataFrame``,
+for example a column in a ``DataFrame`` (a ``Series``) which has ``k`` distinct
 values, can derive a ``DataFrame`` containing ``k`` columns of 1s and 0s using
 :func:`~pandas.get_dummies`:
 
```

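A minimal sketch of the k-distinct-values-to-k-indicator-columns mapping described above (illustrative data):

```python
import pandas as pd

s = pd.Series(list("abca"))

# three distinct values -> three indicator columns, one per value
pd.get_dummies(s)
```
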
```diff
@@ -577,7 +590,7 @@ This function is often used along with discretization functions like ``cut``:
 See also :func:`Series.str.get_dummies <pandas.Series.str.get_dummies>`.
 
 :func:`get_dummies` also accepts a ``DataFrame``. By default all categorical
-variables (categorical in the statistical sense, those with `object` or
+variables (categorical in the statistical sense, those with `object` or
 `categorical` dtype) are encoded as dummy variables.
 
 
```

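A sketch of ``get_dummies`` applied to a whole frame, mirroring the ``A``/``B``/``C`` frame used in the next hunk (values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["c", "c", "b"], "C": [1, 2, 3]})

# the object-dtype columns 'A' and 'B' are encoded; numeric 'C' is left alone
pd.get_dummies(df)
```
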
```diff
@@ -587,7 +600,7 @@ variables (categorical in the statistical sense, those with `object` or
                       'C': [1, 2, 3]})
    pd.get_dummies(df)
 
-All non-object columns are included untouched in the output. You can control
+All non-object columns are included untouched in the output. You can control
 the columns that are encoded with the ``columns`` keyword.
 
 .. ipython:: python
```

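Restricting the encoding with the ``columns`` keyword, as the paragraph above notes; a sketch on the same illustrative frame:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a"], "B": ["c", "c", "b"], "C": [1, 2, 3]})

# only encode 'B'; 'A' and 'C' pass through untouched
pd.get_dummies(df, columns=["B"])
```
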
```diff
@@ -640,7 +653,7 @@ When a column contains only one level, it will be omitted in the result.
 
    pd.get_dummies(df, drop_first=True)
 
-By default new columns will have ``np.uint8`` dtype.
+By default new columns will have ``np.uint8`` dtype.
 To choose another dtype, use the ``dtype`` argument:
 
 .. ipython:: python
```
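
A sketch of the ``dtype`` argument mentioned in this hunk; note that ``np.uint8`` was the default at the time of this patch, while newer pandas versions default to ``bool``:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "a"], "B": [1.1, 2.2, 3.3]})

# request float indicator columns instead of the default integer/bool dtype
pd.get_dummies(df, dtype=np.float64)
```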