CLN: series to now inherit from NDFrame by jreback · Pull Request #3482 · pandas-dev/pandas
Major refactor primarily to make Series inherit from NDFrame
affects #4080, #3862, #816, #3217, #3386, #4463, #4204, #4118, #4555
Preserves pickle compat
very few tests were changed (and only for compat on return objects)
a few performance enhancements, a couple of regressions (see bottom)
Obviously this is a large change in terms of the codebase, but it brings more consistency between Series/Frame/Panel (not all of this is there yet, but future changes are much easier).
Series is now like Frame in that it has a BlockManager (called SingleBlockManager), which holds a block (of any type we support). This introduced some overhead in certain operations, which I spent a lot of time optimizing away. Further optimizations will come from cythonizing core/internals, which should be straightforward at this point.
Highlights below:
In 0.13.0 there is a major refactor primarily to subclass `Series` from `NDFrame`, which is the base class currently for `DataFrame` and `Panel`, to unify methods and behaviors. `Series` formerly subclassed directly from `ndarray`.
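As a quick check of the hierarchy described above, the following sketch inspects the class relationships. It assumes a pandas release that includes this refactor and in which `NDFrame` is importable from `pandas.core.generic`; that module path is an internal detail and may move. The sample data is made up for illustration.

```python
import numpy as np
import pandas as pd
from pandas.core.generic import NDFrame  # internal path; assumed importable

s = pd.Series([1.0, 2.0, 3.0])

# Series and DataFrame share NDFrame as a common base class.
series_is_ndframe = issubclass(pd.Series, NDFrame)
frame_is_ndframe = issubclass(pd.DataFrame, NDFrame)

# A Series is no longer an ndarray subclass; the underlying data is
# reached through the .values attribute instead.
series_is_ndarray = isinstance(s, np.ndarray)
values_is_ndarray = isinstance(s.values, np.ndarray)

print(series_is_ndframe, frame_is_ndframe, series_is_ndarray, values_is_ndarray)
```

Code that relied on `isinstance(obj, np.ndarray)` to detect a Series would need to test for `pd.Series` instead.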
- Refactor of series.py/frame.py/panel.py to move common code to generic.py
  - added `_setup_axes` to create generic NDFrame structures
  - moved methods
    - `from_axes`, `_wrap_array`, `axes`, `ix`, `shape`, `empty`, `swapaxes`, `transpose`, `pop`
    - `__iter__`, `keys`, `__contains__`, `__len__`, `__neg__`, `__invert__`
    - `convert_objects`, `as_blocks`, `as_matrix`, `values`
    - `__getstate__`, `__setstate__` (though compat remains in frame/panel)
    - `__getattr__`, `__setattr__`
    - `_indexed_same`, `reindex_like`, `align`, `where`, `mask`, `replace`
    - `filter` (also added axis argument to selectively filter on a different axis)
    - `reindex`, `reindex_axis` (which was the biggest change to make generic)
    - `truncate` (moved to become part of `NDFrame`)
- These are API changes which make `Panel` more consistent with `DataFrame`
  - `swapaxes` on a Panel with the same axes specified now returns a copy
  - support attribute access for setting
  - `filter` supports the same API as the original `DataFrame` filter
- Reindex called with no arguments will now return a copy of the input object
- Series now inherits from `NDFrame` rather than directly from `ndarray`. There are several minor changes that affect the API.
  - numpy functions that do not support the array interface will now return `ndarrays` rather than series, e.g. `np.diff` and `np.where`
  - `Series(0.5)` would previously return the scalar `0.5`, this is no longer supported
  - several methods from frame/series have moved to `NDFrame` (convert_objects, where, mask)
  - `TimeSeries` is now an alias for `Series`. The property `is_time_series` can be used to distinguish (if desired)
- Refactor of Sparse objects to use BlockManager
  - Created a new block type in internals, `SparseBlock`, which can hold multi-dtypes and is non-consolidatable. `SparseSeries` and `SparseDataFrame` now inherit more methods from their hierarchy (Series/DataFrame), and no longer inherit from `SparseArray` (which instead is the object of the `SparseBlock`)
  - Sparse suite now supports integration with non-sparse data. Non-float sparse data is supportable (partially implemented)
  - Operations on sparse structures within DataFrames should preserve sparseness; merging type operations will convert to dense (and back to sparse), so might be somewhat inefficient
  - enable setitem on `SparseSeries` for boolean/integer/slices
  - `SparsePanels` implementation is unchanged (e.g. not using BlockManager, needs work)
- added `ftypes` method to Series/DataFrame, similar to `dtypes`, but indicates if the underlying is sparse/dense (as well as the dtype)
- All `NDFrame` objects now have a `_prop_attributes`, which can be used to indicate various values to propagate to a new object from an existing one (e.g. name in `Series` will follow more automatically now)
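Two of the user-visible items in the list above (NumPy functions returning ndarrays, and the `axis` argument to `filter`) can be tried directly. This is a minimal sketch assuming a recent pandas and NumPy; the sample data and variable names are made up for illustration, and behavior on the branch in this PR may have differed in detail.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 6, 10], index=list("abcd"), name="x")

# np.diff and np.where hand back plain ndarrays, not Series,
# so the index and name are not carried over.
diffs = np.diff(s)
picked = np.where(s > 3, s, 0)

# Re-wrap the result explicitly if the labels should be kept.
diffs_as_series = pd.Series(diffs, index=s.index[1:], name=s.name)

# filter with an axis argument: axis=0 matches against row labels
# instead of column labels.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["r1", "r2", "s1"])
rows = df.filter(like="r", axis=0)

print(type(diffs).__name__, type(picked).__name__)
print(list(rows.index))
```

Where a Series result is wanted, the Series methods (for example `s.diff()` or `s.where(...)`) avoid the re-wrapping step.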
Performance changed a bit, primarily in groupby, where a Series has to be reconstructed in order to be passed to the function (in some cases). I basically pass a Series-like class to the grouped function to see whether it raises; if it does not, that class is used rather than a full Series, which reduces the overhead of creating a Series for each group.
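The probe-then-fall-back idea described above can be sketched in plain Python. This is not the pandas implementation: `FullSeries`, `LightSeries`, and `apply_to_groups` are hypothetical names invented for illustration, and the real groupby code is considerably more involved.

```python
class FullSeries:
    """Stand-in for a Series that is costly to build (hypothetical)."""

    def __init__(self, values, index):
        self.values = list(values)
        self.index = list(index)  # pretend index construction is expensive

    def sum(self):
        return sum(self.values)

    def first_label(self):
        return self.index[0]


class LightSeries:
    """Cheap Series-like wrapper exposing only part of the API (hypothetical)."""

    def __init__(self, values):
        self.values = values

    def sum(self):
        return sum(self.values)


def apply_to_groups(groups, func):
    """Probe func with the light wrapper; use FullSeries if the probe raises."""
    items = list(groups.items())
    use_light = True
    try:
        if items:
            func(LightSeries(items[0][1]))  # trial call on the first group
    except Exception:
        use_light = False

    results = {}
    for key, values in items:
        if use_light:
            obj = LightSeries(values)
        else:
            obj = FullSeries(values, range(len(values)))
        results[key] = func(obj)
    return results, use_light


groups = {"a": [1, 2, 3], "b": [4, 5]}
fast_result, fast_used_light = apply_to_groups(groups, lambda g: g.sum())
slow_result, slow_used_light = apply_to_groups(groups, lambda g: g.first_label())
print(fast_result, fast_used_light)
print(slow_result, slow_used_light)
```

One cost of this design is that the function runs an extra time on the first group during the probe, so it suits functions without side effects.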
-------------------------------------------------------------------------------
Test name | head[ms] | base[ms] | ratio |
-------------------------------------------------------------------------------
groupby_multi_python | 109.3636 | 78.3370 | 1.3961 |
frame_iteritems | 3.4664 | 2.0154 | 1.7200 |
frame_fancy_lookup | 3.3991 | 1.6137 | 2.1064 |
sparse_frame_constructor | 11.7100 | 5.3363 | 2.1944 |
-------------------------------------------------------------------------------
Target [c5d9495] : BUG: fix ujson handling of new series object
Base [1b91f4f] : BUG: Fixed non-unique indexing memory allocation issue with .ix/.loc (GH4280)
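The ratio column appears to be head time divided by base time, so values above 1 mean the target commit is slower than the baseline on that benchmark. The following sketch recomputes it from the rounded timings in the table; the variable names are illustrative.

```python
# (test name, head ms, base ms) copied from the table above
timings = [
    ("groupby_multi_python", 109.3636, 78.3370),
    ("frame_iteritems", 3.4664, 2.0154),
    ("frame_fancy_lookup", 3.3991, 1.6137),
    ("sparse_frame_constructor", 11.7100, 5.3363),
]

# ratio = head / base, rounded to the four decimals shown in the table
ratios = {name: round(head / base, 4) for name, head, base in timings}

for name, ratio in ratios.items():
    print(f"{name:<26} {ratio:.4f}")
```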