Internals — pandas 0.24.2 documentation

This section provides a look into some of pandas' internals. It is primarily intended for developers of pandas itself.

Indexing

pandas implements a few objects that can serve as valid containers for the axis labels:

There are functions that make the creation of a regular index easy:

The motivation for having an Index class in the first place was to enable different implementations of indexing. This means that it’s possible for you, the user, to implement a custom Index subclass that may be better suited to a particular application than the ones provided in pandas.

From an internal implementation point of view, the relevant methods that an Index must define are one or more of the following (depending on how incompatible the new object's internals are with the Index functions):

MultiIndex

Internally, the MultiIndex consists of a few things: the levels, the integer codes (until version 0.24 named labels), and the level names:

In [1]: index = pd.MultiIndex.from_product([range(3), ['one', 'two']],
   ...:                                    names=['first', 'second'])
   ...:

In [2]: index
Out[2]:
MultiIndex(levels=[[0, 1, 2], ['one', 'two']],
           codes=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
           names=['first', 'second'])

In [3]: index.levels
Out[3]: FrozenList([[0, 1, 2], ['one', 'two']])

In [4]: index.codes
Out[4]: FrozenList([[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])

In [5]: index.names
Out[5]: FrozenList(['first', 'second'])

The codes determine which unique element of each level sits at each location of the index: at a given location, the code for a level is the integer position of that location's label within the level. It's important to note that sortedness is determined solely from the integer codes and does not check (or care) whether the levels themselves are sorted. Fortunately, the constructors from_tuples and from_arrays ensure that the levels are sorted, so the order of the codes matches the order of the labels. If you compute the levels and codes yourself, please be careful.
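The relationship between levels and codes can be sketched as follows. This is an illustration only, not how pandas computes anything internally: the variable names (rebuilt, manual, safe and so on) are made up for the example, and the hand-built index with an unsorted first level is an assumption chosen to show the caveat about sortedness.

```python
import pandas as pd

index = pd.MultiIndex.from_product([range(3), ['one', 'two']],
                                   names=['first', 'second'])

# Rebuild every label tuple by looking up each code in its level.
rebuilt = [
    tuple(level[code] for level, code in zip(index.levels, codes_at_pos))
    for codes_at_pos in zip(*index.codes)
]
assert rebuilt == list(index)

# Hypothetical hand-built index: the first level is deliberately unsorted.
manual = pd.MultiIndex(levels=[['b', 'a'], [1, 2]],
                       codes=[[0, 0, 1, 1], [0, 1, 0, 1]])
manual_labels = list(manual)
manual_codes = [tuple(int(c) for c in pos) for pos in zip(*manual.codes)]

# The codes are in sorted order, but the labels they stand for are not.
codes_sorted = manual_codes == sorted(manual_codes)
labels_sorted = manual_labels == sorted(manual_labels)

# from_arrays computes the levels itself and keeps them sorted.
safe = pd.MultiIndex.from_arrays([['b', 'b', 'a', 'a'], [1, 2, 1, 2]])
safe_first_level = list(safe.levels[0])
```

Here codes_sorted and labels_sorted disagree for the hand-built index, which is exactly the situation the warning above is about, while the index built with from_arrays has a sorted first level.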

Values

Pandas extends NumPy’s type system with custom types, like Categorical or datetimes with a timezone, so we have multiple notions of “values”. For 1-D containers (Index classes and Series) we have the following convention:

So, for example, Series[category]._values is a Categorical, while Series[category]._ndarray_values is the underlying codes.
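A minimal sketch of the categorical case, under some assumptions: _values and _ndarray_values are private attributes described for this version of pandas and may not exist in other releases, so the example reads _ndarray_values defensively with getattr and takes the codes from the public Categorical.codes attribute instead. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a'], dtype='category')

# Private attribute: the "best possible" array backing the Series.
values = s._values
is_categorical = isinstance(values, pd.Categorical)

# Integer codes pointing into values.categories (public API).
codes = values.codes

# Private and version-dependent; None if this pandas does not provide it.
legacy_codes = getattr(s, '_ndarray_values', None)

# Decoding the codes through the categories recovers the original labels.
decoded = [values.categories[c] for c in codes]
```

Where _ndarray_values is available, it is expected to hold the same integers as codes.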