Performance issue with DataFrames with numpy fortran-order arrays and pd.concat · Issue #11958 · pandas-dev/pandas

Hi,

When calling pd.concat() on DataFrames backed by several large Fortran-order numpy arrays, there is a big performance hit, because most of the time is spent in ravel().

See:
https://github.com/pydata/pandas/blob/master/pandas/core/internals.py#L4772

You can see that is_null(self) only looks at a few values of the data at a time after calling .ravel(), yet with the default C order a Fortran-order array has to be copied in full before that.

An easy fix is to change that line to: values_flat = values.ravel(order='K')

Here is a link to numpy.ravel docs: http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.ravel.html

‘K’ means to read the elements in the order they occur in memory, except for reversing the data when strides are negative. By default, ‘C’ index order is used.
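
To illustrate the difference, here is a minimal sketch using plain numpy only (not the pandas internals themselves). The array shape and the variable names are made up for the example. It shows that the default ravel() has to copy a Fortran-order array, while ravel(order='K') can return a view, and that an "all values are null" style check gives the same answer either way because it does not depend on element order.

```python
import numpy as np

# Hypothetical data: a 2-D float array stored in Fortran (column-major) order.
values = np.asfortranarray(np.random.rand(2000, 50))

# Default order='C': elements must be rearranged, so numpy makes a full copy.
flat_c = values.ravel()

# order='K': elements are read in memory order, so a view is enough here.
flat_k = values.ravel(order='K')

copies_c = not np.shares_memory(values, flat_c)
copies_k = not np.shares_memory(values, flat_k)

# Both results hold the same elements, only in a different order.
same_elements = np.array_equal(np.sort(flat_c), np.sort(flat_k))

# A check of the form "are all values null?" is independent of that order.
all_null_c = bool(np.isnan(flat_c).all())
all_null_k = bool(np.isnan(flat_k).all())

print("C-order ravel copies:", copies_c)
print("K-order ravel copies:", copies_k)
print("same elements:", same_elements)
print("all-null check agrees:", all_null_c == all_null_k)
```

For C-order input both calls behave the same, so the change should only affect arrays whose memory layout is not C-contiguous.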