Performance issue with DataFrames with numpy fortran-order arrays and pd.concat · Issue #11958 · pandas-dev/pandas
Hi,
When calling pd.concat() on DataFrames backed by large Fortran-order numpy arrays, there is a big performance hit: most of the time is spent in ravel().
See:
https://github.com/pydata/pandas/blob/master/pandas/core/internals.py#L4772
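There you can see that is_null(self) only looks at a few values from the data, but it reads them after calling .ravel(). With the default 'C' order, ravel() has to copy the whole array when the data is Fortran-ordered.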
An easy fix is to change that line to values_flat = values.ravel(order='K'), so that a Fortran-contiguous array is flattened without a copy.
Here is a link to numpy.ravel docs: http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.ravel.html
‘K’ means to read the elements in the order they occur in memory, except for reversing the data when strides are negative. By default, ‘C’ index order is used.
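
For reference, here is a minimal sketch of the difference. The small `values` array is made up for illustration and is not taken from the pandas code; it only stands in for the data held by a block.

```python
import numpy as np

# Hypothetical stand-in for a block's values: a small Fortran-order 2-D array.
values = np.asfortranarray(np.arange(6).reshape(2, 3))

flat_c = values.ravel()           # default order='C': must copy F-ordered data
flat_k = values.ravel(order='K')  # memory order: can return a view instead

# Check whether each flattened array owns a separate copy of the data.
copies_c = not np.shares_memory(values, flat_c)
copies_k = not np.shares_memory(values, flat_k)
print(copies_c, copies_k)  # True False

# The element order differs between the two results.
print(flat_c.tolist())  # [0, 1, 2, 3, 4, 5]
print(flat_k.tolist())  # [0, 3, 1, 4, 2, 5]
```

Note that order='K' returns the elements in memory order, so the few values that is_null samples would be different elements than before. I assume that does not matter for a null check, but it is worth confirming.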