Remove codepath asymmetry in dataframe count() by qwhelan · Pull Request #9136 · pandas-dev/pandas
The isnull determination is probably OK, but the vbenches are all very misleading because none of them exercises a mixed-type frame. Try this. In fact, we should add a mixed-type vbench.
In [1]: data = np.random.randn(10000, 1000)
In [2]: df = DataFrame(data)
In [3]: df.ix[50:1000,20:50] = np.nan
In [4]: df.ix[2000:3000] = np.nan
In [5]: df.ix[:,60:70] = np.nan
In [6]: df2 = df.copy()
In [7]: df2['foo'] = 'bar'
In [8]: %timeit df.dropna(how="any",axis=1)
10 loops, best of 3: 118 ms per loop
In [9]: %timeit df2.dropna(how="any",axis=1)
10 loops, best of 3: 159 ms per loop
In [10]: %timeit df2.dropna(how="any",axis=0)
1 loops, best of 3: 1.69 s per loop
In [11]: %timeit df2.dropna(how="any",axis=1)
10 loops, best of 3: 159 ms per loop
In [12]: %timeit DataFrame(df2.values).dropna(how="any",axis=1)
1 loops, best of 3: 671 ms per loop
In [13]: %timeit DataFrame(df2.values).dropna(how="any",axis=0)
1 loops, best of 3: 1.59 s per loop
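For anyone who wants to reproduce the comparison above on a current pandas, here is a minimal sketch of what a mixed-type benchmark could look like. It is not the vbench added in this PR. The helper names `make_frames` and `time_dropna` are hypothetical, the frame is scaled down from 10000 x 1000 so it runs quickly, and `.iloc` stands in for `.ix`, which was later removed from pandas (note that `.iloc` slices exclude the end point, while `.ix` label slices included it).

```python
import timeit

import numpy as np
import pandas as pd


def make_frames(n_rows=1000, n_cols=100, seed=0):
    """Build a float-only frame and a mixed-type copy, both with NaN blocks."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.standard_normal((n_rows, n_cols)))
    df.iloc[5:100, 2:5] = np.nan   # rectangular NaN block
    df.iloc[200:300, :] = np.nan   # rows that are entirely NaN
    df.iloc[:, 6:8] = np.nan       # columns that are entirely NaN
    mixed = df.copy()
    mixed["foo"] = "bar"           # one string column makes the frame mixed-type
    return df, mixed


def time_dropna(frame, axis, repeat=3):
    """Best wall-clock time in seconds over `repeat` single runs of dropna."""
    timer = timeit.Timer(lambda: frame.dropna(how="any", axis=axis))
    return min(timer.repeat(repeat=repeat, number=1))


if __name__ == "__main__":
    df, mixed = make_frames()
    for name, frame in [("float", df), ("mixed", mixed)]:
        for axis in (0, 1):
            elapsed = time_dropna(frame, axis)
            print(f"{name:5s} axis={axis}: {elapsed:.4f} s")
```

Absolute timings will differ from the numbers quoted above, since both the frame size and the pandas version differ. The point of the sketch is only that the float-only and mixed-type frames are timed side by side on both axes.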