pandas-dev/pandas
import pandas
import numpy as np
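Define the mask function:
def mask(self, condition):
    # Work on a copy so the original frame is left untouched
    new_self = self.copy()
    # Set every cell where the condition is False to NaN
    new_self[~condition.values] = np.nan
    return new_self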
pandas.DataFrame.mask = mask
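As a minimal sketch of the same idea without patching the DataFrame class, the helper can also be written as a plain function. The name keep_where and the small fixed frame are assumptions made for illustration; neither comes from the original post.

```python
import numpy as np
import pandas as pd


def keep_where(frame, condition):
    """Return a copy of frame with NaN wherever condition is False.

    keep_where is a hypothetical name chosen for this sketch.
    """
    result = frame.copy()
    # ~condition.values is a 2-D boolean array marking the cells to blank out
    result[~condition.values] = np.nan
    return result


# Small fixed frame (illustrative data, not taken from the post)
small = pd.DataFrame({"A": [1.0, -2.0], "B": [-3.0, 4.0]})
negatives = keep_where(small, small < 0)
print(negatives)
```

Because the helper works on a copy, small itself is unchanged after the call.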
index = pandas.date_range('1/1/2000', periods=8)
columns = ['A', 'B', 'C', 'D']
df = pandas.DataFrame(np.random.randn(len(index), len(columns)), index=index, columns=columns)
print(df)
                   A         B         C         D
2000-01-01  0.752832  0.083465 -0.273210  1.128781
2000-01-02  0.895254  0.401056  1.473770  1.998924
2000-01-03  2.318820  0.384354 -1.056422 -1.280257
2000-01-04  0.981042  0.717762 -1.015285 -1.146636
2000-01-05 -0.979061 -1.765188  0.025436 -0.815622
2000-01-06 -0.166251  1.887524 -0.131171 -0.802795
2000-01-07  0.025936  0.122587  0.517295  0.589679
2000-01-08  0.691059  0.458683 -0.856201 -0.412374
pandas supports boolean indexing for Series:
s = pandas.Series(np.random.randn(len(index)), index=index)
print(s)
print(s[s < 0])
2000-01-01   -0.182340
2000-01-02    0.031729
2000-01-03    0.616713
2000-01-04   -0.329961
2000-01-05   -1.220345
2000-01-06   -1.323948
2000-01-07    1.182522
2000-01-08   -0.622332
Freq: D

2000-01-01   -0.182340
2000-01-04   -0.329961
2000-01-05   -1.220345
2000-01-06   -1.323948
2000-01-08   -0.622332
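The Series case can be reproduced with fixed values, which makes the result easier to check than with random data. This is only a sketch; the values and the names is_negative and negatives are assumptions made for illustration.

```python
import pandas as pd

index = pd.date_range("2000-01-01", periods=4)
s = pd.Series([-0.5, 0.25, 1.5, -2.0], index=index)

is_negative = s < 0          # boolean Series aligned on the same index
negatives = s[is_negative]   # keeps only the rows where the mask is True

print(negatives)
```

The filtered Series is shorter than the original and keeps the original dates as its index.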
but not directly for DataFrame.
The mask function enables the equivalent operation on a DataFrame.
df[df < 0] currently returns a numpy array, which is correct but not that useful.
print(df.mask(df < 0))
                   A         B         C         D
2000-01-01       NaN -1.518799 -0.574630       NaN
2000-01-02 -1.023108       NaN       NaN -0.009226
2000-01-03       NaN -0.623582       NaN -1.801656
2000-01-04 -0.984583       NaN -1.082821       NaN
2000-01-05 -0.709460 -1.202316       NaN -0.484609
2000-01-06 -0.775715       NaN -0.415970       NaN
2000-01-07 -1.395435       NaN -0.293588       NaN
2000-01-08 -0.377900 -0.526218 -0.660083       NaN
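For comparison, present-day pandas ships a built-in DataFrame.where that keeps cells where the condition is True and fills the rest with NaN, which is the behaviour proposed here. Its built-in DataFrame.mask has the opposite sense: it blanks the cells where the condition is True. The sketch below uses only those built-in methods on a small fixed frame. The data are illustrative, and behaviour may differ in the pandas version this post was written against.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, -2.0], "B": [-3.0, 4.0]})

kept = df.where(df < 0)     # keep negatives, NaN elsewhere
blanked = df.mask(df < 0)   # built-in mask: NaN where negative, keep the rest
indexed = df[df < 0]        # boolean-frame indexing, returns a DataFrame

print(kept)
print(blanked)
```

In current pandas, indexing with a boolean DataFrame gives the same result as where, so the proposed helper corresponds to df.where(df < 0) or df[df < 0] there.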