Asymmetry in corner case for DataFrame __getattr__ and __setattr__ · Issue #8994 · pandas-dev/pandas
A student of mine ran into a confusing problem which ended up being due to an asymmetry in DataFrame.__setattr__ and DataFrame.__getattr__ when an attribute and a column have the same name. Here is a short example session:
import pandas as pd
print(pd.__version__)  # '0.14.1'
data = pd.DataFrame({'x': [1, 2, 3]})

# try to create a new column, making a common mistake
data.y = 2 * data.x

# oops! That didn't create a column.
# we need to do it this way instead
data['y'] = 2 * data.x

# update the attribute, and it updates the column, not the attribute
data.y = 0
print(data['y'])  # [0, 0, 0]

# print the attribute, and it prints the attribute, not the column
print(data.y)  # [2, 4, 6]
print(data['y'])  # [0, 0, 0]
The confusion stems from the fact that in this situation, data.__getattr__('y') refers to the attribute, while data.__setattr__('y', val) refers to the column.
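To make the mechanism concrete, here is a minimal pure-Python sketch that reproduces the same asymmetry. ToyFrame is a hypothetical stand-in written for illustration and is not pandas' actual implementation. It only assumes that attribute reads fall back to columns when normal lookup fails, while attribute writes are redirected to a column whenever one with that name exists.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; not pandas' actual implementation."""

    def __init__(self, columns):
        # Bypass our own __setattr__ while setting up internal state.
        object.__setattr__(self, "_columns", dict(columns))

    def __getitem__(self, name):
        return self._columns[name]

    def __setitem__(self, name, value):
        self._columns[name] = value

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # so an instance attribute always shadows a column on reads.
        columns = object.__getattribute__(self, "_columns")
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Python calls this on every attribute assignment,
        # so an existing column wins over an attribute on writes.
        if name in self._columns:
            self._columns[name] = value
        else:
            object.__setattr__(self, name, value)


data = ToyFrame({"x": [1, 2, 3]})
data.y = [2, 4, 6]      # no column "y" yet: stored as a plain attribute
data["y"] = [2, 4, 6]   # now a column "y" exists as well
data.y = [0, 0, 0]      # __setattr__ sees the column and updates it

print(data.y)     # [2, 4, 6]  (the attribute)
print(data["y"])  # [0, 0, 0]  (the column)
```

The asymmetry comes from Python's lookup rules rather than anything specific to this toy class: __getattr__ is only a fallback for reads, while __setattr__ intercepts every write.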
I understand that using attributes to access columns is not recommended, but the asymmetry in this corner case led to a lot of confusion! It would be better if __getattr__ and __setattr__ would always refer to the same object.