Asymmetry in corner case for DataFrame __getattr__ and __setattr__ · Issue #8994 · pandas-dev/pandas

A student of mine ran into a confusing problem which ended up being due to an asymmetry in DataFrame.__setattr__ and DataFrame.__getattr__ when an attribute and a column have the same name. Here is a short example session:

    import pandas as pd
    print(pd.__version__)  # '0.14.1'

    data = pd.DataFrame({'x': [1, 2, 3]})

    # try to create a new column, making a common mistake
    data.y = 2 * data.x

    # oops! That didn't create a column.
    # we need to do it this way instead
    data['y'] = 2 * data.x

    # update the attribute, and it updates the column, not the attribute
    data.y = 0
    print(data['y'])  # [0, 0, 0]

    # print the attribute, and it prints the attribute, not the column
    print(data.y)  # [2, 4, 6]
    print(data['y'])  # [0, 0, 0]

The confusion stems from the fact that, in this situation, reading data.y (data.__getattr__('y')) refers to the attribute, while assigning to data.y (data.__setattr__('y', val)) refers to the column.
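To make the mechanism concrete without depending on a particular pandas version, here is a minimal sketch that reproduces the same asymmetry. `ToyFrame` is a hypothetical stand-in that I made up for illustration; it is not pandas code, and it simply assumes that assignment checks column names first, while reads use Python's normal attribute lookup and only fall back to `__getattr__` when that lookup fails.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame (illustration only, not pandas)."""

    def __init__(self, columns):
        # Bypass our own __setattr__ so the column store is a plain attribute.
        object.__setattr__(self, "_columns", dict(columns))

    def __getitem__(self, name):
        return self._columns[name]

    def __setitem__(self, name, value):
        self._columns[name] = value

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup has failed.
        columns = object.__getattribute__(self, "_columns")
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Assumed behavior: an existing column wins on assignment,
        # even if an instance attribute with the same name exists.
        if name in self._columns:
            self._columns[name] = value
        else:
            object.__setattr__(self, name, value)


data = ToyFrame({"x": [1, 2, 3]})
data.y = [2, 4, 6]        # no column "y" yet, so this becomes a plain attribute
data["y"] = [2, 4, 6]     # now a column "y" exists as well
data.y = [0, 0, 0]        # assignment is routed to the column
attribute_value = data.y  # normal lookup finds the attribute; __getattr__ never runs
column_value = data["y"]

print(attribute_value)  # [2, 4, 6]
print(column_value)     # [0, 0, 0]
```

In this sketch the read and the write consult the two stores in opposite order, which is the asymmetry described above.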

I understand that using attributes to access columns is not recommended, but the asymmetry in this corner case led to a lot of confusion! It would be better if __getattr__ and __setattr__ always referred to the same object.