BUG: Check for size != 0 before trying to insert #10193 by rekcahpassyla · Pull Request #10194 · pandas-dev/pandas (original) (raw)

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

@rekcahpassyla

@rekcahpassyla

@jreback

see my comments on the actual issue. This needs to test that it is:

@rekcahpassyla

@rekcahpassyla

@rekcahpassyla

I had another go- I looked at how DataFrame handled the setting of is_copy and then raising SettingWithCopyError or SettingWithCopyWarning, and tried to replicate that logic in Series.

Added tests for the requisite error / warning being raised, and that the original series is unchanged. Had to change a couple of other tests to change the chained_assignment option to None and back again.

I haven't squashed commits as it's still WIP. Will look again to see how I can make it more consistent with existing functionality, but here's something for now.

@rekcahpassyla

Noted the build failures. Investigating.

jreback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use TimeSeries, just Series

@rekcahpassyla

to avoid SettingWithCopyError where not desired

@rekcahpassyla

@rekcahpassyla

Requested changes from review have been made, also replaced other existing instances of pd.set_option('chained_assignment', ...) with context manager, to be consistent.

@jreback

I have to think about this again. This is a 'feature' of numpy. In that if you set something on a view of something else it works, and changes the original, as that is the point. However, in pandas, objects are pure and changing one shouldn't impact another one, so in effect, even though its supported, the view machinery is effectively frowned upon.

@shoyer @jorisvandenbossche thoughts?

@shoyer

@jreback can you clarify the specific behavior you're not sure about?

@jreback

Well its just general view semantics for Series. These almost always apply because these are a single dtyped, while for frames depends on whether its multi-dtype and what you are setting. I am just thinking that this might be confusing.

In [1]: s = Series([1,2,3])

In [2]: s
Out[2]: 
0    1
1    2
2    3
dtype: int64

In [3]: s2 = s[1:]

In [4]: s2
Out[4]: 
1    2
2    3
dtype: int64

In [5]: s2.iloc[0] = 5

In [6]: s2
Out[6]: 
1    5
2    3
dtype: int64

In [7]: s
Out[7]: 
0    1
1    5
2    3
dtype: int64

@shoyer

I agree, views are a bit confusing, but they're also a fundamental part of the NumPy ecosystem -- and they make a lot of efficient things possible.

I would definitely prefer not to add more SettingWithCopy warnings. To quote @mwaskom from Twitter:

IMO warnings are almost never a good way to communicate usage notes to users for exactly this reason. They annoy people who know what they are doing, and they rarely provide enough context for those who don't.

@rekcahpassyla

For my own edification; is the issue here that setting a value on a Series which is a view should always pass, because "in pandas, objects are pure and changing one shouldn't impact another one", or that it should always fail?

@jreback

Labels