Ensure that DataFrame constructed from dict follows insertion order by alexander-ponomaroff · Pull Request #25915 · pandas-dev/pandas
- closes Inner dictionaries are not displaying in insertion order #25911
- tests added / passed
- passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
- whatsnew entry
Hello @alexander-ponomaroff! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻
Comment last updated at 2019-03-29 00:59:46 UTC
@@ -314,6 +314,17 @@ def test_constructor_dict(self):
        with pytest.raises(ValueError, match=msg):
            DataFrame({'a': 0.7}, columns=['a'])

    def test_constructor_dict_order(self):
        data = {'B': {'b': 1, 'c': 2, 'a': 3},
                'A': {'b': 3, 'c': 2, 'a': 1}}
What happens if the order isn't the same across all dicts?
It would keep the order of the first dict. For example if the first dict is ordered b, c, a like in the above example and the second dict is ordered c, b, a - the resulting DataFrame would maintain the order b, c, a.
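To make the "order of the first dict" rule concrete, here is a minimal pure-Python sketch. The helper `first_seen_order` is hypothetical and is not the code in this PR or in pandas; it only assumes Python 3.7+, where plain dicts preserve insertion order.

```python
def first_seen_order(dicts):
    """Return every key in the order it is first encountered.

    Hypothetical helper for illustration; relies on dicts preserving
    insertion order (guaranteed from Python 3.7).
    """
    seen = {}
    for d in dicts:
        for key in d:
            # setdefault keeps the position of the first occurrence
            seen.setdefault(key, None)
    return list(seen)


first = {'b': 1, 'c': 2, 'a': 3}
second = {'c': 2, 'b': 3, 'a': 1}
labels = first_seen_order([first, second])
print(labels)
```

With these two inputs the labels come out as b, c, a: the second dict adds no new keys, so its different ordering has no effect.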
There are problems in a few tests now because the ordering has changed from sorted to maintaining the order of insertion. If this is a needed fix, I will go into those tests and ensure that they follow the new insertion order. Please let me know.
I'm -0, maybe a slight -1 on this change.
> It would keep the order of the first dict. For example if the first dict is ordered b, c, a like in the above example and the second dict is ordered c, b, a - the resulting DataFrame would maintain the order b, c, a.
This seems rather arbitrary and not sure why we want to impose or guarantee that. Outside of small hand-coded samples I don't see why this matters
This has to do with dicts maintaining the order of insertion now. However, I would agree that in this case keeping the data sorted would make more sense for consistency. @jreback What do you think about this issue and PR?
The middle ground may be to keep the insertion order if the dicts are the same (for example b, c, a throughout). However if they are different (for example b, c, a at first - c, b, a after) the order would be sorted. But this might not make much sense either, just putting it out there.
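The middle-ground rule could look roughly like the sketch below. `middle_ground_order` is a hypothetical name used only for illustration, and nothing like it was implemented in this PR.

```python
def middle_ground_order(dicts):
    """Keep insertion order only when every dict agrees on it.

    Hypothetical helper: if all inner dicts list their keys in the same
    order, return that order; otherwise fall back to sorted labels.
    """
    orders = [list(d) for d in dicts]
    if not orders:
        return []
    if all(order == orders[0] for order in orders):
        return orders[0]
    # Orders disagree: take the union of all keys and sort it
    return sorted(set().union(*orders))


same = middle_ground_order([{'b': 1, 'c': 2, 'a': 3},
                            {'b': 3, 'c': 2, 'a': 1}])
mixed = middle_ground_order([{'b': 1, 'c': 2, 'a': 3},
                             {'c': 2, 'b': 3, 'a': 1}])
```

Here `same` keeps b, c, a, while `mixed` falls back to a, b, c. The sketch also treats dicts with different key sets as disagreeing, which is one possible reading of the proposal.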
Sure plenty of options but I still don't see what value that would really add. Pandas offers a lot of flexibility to explicitly sort or select by labels so I can't imagine this is a problem for users outside of toy use cases.
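For reference, a short sketch of the explicit alternatives mentioned here, using the nested dict from the test in this PR. It uses only `DataFrame.sort_index` and `DataFrame.reindex`; the row order of the initial `df` depends on the pandas version, so the example does not rely on it.

```python
import pandas as pd

data = {'B': {'b': 1, 'c': 2, 'a': 3},
        'A': {'b': 3, 'c': 2, 'a': 1}}

df = pd.DataFrame(data)

# Sort rows by label, regardless of how the constructor ordered them
by_label = df.sort_index()

# Or request a specific row order explicitly
explicit = df.reindex(['b', 'c', 'a'])
```

Either call makes the row order independent of whatever the constructor chooses to do with the inner dicts.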
Understood dicts are ordered in Python but I'm not aware of guarantees in similar tools where that order is meaningful across an array of dicts or similar entities (thinking JSON in particular)
I don't buy this change. dicts don't guarantee insertion order, only OrderedDicts. We have not adopted this for DataFrame construction at all, though we could. But this is a pretty jarring change and breaks long-standing apis. What is the justification?
Closing as I don't think this is something we want to guarantee. Will note in issue as well
> I don't buy this change. dicts don't guarantee insertion order, only OrderedDicts.
Yes they 100% do. From a python-dev message by GvR: "Dict keeps insertion order"
@boardtc Not for 3.5, and in 3.6 it is not guaranteed (though we do rely on this).
@jreback but 3.5 is over a year old, why is pandas stuck to that?
@boardtc we just dropped 2.7
3.5 will be around for a while
there is this backward compatibility thing :)
> there is this backward compatibility thing :)
But for constructing a DataFrame from a dict ({col_name -> col_values}) we changed the behaviour anyway to use the dict order, and didn't care about backwards compatibility.
> there is this backward compatibility thing :)
Just so! Was conditional execution considered?
@boardtc What do you mean exactly with "conditional execution" ? Different behaviour for python 3.5 vs python >= 3.6 ? But that would still be backwards incompatible for the newer versions of python, as we now sort.
@jorisvandenbossche Yes. I'm not following why there could not be one code path for 2.7-3.5 and another for 3.6+
There would indeed be separate code paths (sorting for <= 3.5, preserving order for 3.6 and up). But what I am trying to say is that this does not solve the backwards compatibility issue (as the current behaviour on python 3.7 would still change).
But that said, I am not sure backwards compatibility is important enough here to rule this out. I think it is more the question of "what if the order between dicts differs?" that makes this a less clear decision.
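A rough sketch of the version-conditional idea discussed in the last few comments. `row_label_order` and its `version` parameter are hypothetical and exist only so that both branches can be exercised; this is not how pandas is implemented.

```python
import sys


def row_label_order(keys, version=sys.version_info):
    """Choose a label order based on the running Python version.

    Hypothetical helper: preserve insertion order on Python 3.6 and up,
    sort on older interpreters where dict order is arbitrary.
    """
    keys = list(keys)
    if version >= (3, 6):
        return keys
    return sorted(keys)


new_path = row_label_order({'b': 1, 'c': 2, 'a': 3}, version=(3, 7))
old_path = row_label_order({'b': 1, 'c': 2, 'a': 3}, version=(3, 5))
```

On a 3.6+ interpreter this returns b, c, a where the sorting behaviour returned a, b, c, so users on those versions would still see a change, which is the backwards-compatibility point made above.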