COMPAT: Pypy tweaks by mattip · Pull Request #17351 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation20 Commits5 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

a set of tweaks that I discovered when getting PyPY to pass tests, some could cause issues on CPython as well:

496cc3a clears up an assumption that sets are sorted
d161b08 makes sure PyList_GET_SIZE is used only on PyListObject, if used on an ndarray it will return the value of the data pointer, which is not the intention at all
2744c5c separates the cases where float('nan') is not np.nan and like cases by implementation, on PyPy the is comparison always checks the value of struct.pack('d', val) for equality.
5425137 cleans up the use of rank and mean as keys (the warnings are ignored?) and adds a trivial assert where a test was simply making a call

I left these as seperate commits for ease of discussion and possible rejection, if desired I can squash them to a single commit

gfyoung added the Compat

pandas objects compatability with Numpy or Python functions

label

Aug 27, 2017

@@ -2031,12 +2031,12 @@ def test_mutate_groups(self):

def f_copy(x):
x = x.copy()
x['rank'] = x.val.rank(method='min')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These warnings will get fixed by #17298

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, I will remove 5425137 from the pull request, it wasn't really complete

Just for my understanding on 2744c5c , you're saying this would be true on PyPy?

a = 1.2 b = 1.2 a is b False

@@ -454,7 +454,8 @@ cdef class TextReader:
if callable(usecols):
self.usecols = usecols
else:
self.usecols = set(usecols)
self.usecols = list(set(usecols))
self.usecols.sort()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an existing test that highlights where this matters?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, this commit is bogus. I'm not sure why but somehow this changes the dtype in one of the subtests in TestCParserHighMemory.test_usecols_with_parse_dates_and_usecol_names on PyPy, col 'a' gets the correct datetime64[ns] when usercols is [0, 2, 3] but object dtype when usercols is [3, 2, 0]. Something else is going on, I will back this out and look for the real culprit.

@chris-b1 about is-ness, yes that is True

>>>> a = float('nan')
>>>> b = float('nan')
>>>> a is b
True
>>>> a=1.2
>>>> b=1.2
>>>> a is b
True

on cpython both are False

@@ -1714,6 +1714,7 @@ def _set_noconvert_columns(self):
# A set of integers will be converted to a list in
# the correct order every single time.
usecols = list(self.usecols)
usecols.sort()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.usecols is a set of the original usecols. On PyPy sets are not sorted so "every single time" requires a sort.

>>>> list(set([3, 0, 2]))
[3, 0, 2]

Running tests with this line replaces by ``usecols.reverse() will show the dependency on the list being sorted:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no averse to this, but why does this matter?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test that exercises the sortedness?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are tests, but they all pass on CPython since sets of ints are always sorted, ie pandas/tests/io/parsers/usecol.py, lines 199, 270, 296 which all failed on PyPy before this change. Not sure how I could construct a failing test for CPython

@@ -332,6 +332,9 @@ Performance Improvements
Bug Fixes
~~~~~~~~~

- Compatibility with PyPy increased by removing the assumption that sets are sorted in ``usecols``

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don;t you make a separate bug fix sub-section for pypy fixes (IIRC a couple of more in here)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you also don't need to be as detailed, give these from a user perspective, rather just refer to the issue (or PR in this case)

@@ -1714,6 +1714,7 @@ def _set_noconvert_columns(self):
# A set of integers will be converted to a list in
# the correct order every single time.
usecols = list(self.usecols)
usecols.sort()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no averse to this, but why does this matter?

@@ -1714,6 +1714,7 @@ def _set_noconvert_columns(self):
# A set of integers will be converted to a list in
# the correct order every single time.
usecols = list(self.usecols)
usecols.sort()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test that exercises the sortedness?

Hello @mattip! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on September 06, 2017 at 18:50 Hours UTC

@@ -3,8 +3,10 @@
import os
import pandas.util.testing as tm

from pandas import read_csv, read_table
from pandas import read_csv, read_table, DataFrame, Index

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.



class TestUnsortedUsecols(object):
def test_override__set_noconvert_columns(self):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hah you are basically mocking the parser here....ok. I didn't mean go that far, but this is ok.

thanks @mattip always happy to have patches from pypy!

I don't think this had any issues to close?

jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request

Sep 10, 2017

alanbato pushed a commit to alanbato/pandas that referenced this pull request

Nov 10, 2017

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request

Nov 28, 2017

Labels

Compat

pandas objects compatability with Numpy or Python functions