COMPAT: Pypy tweaks by mattip · Pull Request #17351 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation20 Commits5 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
a set of tweaks that I discovered when getting PyPY to pass tests, some could cause issues on CPython as well:
- 496cc3a clears up an assumption that sets are sorted
- d161b08 makes sure PyList_GET_SIZE is used only on PyListObject, if used on an ndarray it will return the value of the data pointer, which is not the intention at all
- 2744c5c separates the cases where
float('nan') is not np.nan
and like cases by implementation, on PyPy theis
comparison always checks the value of struct.pack('d', val) for equality. - 5425137 cleans up the use of
rank
andmean
as keys (the warnings are ignored?) and adds a trivial assert where a test was simply making a call
I left these as seperate commits for ease of discussion and possible rejection, if desired I can squash them to a single commit
pandas objects compatability with Numpy or Python functions
label
@@ -2031,12 +2031,12 @@ def test_mutate_groups(self): |
---|
def f_copy(x): |
x = x.copy() |
x['rank'] = x.val.rank(method='min') |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These warnings will get fixed by #17298
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, I will remove 5425137 from the pull request, it wasn't really complete
Just for my understanding on 2744c5c , you're saying this would be true on PyPy?
a = 1.2 b = 1.2 a is b False
@@ -454,7 +454,8 @@ cdef class TextReader: |
---|
if callable(usecols): |
self.usecols = usecols |
else: |
self.usecols = set(usecols) |
self.usecols = list(set(usecols)) |
self.usecols.sort() |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an existing test that highlights where this matters?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, this commit is bogus. I'm not sure why but somehow this changes the dtype in one of the subtests in TestCParserHighMemory.test_usecols_with_parse_dates_and_usecol_names
on PyPy, col 'a' gets the correct datetime64[ns]
when usercols is [0, 2, 3] but object
dtype when usercols is [3, 2, 0]. Something else is going on, I will back this out and look for the real culprit.
@chris-b1 about is-ness, yes that is True
>>>> a = float('nan')
>>>> b = float('nan')
>>>> a is b
True
>>>> a=1.2
>>>> b=1.2
>>>> a is b
True
on cpython both are False
@@ -1714,6 +1714,7 @@ def _set_noconvert_columns(self): |
---|
# A set of integers will be converted to a list in |
# the correct order every single time. |
usecols = list(self.usecols) |
usecols.sort() |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.usecols is a set
of the original usecols. On PyPy set
s are not sorted so "every single time" requires a sort.
>>>> list(set([3, 0, 2]))
[3, 0, 2]
Running tests with this line replaces by ``usecols.reverse() will show the dependency on the list being sorted:
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no averse to this, but why does this matter?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test that exercises the sortedness?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are tests, but they all pass on CPython since sets of ints are always sorted, ie pandas/tests/io/parsers/usecol.py, lines 199, 270, 296 which all failed on PyPy before this change. Not sure how I could construct a failing test for CPython
@@ -332,6 +332,9 @@ Performance Improvements |
---|
Bug Fixes |
~~~~~~~~~ |
- Compatibility with PyPy increased by removing the assumption that sets are sorted in ``usecols`` |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don;t you make a separate bug fix sub-section for pypy fixes (IIRC a couple of more in here)
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you also don't need to be as detailed, give these from a user perspective, rather just refer to the issue (or PR in this case)
@@ -1714,6 +1714,7 @@ def _set_noconvert_columns(self): |
---|
# A set of integers will be converted to a list in |
# the correct order every single time. |
usecols = list(self.usecols) |
usecols.sort() |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no averse to this, but why does this matter?
@@ -1714,6 +1714,7 @@ def _set_noconvert_columns(self): |
---|
# A set of integers will be converted to a list in |
# the correct order every single time. |
usecols = list(self.usecols) |
usecols.sort() |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test that exercises the sortedness?
Hello @mattip! Thanks for updating the PR.
Cheers ! There are no PEP8 issues in this Pull Request. 🍻
Comment last updated on September 06, 2017 at 18:50 Hours UTC
@@ -3,8 +3,10 @@ |
---|
import os |
import pandas.util.testing as tm |
from pandas import read_csv, read_table |
from pandas import read_csv, read_table, DataFrame, Index |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class TestUnsortedUsecols(object): |
def test_override__set_noconvert_columns(self): |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hah you are basically mocking the parser here....ok. I didn't mean go that far, but this is ok.
thanks @mattip always happy to have patches from pypy!
I don't think this had any issues to close?
jbrockmendel pushed a commit to jbrockmendel/pandas that referenced this pull request
alanbato pushed a commit to alanbato/pandas that referenced this pull request
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request
Labels
pandas objects compatability with Numpy or Python functions