BUG: EWM silently failed float32 by debnathshoham · Pull Request #42650 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation25 Commits8 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

closes BUG: pandas EWM fails silently if data types are float32 instead of float64 #42452
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry



@pytest.mark.parametrize("func", ["mean", "std", "var"])
@pytest.mark.parametrize("dtype", [np.float32, np.float16, np.float64, float, "float"])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use the float_dtype pytest fixture instead?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a float_dtype fixture. Are you suggesting to add that?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. Added.

@pytest.mark.parametrize("dtype", [np.float32, np.float16, np.float64, float, "float"])
def test_float_dtype_ewma(dtype, func):
# GH#42452
df = DataFrame(np.random.rand(20, 3), dtype=dtype)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use constant data, build a expected = DataFrame(...) and use tm.assert_frame_equal(result, expected)?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needs a whatsnew entry for 1.4

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we certainly don't support float16; this is not actually testing the result is correct

Hi @jreback . Sorry I don't follow.

we certainly don't support float16

Are you saying I should remove float16 here?
converted_dtypes.extend([np.float64, np.float32, np.float16])

this is not actually testing the result is correct

I have changed the earlier test (where I was just comparing the shape), to include constant data. I am comparing the result now(after @mroeschke suggested).

elif dtype == "float":
# GH#42452 : np.dtype("float") coerces to np.float64 from Numpy 1.20
converted_dtypes.extend(
[np.float64, np.float32, np.float16] # type: ignore[list-item]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here @jreback means we don't explicitly support np.float64. We would just like to capture np.float32/64

@pytest.mark.parametrize("func", ["mean", "std", "var"])
def test_float_dtype_ewma(func, float_dtype):
# GH#42452
expected_mean = DataFrame(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you parameterize the expected results too like:

@pytest.mark.parametrize(
    "func, expected_data", 
    [["mean", [...]],
     ["std", [...],
     ...])

Testing on 1 column of data is fine as well e.g. range(5)

@@ -4279,6 +4279,11 @@ def check_int_infer_dtype(dtypes):
# error: Argument 1 to "append" of "list" has incompatible type
# "Type[signedinteger[Any]]"; expected "Type[signedinteger[Any]]"
converted_dtypes.append(np.int64) # type: ignore[arg-type]
elif dtype == "float":

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think this is actually doing? the else clase below should handle no?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really far down the chain from the caller. what exactly is calling this? this has all kinds of implications changing here (which is really puzzling why nothing else breaks)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is being called below in obj.select_dtypes

def _create_data(self, obj: FrameOrSeries) -> FrameOrSeries:
"""
Split data into blocks & return conformed data.
"""
# filter out the on from the object
if self.on is not None and not isinstance(self.on, Index) and obj.ndim == 2:
obj = obj.reindex(columns=obj.columns.difference([self.on]), copy=False)
if self.axis == 1:
# GH: 20649 in case of mixed dtype and axis=1 we have to convert everything
# to float to calculate the complete row at once. We exclude all non-numeric
# dtypes.
obj = obj.select_dtypes(include=["integer", "float"], exclude=["timedelta"])
obj = obj.astype("float64", copy=False)
obj._mgr = obj._mgr.consolidate()
return obj

Yes, ideally the else should have handled this. But this infer_dtype_from_object calls pandas_dtype, which ultimately calls np.dtype(dtype) as below (which defaults to np.float64)

npdtype = np.dtype(dtype)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok can you add some unit tests which exercise this particular piece of code (you will have to look for which ones)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback added another relevant test. Please let me know if this is fine.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the added tests were good. I think what was meant was tests for the select_dtypes method specifically.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke added test for select_dtypes. Please let me know if this is ok?

@@ -1424,3 +1424,11 @@ def test_rolling_zero_window():
result = s.rolling(0).min()
expected = Series([np.nan])
tm.assert_series_equal(result, expected)


def test_rolling_float_dtype(float_dtype):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this worked in 1.1.5, see #42452 (comment)

we may want to consider backporting this fix.

hi @simonjayhawkins,
I am getting the below right now (i.e. columns a with float16 is getting dropped)

      b     c     d     e     f
0   1.0   1.0   2.0   3.0   4.0
1   7.0   7.0   8.0   9.0  10.0
2  13.0  13.0  14.0  15.0  16.0
3  19.0  19.0  20.0  21.0  22.0

Adding float16 as well to the converted_dtypes list, would fix that as well.

alternatively, i can see that just replacing with "number" below also fixes both (as suggested in the other BUG)
obj = obj.select_dtypes(include=["integer", "float"], exclude=["timedelta"])

we are not supporting float16 in any way

Hi @jreback, then this patch should work fine. Pls let me know if you think of any other changes I should make.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feefladder pushed a commit to feefladder/pandas that referenced this pull request

Sep 7, 2021

BUG: EWM silently failed float32
added tests
resolved mypy error
added constant data in test
added pytest.fixture & whatsnew
parametrized expected df; removed float16
added test for float32
added tests on select_dtypes