ENH: Implement np.floating.as_integer_ratio by eric-wieser · Pull Request #10741 · numpy/numpy


Conversation


eric-wieser

The original incentive here was to port tests from CPython for #9963

Fixes #9969
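For concreteness, the method mirrors the builtin float.as_integer_ratio: it returns a pair of Python ints whose ratio is exactly the stored value. A minimal sketch of the behavior being added (np.float64 already had the method; the float16 value is the exact one discussed later in this thread):

import numpy as np

# np.float64 already had this; the PR extends it to all np.floating scalar types
np.float64(2.5).as_integer_ratio()   # (5, 2)
np.float16(2.1).as_integer_ratio()   # (1075, 512), the exact stored value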

mhvk (Contributor) left a comment

Beyond the comment about the test, I think this needs an entry in the release notes - I presume this is mostly there for compatibility with CPython's float?

@eric-wieser

Updated with docstring and release notes

@njsmith

I assume this is inspired by the debate raging in Python core.

The argument there is actually about adding as_integer_ratio to int, which might suggest doing the same for the np.integer types. Of course our ints are less duck-compatible with floats than Python's ints, because if you try to treat, say, a np.int8 as if it were a float, then you're going to have a bad time with wraparound.
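A quick sketch of the wraparound hazard (illustrative, not from the PR):

import numpy as np

x = np.int8(100)
x + x         # -56: the sum wraps modulo 2**8 (numpy may warn about the overflow)
float(x) * 2  # 200.0: what treating it like a float would give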

The whole topic seems very unimportant to me, because who even cares about as_integer_ratio on floats or ints? It seems like a parlor trick. If you want to find out the exact value of a float, there are more useful ways. But OTOH I guess this isn't necessarily an argument against it either.

I'm wincing at adding more incompatibilities between scalars and 0d arrays. OTOH I don't know that eliminating those is actually realistic, and we already have this on float64, and this is unlikely to add major new obstacles since it seems unlikely anyone will use it much.

It's also unusual and surprising that this returns Python ints rather than numpy ints, but OTOH I guess this is necessary because the returned values might not fit into an int64.
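For instance, the smallest positive float64 subnormal already needs a denominator of over a thousand bits, which is why Python ints are the only safe return type (a sketch):

import numpy as np

n, d = np.float64(2.0 ** -1074).as_integer_ratio()  # smallest positive subnormal
(n, d.bit_length())  # (1, 1075): the denominator is far too large for an int64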

So kinda mixed feelings about this overall. I guess my overall impression is an emphatic meh?

Am I missing important use cases?

@njsmith

Oh, on further examination it looks like that started out as a debate about the float.is_integer method, which then dragged in as_integer_ratio as it went. Sorry, this is confusing...

is_integer is much more of an attractive nuisance than as_integer_ratio, since people might actually think they want it, but they usually don't (because of the usual floating point rounding issues).
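A sketch of the usual failure mode: a computation that is mathematically an integer, but not in floating point.

x = (0.1 + 0.2) * 10  # mathematically 3
x                     # 3.0000000000000004
x.is_integer()        # False: the accumulated rounding error defeats the check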

@eric-wieser

The argument there is actually about adding as_integer_ratio to int, which might suggest doing the same for the np.integer types

I saw something like that elsewhere, but decided to wait to see what CPython does first. As they point out, most of the value is in consistency - and what we do right now is consistent with CPython!

If you want to find out the exact value of a float there are more useful ways.

I don't think this is true for np.longdouble. You could use Decimal(str(f)), but that loses precision.

I guess this is necessary because the returned values might not fit into an int64.

Yep, otherwise we may as well implement longdouble.as_integer_ratio as float(ld).as_integer_ratio(), which drops precision, defeating the point of the function.
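A sketch of that precision loss, visible on platforms where np.longdouble is wider than float64 (e.g. 80-bit extended precision on x86 Linux):

import numpy as np

ld = np.longdouble(2) / 3
exact = ld.as_integer_ratio()           # exact ratio of the extended-precision value
rounded = float(ld).as_integer_ratio()  # rounds through float64 first
exact == rounded                        # False when longdouble carries extra bits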

I'm wincing at adding more incompatibilities between scalars and 0d arrays

I'm hoping that eventually we'll solve the dtype-specific-methods-on-arrays problem

@njsmith

If you want to find out the exact value of a float there are more useful ways.

I don't think this is true for np.longdouble. You could use Decimal(str(f)), but that loses precision.

Wait, but wasn't that the whole point of the Dragon4 work, that str(f) shouldn't lose precision? Does it not work for longdouble? This seems like a bug to fix regardless.

I'm hoping that eventually we'll solve the dtype-specific-methods-on-arrays problem

I don't see how any solution is really possible, since if we allow arbitrary methods to be added to arrays then it means we can never add any new methods to arrays ourselves without potentially breaking backcompat :-/.

@seberg

Sorry, I realize this is probably just spitting out noise, but since I had started writing, I did not want to delete it again ;).

My (probably naive) gut feeling on dtype-specific methods is to allow something like an elementwise property, which could give access to ufuncs (or at least unary ufuncs) and could mimic such methods. But then, I am also of the opinion that scalars are a good idea, and I seem to be in the minority there :) (not in the sense of code duplication, of course, but because of things like mutability, and because I actually believe that in almost all cases it would be "obvious" whether the result will be scalar or array).

The main "oddity" about compatibility is, in my opinion, not that 0d arrays do not behave like scalars, but that scalars have to pretend they are 0d arrays, because we convert 0d arrays to scalars (for good reason) almost everywhere. This incompatibility seems to be the other way around, though, so I do not really see much of an issue ;).

I would almost like something like:

class MyFloating(np.float64):
    # EDIT: Defining casting might be really annoying :(

    # Hypothetical helper: expose an existing ufunc as a method
    add_method_from_ufunc(ufunc_implementation, 'possibly_method_name')

    # Or for convenience, define it all in python (slow, but....):
    @as_ufunc(out_dtype=object)
    def as_integer_ratio(self):
        # Numpy could wrap this into a ufunc (body elided in this sketch)
        return integer_ratio

# EDIT: To explain:

val = MyFloating(0.1)
val.as_integer_ratio()  # Would work

val_arr = np.array([val])  # Create array of dtype MyFloating, possibly need dtype=MyFloating
val_arr.elementwise.as_integer_ratio()  # Could work

MyFloating.elementwise(val_arr)  # Something in this direction should likely also work

Now, will people be able to block our methods? Sure, but only if they subclass our types, and then we could possibly even implement a warning that a new dtype method is being/will be shadowed.
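For what it's worth, today's tools can already approximate the elementwise idea by lifting the scalar method over an array with np.frompyfunc; a rough sketch (returning an object array):

import numpy as np

as_ratio = np.frompyfunc(lambda f: f.as_integer_ratio(), 1, 1)
as_ratio(np.array([0.5, 2.5]))
# array([(1, 2), (5, 2)], dtype=object)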

@eric-wieser

Wait, but wasn't that the whole point of the Dragon4 work, that str(f) shouldn't lose precision?

No, I think the point was that ftype(str(f)) shouldn't lose precision. Dragon4 produces a unique representation for each float, but it doesn't produce an exact decimal representation.

For example:

>>> import numpy as np
>>> from fractions import Fraction
>>> f16 = np.float16(2.1)
>>> str(f16)
'2.1'
>>> str(np.longdouble(f16))
'2.099609375'  # this is the exact value, but we got 2.1 above because that was enough for uniqueness

>>> f_bad = Fraction(str(f16)); f_bad
Fraction(21, 10)
>>> f_bad == f16
False

>>> f_good = Fraction(*f16.as_integer_ratio()); f_good
Fraction(1075, 512)
>>> f_good == f16
True

@eric-wieser

One of the incentives behind this PR was to be able to write better tests for longdouble types - in particular, the CPython tests for math.remainder fall back on .as_integer_ratio(), so it would be nice to be able to duplicate them to test remainder_ieee #9963

@eric-wieser

Any more thoughts on this? I don't think the question of dtype-specific array methods is relevant here - we already have np.float64.as_integer_ratio, so adding np.floatXX.as_integer_ratio seems like a non-contentious extension.

The only argument I can see against this is maintenance cost - and I think that's paid for by a better ability to debug precision issues with longdouble.

@mhvk

I think we might as well have it.

@eric-wieser

@charris, any opinions?

CI failure is garbage here

@mattip

Should/Did this hit the mailing list? Seems like a simple case of adapting to CPython's float type. In any case it needs updating for 1.16


@eric-wieser

Python is now getting int.as_integer_ratio() too (python/cpython#8750), so it seems like the decision is that all numeric types should have this. @njsmith, is this convincing enough for you?


@eric-wieser

Guess I now need to update this for 1.17...

@mhvk

I'd be quite happy to merge this once rebased.

@eric-wieser

This matches the builtin float.as_integer_ratio and (in recent Python versions) int.as_integer_ratio.

@eric-wieser

Rebased, let's see if tests still pass

mhvk (Contributor) left a comment

Implementation looks all OK. For the tests, I'd prefer replacing the 1000 always-the-same random samples with a few well-chosen ones.

tylerjereddy left a comment

I started refactoring the test a little locally based on Marten's feedback. Maybe I'll push a separate commit for Eric to revise.

@tylerjereddy

I pushed in a draft commit that breaks the huge nested-loop test into smaller tests, for clearer failure reporting and less nesting in any single test. Not sure if Eric will like that, but we can always just gut the commit.

I didn't try to tackle the case generation yet though.

@eric-wieser

I think a lot of the test structure was designed to resemble the upstream tests this emulates. I'll have to check how much I modified them

@eric-wieser

This is the test I copied from:

def test_floatasratio(self):
    for f, ratio in [
            (0.875, (7, 8)),
            (-0.875, (-7, 8)),
            (0.0, (0, 1)),
            (11.5, (23, 2)),
        ]:
        self.assertEqual(f.as_integer_ratio(), ratio)

    for i in range(10000):
        f = random.random()
        f *= 10 ** random.randint(-100, 100)
        n, d = f.as_integer_ratio()
        self.assertEqual(float(n).__truediv__(d), f)

    R = fractions.Fraction
    self.assertEqual(R(0, 1),
                     R(*float(0.0).as_integer_ratio()))
    self.assertEqual(R(5, 2),
                     R(*float(2.5).as_integer_ratio()))
    self.assertEqual(R(1, 2),
                     R(*float(0.5).as_integer_ratio()))
    self.assertEqual(R(4728779608739021, 2251799813685248),
                     R(*float(2.1).as_integer_ratio()))
    self.assertEqual(R(-4728779608739021, 2251799813685248),
                     R(*float(-2.1).as_integer_ratio()))
    self.assertEqual(R(-2100, 1),
                     R(*float(-2100.0).as_integer_ratio()))

    self.assertRaises(OverflowError, float('inf').as_integer_ratio)
    self.assertRaises(OverflowError, float('-inf').as_integer_ratio)
    self.assertRaises(ValueError, float('nan').as_integer_ratio)

@mhvk

Hmm, yes, I think I complained about the random number test before :-) which is probably why you have only 1000, not 10000. I still do not see any point in it.

@eric-wieser

I pushed in a draft commit that breaks the huge / nested loop test into smaller tests

If we're breaking up the test, it would make sense to me to create a class TestAsIntegerRatio to group them together.
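For illustration, a minimal sketch of what that grouping might look like, assuming the pytest parametrization style used elsewhere in numpy's test suite (names hypothetical, not the commit as pushed):

import numpy as np
import pytest

class TestAsIntegerRatio:
    @pytest.mark.parametrize("ftype", [np.half, np.single, np.double, np.longdouble])
    @pytest.mark.parametrize("f, ratio", [
        (0.875, (7, 8)),
        (-0.875, (-7, 8)),
        (0.0, (0, 1)),
        (11.5, (23, 2)),
    ])
    def test_simple_fractions(self, ftype, f, ratio):
        assert ftype(f).as_integer_ratio() == ratio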

@tylerjereddy

it would make sense to me to create a class TestAsIntegerRatio to group them together.

I've done this too now. I suspect Eric or Marten will still have some thoughts on those hard-coded test examples I've added in now though. At first I thought the integer cases didn't seem well-sampled, but I think it is just deceptive because of minexp / maxexp limits, and the magnitudes on the ldexp values produced seem to be on par with the bigger loop originally in place.

@tylerjereddy

Ah, Windows failures on the hardcoded test values. np.longdouble looks like the suspect, I think.


@tylerjereddy

Well, at least we're seeing something useful from the ppc64le CI here

@mattip

the case that fails: f = np.ldexp(0.4277180662199366, 14266, dtype='float128')


@eric-wieser

Thanks for the test cleanup @tylerjereddy - I should do some more reading on the hypothesis module you're using to help with these.

I amended your last commit to copy the skipif style I found elsewhere in umath tests, and fixed some variable names that I screwed up in the original test.

Feel free to tweak them some more, but I think either way this looks good to me.
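For readers unfamiliar with hypothesis, the property-based shape of such a test is roughly this (a sketch, not the exact test that was merged):

import numpy as np
from hypothesis import given
from hypothesis import strategies as st

@given(st.floats(allow_nan=False, allow_infinity=False))
def test_roundtrip(f):
    n, d = np.float64(f).as_integer_ratio()
    assert n / d == f  # int/int true division is correctly rounded, so this is exact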

@tylerjereddy

Ok, the code changes were already approved by a core dev modulo testing refinements, which are now complete. So, in it goes, thanks Eric
