ENH: Implement np.floating.as_integer_ratio
by eric-wieser · Pull Request #10741 · numpy/numpy
Conversation
The original incentive here was to port tests from CPython for #9963
Fixes #9969
Contributor
mhvk left a comment
Beyond the comment about the test, I think this needs an entry in the release notes - I presume this is mostly there for compatibility with cpython float?
Updated with docstring and release notes
I assume this is inspired by the debate raging in python core:
- https://bugs.python.org/issue26680
- https://mail.python.org/pipermail/python-dev/2018-March/152358.html
- https://bugs.python.org/issue33073
The argument there is actually about adding as_integer_ratio to int, which might suggest doing the same for the np.integer types. Of course our ints are less duck-compatible with floats than Python's ints, because if you try to treat, say, a np.int8 as if it were a float then you're going to have a bad time with wraparound.
The whole topic seems very unimportant to me, because who even cares about as_integer_ratio on floats or ints? It seems like a parlor trick. If you want to find out the exact value of a float there are more useful ways. But OTOH I guess this isn't necessarily an argument against it either.
I'm wincing at adding more incompatibilities between scalars and 0d arrays. OTOH I don't know that eliminating those is actually realistic, and we already have this on float64, and this is unlikely to add major new obstacles since it seems unlikely anyone will use it much.
It's also unusual and surprising that this returns Python ints rather than numpy ints, but OTOH I guess this is necessary because the returned values might not fit into an int64.
So kinda mixed feelings about this overall. I guess my overall impression is an emphatic meh?
Am I missing important use cases?
Oh, on further examination it looks like that started out as a debate about the float.is_integer method, which then dragged in as_integer_ratio as it went. Sorry, this is confusing...
is_integer is much more of an attractive nuisance than as_integer_ratio, since people might actually think they want it, but they usually don't (because of the usual floating point rounding issues).
The argument there is actually about adding as_integer_ratio to int, which might suggest doing the same for the np.integer types
I saw something like that elsewhere, but decided to wait to see what CPython does first. As they point out, most of the value is in consistency - and what we do right now is consistent with CPython!
If you want to find out the exact value of a float there are more useful ways.
I don't think this is true for np.longdouble. You could use Decimal(str(f)), but that loses precision.
I guess this is necessary because the returned values might not fit into an int64.
Yep, otherwise we may as well implement longdouble.as_integer_ratio as float(ld).as_integer_ratio(), which drops precision, defeating the point of the function.
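A minimal sketch of the difference, assuming the as_integer_ratio method added by this PR is available on longdouble:

from fractions import Fraction
import numpy as np

ld = np.longdouble(1) / 3   # has more significand bits than a float64 on most platforms

exact = Fraction(*ld.as_integer_ratio())            # exact rational value of the longdouble
rounded = Fraction(*float(ld).as_integer_ratio())   # detour through float64 discards the extra bits

print(exact - rounded)   # non-zero wherever longdouble is wider than float64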
I'm wincing at adding more incompatibilities between scalars and 0d arrays
I'm hoping that eventually we'll solve the dtype-specific-methods-on-arrays problem
If you want to find out the exact value of a float there are more useful ways.
I don't think this is true for np.longdouble. You could use Decimal(str(f)), but that loses precision.
Wait, but wasn't that the whole point of the Dragon4 work, that str(f) shouldn't lose precision? Does it not work for longdouble? This seems like a bug to fix regardless.
I'm hoping that eventually we'll solve the dtype-specific-methods-on-arrays problem
I don't see how any solution is really possible, since if we allow arbitrary methods to be added to arrays then it means we can never add any new methods to arrays ourselves without potentially breaking backcompat :-/.
Sorry, I realized this is probably just spitting out noise, but since I had started writing, I did not want to delete it again ;).
My (probably naive) gut feeling on dtype specific methods is to allow something like an elementwise
property which could give access to ufuncs (or at least unary ufuncs) and could mimic such methods. But then I also am of the opinion that scalars are a good idea, and I seem to be in the minority there :) (not in the sense of code duplication of course, but because of things like mutability and because I actually believe that in almost all cases it would be "obvious" if the result will be scalar or array).
The main "oddity" about compatibility is in my opinion not the 0D arrays do not behave like scalars, but the part that scalars have to pretend they are 0D arrays because we convert 0D arrays to scalars for good reason almost everywhere. It seems to me that this incompatibility is the other way around though, so I do not really see much of an issue ;).
I would almost like something like:
class MyFloat(np.float64):
    # EDIT: Defining casting might be really annoying :(
    add_method_from_ufunc(ufunc_implementation, 'possibly_method_name')

    # Or for convenience, define it all in python (slow, but....):
    @as_ufunc(out_dtype=object)
    def as_integer_ratio(self):
        # Numpy can wrap it into a ufunc:
        return integer_ratio

# EDIT: To explain:
val = MyFloat(0.1)
val.as_integer_ratio()  # Would work
val_arr = np.array([val])  # Create array of dtype MyFloat, possibly need dtype=MyFloat
val_arr.elementwise.as_integer_ratio()  # Could work
MyFloat.elementwise(val_arr)  # Something in this direction should likely also work
Now, will people be able to block our methods? Sure, but only if they subclass our types, and then we could possibly even implement a warning that a new dtype method is being/will be shadowed.
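A rough approximation of the elementwise idea is already possible with today's API by wrapping the scalar method into an object-returning ufunc (a sketch, not the interface proposed above):

import numpy as np

# Each element comes through as a Python-level float scalar that already has
# .as_integer_ratio(); the result is an object array of (numerator, denominator) tuples.
as_integer_ratio_uf = np.frompyfunc(lambda x: x.as_integer_ratio(), 1, 1)

arr = np.array([0.5, 0.1, 2.5])
print(as_integer_ratio_uf(arr))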
Wait, but wasn't that the whole point of the Dragon4 work, that str(f) shouldn't lose precision?
No, I think the point was that ftype(str(f)) shouldn't lose precision. Dragon4 produces a unique representation for each float, but it doesn't produce an exact decimal representation.
For example:
>>> from fractions import Fraction
>>> f16 = np.float16(2.1)
>>> str(f16)
'2.1'
>>> str(np.longdouble(f16))
'2.099609375'  # this is the exact value, but we got 2.1 above because that was enough for uniqueness
>>> f_bad = Fraction(str(f16)); f_bad
Fraction(21, 10)
>>> f_bad == f16
False
>>> f_good = Fraction(*f16.as_integer_ratio()); f_good
Fraction(1075, 512)
>>> f_good == f16
True
One of the incentives behind this PR was to be able to write better tests for longdouble types - in particular, the cpython tests for math.remainder fall back on .as_integer_ratio(), so it would be nice to be able to duplicate them to test remainder_ieee (#9963).
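For a sense of why as_integer_ratio helps there: the expected remainder can be computed exactly with Fraction, along these lines (a sketch; remainder_reference is an illustrative name, not code from this PR or CPython):

from fractions import Fraction

def remainder_reference(x, y):
    # Exact IEEE-style remainder via rationals: round() on a Fraction rounds
    # halfway cases to even, matching the IEEE definition of remainder.
    fx = Fraction(*x.as_integer_ratio())
    fy = Fraction(*y.as_integer_ratio())
    return fx - round(fx / fy) * fy

Comparing the computed result's own as_integer_ratio against this reference gives an exact check even for longdouble.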
Any more thoughts on this? I don't think that thinking about dtype-specific array methods is relevant here - we already have np.float64.as_integer_ratio; adding np.floatXX.as_integer_ratio seems like a non-contentious extension.
The only argument I can see against this is maintenance cost - and I think that's paid for by a better ability to debug precision issues with longdouble.
I think we might as well have it.
@charris, any opinions?
CI failure is garbage here
Should/Did this hit the mailing list? Seems like a simple case of adapting to CPython's float type. In any case, it needs updating for 1.16.
Python is now getting int.as_integer_ratio() too (python/cpython#8750), so it seems like the decision is that all numeric types should have this. @njsmith, is this convincing enough for you?
Guess I now need to update this for 1.17...
I'd be quite happy to merge this once rebased.
This matches the builtin float.as_integer_ratio and (in recent python versions) int.as_integer_ratio.
Rebased, let's see if tests still pass
Contributor
mhvk left a comment
Implementation looks all OK. For the tests, I'd prefer replacing the 1000 always-the-same random samples with a few well-chosen ones.
I started refactoring the test a little locally based on Marten's feedback. Maybe I'll push a separate commit for Eric to revise.
I pushed in a draft commit that breaks the huge / nested loop test into smaller tests for clarity with respect to reporting failures and having less nesting in a single test. Not sure if Eric will like that but can always just gut the commit.
I didn't try to tackle the case generation yet though.
I think a lot of the test structure was designed to resemble the upstream tests this emulates. I'll have to check how much I modified them.
This is the test I copied from:
def test_floatasratio(self):
    for f, ratio in [
            (0.875, (7, 8)),
            (-0.875, (-7, 8)),
            (0.0, (0, 1)),
            (11.5, (23, 2)),
            ]:
        self.assertEqual(f.as_integer_ratio(), ratio)

    for i in range(10000):
        f = random.random()
        f *= 10 ** random.randint(-100, 100)
        n, d = f.as_integer_ratio()
        self.assertEqual(float(n).__truediv__(d), f)

    R = fractions.Fraction
    self.assertEqual(R(0, 1),
                     R(*float(0.0).as_integer_ratio()))
    self.assertEqual(R(5, 2),
                     R(*float(2.5).as_integer_ratio()))
    self.assertEqual(R(1, 2),
                     R(*float(0.5).as_integer_ratio()))
    self.assertEqual(R(4728779608739021, 2251799813685248),
                     R(*float(2.1).as_integer_ratio()))
    self.assertEqual(R(-4728779608739021, 2251799813685248),
                     R(*float(-2.1).as_integer_ratio()))
    self.assertEqual(R(-2100, 1),
                     R(*float(-2100.0).as_integer_ratio()))

    self.assertRaises(OverflowError, float('inf').as_integer_ratio)
    self.assertRaises(OverflowError, float('-inf').as_integer_ratio)
    self.assertRaises(ValueError, float('nan').as_integer_ratio)
Hmm, yes, I think I complained about the random number test before ;-) which is probably why you have only 1000, not 10000. I still do not see any point in it.
I pushed in a draft commit that breaks the huge / nested loop test into smaller tests
If we're breaking up the test, it would make sense to me to create a class TestAsIntegerRatio to group them together.
it would make sense to me to create a class TestAsIntegerRatio to group them together.
I've done this too now. I suspect Eric or Marten will still have some thoughts on those hard-coded test examples I've added in now though. At first I thought the integer cases didn't seem well-sampled, but I think it is just deceptive because of minexp / maxexp limits, and the magnitudes on the ldexp values produced seem to be on par with the bigger loop originally in place.
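To give a sense of what such hard-coded ldexp cases can look like (illustrative values, not the ones actually merged; assumes the as_integer_ratio method added by this PR):

from fractions import Fraction
import numpy as np

cases = [
    (np.float32, 0.6015625, -30),
    (np.float64, 0.4277180662199366, 100),
    (np.longdouble, 0.875, 65),
]

for ftype, mantissa, exponent in cases:
    f = np.ldexp(ftype(mantissa), exponent)
    n, d = f.as_integer_ratio()
    # Verify the ratio with exact rational arithmetic rather than re-dividing
    # in floating point; mantissa is chosen to be exactly representable in ftype.
    assert Fraction(n, d) == Fraction(mantissa) * Fraction(2) ** exponent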
Ah, Windows failures on the hardcoded test values. np.longdouble looks like the suspect, I think.
Well, at least we're seeing something useful from the ppc64le CI here
the case that fails: f = np.ldexp(0.4277180662199366, 14266, dtype='float128')
Thanks for the test cleanup @tylerjereddy - I should do some more reading on the hypothesis module you're using to help with these.
I amended your last commit to copy the skipif style I found elsewhere in umath tests, and fixed some variable names that I screwed up in the original test.
Feel free to tweak them some more, but I think either way this looks good to me.
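Since hypothesis came up above, a property-based round-trip check might look roughly like this (a sketch, not the test in this PR):

from fractions import Fraction

import numpy as np
from hypothesis import given, strategies as st

@given(st.floats(allow_nan=False, allow_infinity=False))
def test_as_integer_ratio_roundtrip(x):
    n, d = np.float64(x).as_integer_ratio()
    # The returned ratio must reproduce the exact stored value.
    assert Fraction(n, d) == Fraction(x)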
Ok, the code changes were already approved by a core dev modulo testing refinements, which are now complete. So, in it goes. Thanks, Eric.