Issue 25928: Add Decimal.as_integer_ratio() - Python tracker (original) (raw)

Issue25928

process

Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: skrah Nosy List: belopolsky, gvanrossum, johnwalker, mark.dickinson, python-dev, rhettinger, serhiy.storchaka, skrah, steven.daprano, terry.reedy, tim.peters
Priority: normal Keywords: patch

Created on 2015-12-22 22:18 by johnwalker, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue25928.diff skrah,2015-12-28 14:12 review
Messages (27)
msg256873 - (view) Author: John Walker (johnwalker) * Date: 2015-12-22 22:18
In statistics, there is a FIXME on Line 250 above _decimal_to_ratio that says: # FIXME This is faster than Fraction.from_decimal, but still too slow. Half of the time is spent in a conversion in d.as_tuple(). Decimal internally stores the digits as a string, but in d.as_tuple(), the digits are individually cast to integers and returned as a tuple of integers. This is OK, but _decimal_to_ratio undoes the work that was done in d.as_tuple() by adding them all back into an integer. A similar, but slightly different approach is taken in Fractions.from_decimal, where the tuple is cast into a string and then parsed into an integer. We can be a lot faster if we use the _int instance variable directly. In the case of _decimal_to_ratio, the new code seems to be twice as fast with usage _decimal_to_ratio(Decimal(str(random.random()))): def _decimal_to_ratio(d): sign, exp = d._sign, d._exp if exp in ('F', 'n', 'N'): # INF, NAN, sNAN assert not d.is_finite() return (d, None) num = int(d._int) if exp < 0: den = 10**-exp else: num *= 10**exp den = 1 if sign: num = -num return (num, den) If the performance improvement is considered worthwhile, here are a few solutions I see. 1) Use _int directly in fractions and statistics. 2) Add a digits_as_str method to Decimal. This prevents exposing _int as an implementation detail, and makes sense to me since I suspect there is a lot of code casting the tuple of int to a string anyway. 3) Add a parameter to as_tuple for determining whether digits should be returned as a string or a tuple. 4) Deprecate _int in Decimal and add a new reference str_digits. There are probably more solutions. I lean towards 4, because it makes usage easier and avoids cluttering Decimal with methods. Here is what I used for benchmarks: ======== import timeit old_setup = """ import random from decimal import Decimal def _decimal_to_ratio(d): sign, digits, exp = d.as_tuple() if exp in ('F', 'n', 'N'): # INF, NAN, sNAN assert not d.is_finite() return (d, None) num = 0 for digit in digits: num = num*10 + digit if exp < 0: den = 10**-exp else: num *= 10**exp den = 1 if sign: num = -num return (num, den) def run_it(): dec = Decimal(str(random.random())) _decimal_to_ratio(dec) """ new_setup = """ import random from decimal import Decimal def _decimal_to_ratio(d): sign, exp = d._sign, d._exp if exp in ('F', 'n', 'N'): # INF, NAN, sNAN assert not d.is_finite() return (d, None) num = int(d._int) if exp < 0: den = 10**-exp else: num *= 10**exp den = 1 if sign: num = -num return (num, den) def run_it(): dec = Decimal(str(random.random())) _decimal_to_ratio(dec) """ if __name__ == '__main__': print("Testing proposed implementation") print("number = 10000") print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=10000)) print("number = 100000") print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=100000)) print("number = 1000000") print(timeit.Timer(stmt='run_it()', setup=new_setup).timeit(number=1000000)) print("Testing old implementation") print("number = 10000") print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=10000)) print("number = 100000") print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=100000)) print("number = 1000000") print(timeit.Timer(stmt='run_it()', setup=old_setup).timeit(number=1000000))
msg256874 - (view) Author: John Walker (johnwalker) * Date: 2015-12-22 22:21
Heres the output of running the benchmark on my machine: Testing proposed implementation number = 10000 0.07098613299967838 number = 100000 0.6952260910002224 number = 1000000 6.948197601999709 Testing current implementation number = 10000 0.1418162760000996 number = 100000 1.350394603001405 number = 1000000 13.625065807000283
msg256914 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-23 15:07
I guess there's some version mixup here: From Python 3.3 on the integrated C version of decimal does not store the digits as a string and does not have the private _int method.
msg256917 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-23 15:18
May be implement the as_integer_ratio() method and/or numerator and denominator attributes in the Decimal class?
msg256932 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2015-12-23 20:57
On Wed, Dec 23, 2015 at 03🔞28PM +0000, Serhiy Storchaka wrote: > May be implement the as_integer_ratio() method and/or numerator and > denominator attributes in the Decimal class? That would also be good as it will decrease the API differences between floats and Decimals and make it easier to duck-type one for the other.
msg256933 - (view) Author: John Walker (johnwalker) * Date: 2015-12-23 21:01
> I guess there's some version mixup here: From Python 3.3 on > the integrated C version of decimal does not store the digits > as a string and does not have the private _int method. Stefan, _int is a slot in Lib/_pydecimal.py. It should be defined on python 3.5 and tip, unsure about other versions. Python 3.5.1 (default, Dec 7 2015, 12:58:09) [GCC 5.2.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from decimal import Decimal >>> Decimal("100.00") Decimal('100.00') >>> Decimal("100.00")._int '10000'
msg256934 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-23 21:30
On Wed, Dec 23, 2015 at 09:01:22PM +0000, John Walker wrote: > Stefan, _int is a slot in Lib/_pydecimal.py. It should be defined on python 3.5 and tip, unsure about other versions. > > Python 3.5.1 (default, Dec 7 2015, 12:58:09) > [GCC 5.2.0] on linux > Type "help", "copyright", "credits" or "license" for more information. > >>> from decimal import Decimal > >>> Decimal("100.00") > Decimal('100.00') > >>> Decimal("100.00")._int > '10000' That should only happen if the C version did not build for some reason: Python 3.6.0a0 (default:323c10701e5d, Dec 14 2015, 14:28:41) [GCC 4.8.4] on linux Type "help", "copyright", "credits" or "license" for more information. >>> from decimal import Decimal >>> Decimal("100.00")._int Traceback (most recent call last): File "", line 1, in AttributeError: 'decimal.Decimal' object has no attribute '_int' >>>
msg256936 - (view) Author: John Walker (johnwalker) * Date: 2015-12-23 21:46
> That should only happen if the C version did not build for some reason: Ahh, gotcha. I assume one instance where this happens is when the machine doesn't have libmpdec.so
msg256937 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-23 21:50
No, the regular build uses the libmpdec that is shipped with Python. The external libmpdec.so only comes into play if you compile --with-system-libmpdec.
msg256938 - (view) Author: John Walker (johnwalker) * Date: 2015-12-23 22:05
> No, the regular build uses the libmpdec that is shipped with > Python. The external libmpdec.so only comes into play if you > compile --with-system-libmpdec. Oh, OK. I see whats happening. My distro deletes the shipped version and compiles --with-system-libmpdec. We're on the same page now, thanks. https://projects.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/python
msg256940 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-23 22:25
Let's re-target this issue: Implementing as_integer_ratio() sounds reasonable, but let's hope people won't try to convert Decimal('1E+999999999999999999'). Mark, was there perhaps a reason not to add the method?
msg257003 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-12-25 17:58
A visible new feature is an enhancement (even if performance is the reason for the new feature) and can only go in 3.6.
msg257073 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-27 12:13
Previously the method was rejected in #8947. But the speedup is quite dramatic. This is a benchmark for converting "1.91918261298362195e-100", 1000000 repetitions: float.as_integer_ratio: 0.548023 Decimal.as_integer_ratio: normalize=False: 2.661191 Decimal.as_integer_ratio: normalize=True: 4.382903 Fraction.from_decimal: 29.436584
msg257088 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2015-12-27 19:12
> Mark, was there perhaps a reason not to add the method? In the past, there's been a desire to keep the decimal module minimal in scope, implementing little more than what's required for the specification. A number of proposed additions have been rejected on this basis. Ah, and now I read Stefan's reference to #8947. I'm not sure anything has really changed since that discussion. Count me as -0E0 on the proposal. Off-topic: I wonder whether `int.as_integer_ratio` should be implemented (for the same reasons that `int.numerator`, `int.real` and so on are).
msg257090 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-27 19:53
Keeping the API small is a good reason against it. There aren't many ways to destructure a Decimal though: As far as I know, the return value of as_tuple() dates back from the time when the coefficient was actually a tuple. The function is slow and not really nice to use. Probably many people would be happy if we added as_integer_ratio() and an as_triple() function that represents the coefficient as an integer. > Off-topic: I wonder whether `int.as_integer_ratio` should be implemented (for the same reasons that `int.numerator`, `int.real` and so on are). If users want to duck-type, I think yes.
msg257094 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-27 20:10
Arguments against as_integer_ratio() look weighty. But may be there are less arguments against numerator and denominator?
msg257097 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-12-27 20:30
Now that there is more than one use case for Decimal.as_integer_ratio(), I'll add my support to this feature request.
msg257115 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-28 14:12
Here's a patch. The Python and doc parts are from Mark (#8947). I did not optimize the normalization yet, in C the code is less clean and there were some corner cases where the gcd is actually faster.
msg257137 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-12-28 22:14
New changeset f3b09c269af0 by Stefan Krah in branch 'default': Issue #25928: Add Decimal.as_integer_ratio(). Python parts and docs by https://hg.python.org/cpython/rev/f3b09c269af0
msg257141 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-28 22:37
Hopefully I wasn't moving too fast, but I won't have time in the next days/weeks. Serhiy and Alexander (#8947) would prefer more homogeneous error messages and docstrings between float/pydecimal/cdecimal. I wouldn't mind a patch in another issue (no argument clinic!), provided that we agree on something and have input from the native English speakers. Thanks for the review!
msg257147 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-28 22:52
> Hopefully I wasn't moving too fast, but I won't have time in the next days/weeks. No, the patch is pretty clear. Thanks. > Serhiy and Alexander (#8947) would prefer more homogeneous error messages and docstrings between float/pydecimal/cdecimal. This would help to optimize creating a Fraction. Opened for this.
msg257149 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-12-28 23:12
New changeset 510ff609cb4f by Stefan Krah in branch 'default': Issue #25928: Temporarily disable some tests in test_statistics in order https://hg.python.org/cpython/rev/510ff609cb4f
msg257150 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-28 23:15
Steven, could you have a look at the failures in test_statistics? Some tests fail because they assume non-normalized fractions, I still have to look at the other assumptions.
msg257151 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-28 23:47
Ah yes, the test_statistics failures look like #18841 again.
msg257179 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2015-12-29 12:51
I've opened #25974 for the statistics.py issues.
msg262887 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-04-05 02:39
I don't think a new public method should have been added. Historically, we've been careful to not grow the API beyond what is in the spec or the dunder methods required to interface with standard Python. The feature creep is at odds with the intended goals for the module that have been present since the outset. As long as the spec remains unchanged, the API for this module should be treated as stable. Another issue is that the API for the module is already so large that it impairs usability. Please don't make it worse by adding new methods and inventing details that aren't in the spec.
msg262896 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2016-04-05 10:10
Raymond, you added your support in . I'm not very happy to spend my time implementing the feature and then rehashing everything after 3 months.
History
Date User Action Args
2022-04-11 14:58:25 admin set github: 70116
2018-03-13 19:04:17 belopolsky link issue8947 superseder
2016-04-05 10:10:41 skrah set nosy: + gvanrossummessages: +
2016-04-05 02:39:13 rhettinger set nosy: + tim.petersmessages: +
2015-12-29 12:51:14 skrah set status: open -> closedresolution: fixedmessages: + stage: patch review -> resolved
2015-12-28 23:47:09 skrah set messages: +
2015-12-28 23:15:21 skrah set messages: +
2015-12-28 23:12:04 python-dev set messages: +
2015-12-28 22:52:50 serhiy.storchaka set messages: +
2015-12-28 22:37:29 skrah set messages: +
2015-12-28 22:14:15 python-dev set nosy: + python-devmessages: +
2015-12-28 16:47:16 serhiy.storchaka set stage: test needed -> patch review
2015-12-28 14:12:26 skrah set files: + issue25928.diffkeywords: + patchmessages: +
2015-12-27 20:30:25 rhettinger set messages: +
2015-12-27 20:10:57 serhiy.storchaka set messages: +
2015-12-27 19:53:37 skrah set messages: +
2015-12-27 19:12:46 mark.dickinson set messages: +
2015-12-27 12:13:24 skrah set nosy: + rhettinger, belopolskymessages: +
2015-12-25 17:58:40 terry.reedy set versions: + Python 3.6, - Python 3.5nosy: + terry.reedymessages: + type: performance -> enhancementstage: test needed
2015-12-23 22:25:16 skrah set nosy: + mark.dickinsontitle: Improve performance of statistics._decimal_to_ratio and fractions.from_decimal -> Add Decimal.as_integer_ratio()messages: + assignee: skrah
2015-12-23 22:05:14 johnwalker set messages: +
2015-12-23 21:50:32 skrah set messages: +
2015-12-23 21:46:39 johnwalker set messages: +
2015-12-23 21:30:05 skrah set messages: +
2015-12-23 21:01:22 johnwalker set messages: + versions: + Python 3.5
2015-12-23 20:57:13 steven.daprano set messages: +
2015-12-23 15🔞28 serhiy.storchaka set nosy: + serhiy.storchakamessages: +
2015-12-23 15:08:50 skrah set versions: - Python 3.5
2015-12-23 15:07:35 skrah set nosy: + steven.daprano
2015-12-23 15:07:15 skrah set nosy: + skrahmessages: +
2015-12-22 22:49:34 johnwalker set type: performance
2015-12-22 22:21:02 johnwalker set messages: +
2015-12-22 22🔞25 johnwalker create