Issue 44154: Optimize Fraction pickling

Created on 2021-05-17 03:58 by Sergey.Kirpichev, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (8)

msg393781

Author: Sergey B Kirpichev (Sergey.Kirpichev) *

Date: 2021-05-17 03:58

The current version of the Fraction.__reduce__() method uses str(), which produces bigger dumps, especially for fractions with large components.

Compare:

>>> import random, pickle
>>> from fractions import Fraction as F
>>> random.seed(1); a = F(*random.random().as_integer_ratio())
>>> for proto in range(pickle.HIGHEST_PROTOCOL + 1):
...     print(len(pickle.dumps(a, proto)))
...
71
70
71
71
77
77
>>> b = a**13
>>> for proto in range(pickle.HIGHEST_PROTOCOL + 1):
...     print(len(pickle.dumps(b, proto)))
...
444
443
444
444
453
453

vs the attached patch:

>>> for proto in range(pickle.HIGHEST_PROTOCOL + 1):
...     print(len(pickle.dumps(a, proto)))
...
71
68
49
49
59
59
>>> for proto in range(pickle.HIGHEST_PROTOCOL + 1):
...     print(len(pickle.dumps(b, proto)))
...
444
441
204
204
214
214

Testing for non-default protocols was also added. Let me know if all this makes sense as a PR.
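The change under discussion amounts to a __reduce__ that pickles the two integer components directly instead of str(self). The IntFraction subclass below is an illustrative stand-in for the patched Fraction, not the actual patch:

```python
import pickle
from fractions import Fraction

class IntFraction(Fraction):
    """Illustrative stand-in for the patched Fraction: pickle the two
    integer components rather than str(self)."""
    def __reduce__(self):
        # Recent pickle protocols store ints in binary, so this is both
        # smaller and faster than round-tripping through a decimal string.
        return (self.__class__, (self.numerator, self.denominator))

a = IntFraction(2**1000, 3**600)  # large, coprime components
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    assert pickle.loads(pickle.dumps(a, proto)) == a
```

Since __reduce__ returns the class and a plain argument tuple, unpickling just calls the constructor with the two ints, so the subclass type survives the round trip as well.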

msg393782

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2021-05-17 04:20

Yes, this looks reasonable. Go ahead with a PR.

msg393783

Author: Tim Peters (tim.peters) * (Python committer)

Date: 2021-05-17 04:38

Oh yes - please do. It's not just pickle size - going through str() makes (un)pickling quadratic time in both directions if components are large. Pickle the component ints instead, and the more recent pickle protocol(s) can do both directions in linear time instead.
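Tim's point can be made concrete with payload sizes (timings are machine-dependent, so this sketch compares sizes only): protocol 2+ encodes an int in raw binary, while the str() route ships decimal text that also had to be produced by a quadratic binary-to-decimal conversion. The cap-lifting call is only needed on newer CPythons, which limit int/str conversion length by default:

```python
import pickle
import sys

# Newer CPythons cap int<->str conversion length by default; lift it here.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(1_000_000)

n = 7 ** 10_000                    # roughly 8,500 decimal digits
binary = pickle.dumps(n, 2)        # protocol 2+: binary encoding of the int
via_str = pickle.dumps(str(n), 2)  # the old route ships decimal text

assert pickle.loads(binary) == n
assert len(binary) < len(via_str)  # the binary payload is substantially smaller
```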

msg393784

Author: Sergey B Kirpichev (Sergey.Kirpichev) *

Date: 2021-05-17 04:56

> Oh yes - please do.

Ok, I did.

> It's not just pickle size - going through str() makes (un)pickling quadratic time in both directions if components are large.

Yeah, I noticed the speedup too, but size was much more important for my application.

BTW, the same issue affects some other stdlib modules; e.g., for Decimal() it would be more efficient to use the tuple (sign, digit_tuple, exponent) instead of dumping strings. Maybe more: a simple fgrep also points me to the ipaddress module, but I think it's OK there ;-)
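For context, Decimal already exposes and accepts exactly that tuple form; this is standard decimal API, shown for reference:

```python
from decimal import Decimal

d = Decimal("-1.414")
t = d.as_tuple()  # DecimalTuple(sign=1, digits=(1, 4, 1, 4), exponent=-3)
assert t.sign == 1 and t.digits == (1, 4, 1, 4) and t.exponent == -3
assert Decimal(t) == d  # the constructor accepts the tuple, so it round-trips
```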

msg393803

Author: Sergey B Kirpichev (Sergey.Kirpichev) *

Date: 2021-05-17 09:53

Not sure why this wasn't closed after the PR was merged. If that was intentional, let me know and reopen.

I'm less sure something like this will work for Decimal(). Perhaps it could, if the constructor would accept an integer as value[1], not just a tuple of digits.
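Since the constructor does not take an int coefficient, a __reduce__ along these lines would have to pack and unpack it manually. PackedDecimal and _rebuild below are hypothetical names, a sketch of the idea rather than anything in the stdlib, and it ignores special values like NaN and infinity, whose exponent field is not an int:

```python
import pickle
from decimal import Decimal

def _rebuild(sign, coeff, exp):
    # Hypothetical helper: expand the int coefficient back into digits.
    return Decimal((sign, tuple(map(int, str(coeff))), exp))

class PackedDecimal(Decimal):
    """Hypothetical sketch: pickle the coefficient as a single int, which
    recent protocols store compactly in binary, instead of one digit per
    tuple element. Ignores special values (NaN/Infinity)."""
    def __reduce__(self):
        sign, digits, exp = self.as_tuple()
        coeff = int("".join(map(str, digits)))
        return (_rebuild, (sign, coeff, exp))

d = PackedDecimal("1.414")
assert pickle.loads(pickle.dumps(d)) == Decimal("1.414")
```

Note that _rebuild must live at module level so pickle can find it by name, and that unpickling here deliberately returns a plain Decimal.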

msg393988

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2021-05-20 00:03

You're right that this won't work for decimal because its constructor takes a string. A fancier __reduce__ might do the trick, but it would involve modifying the C code (no fun) as well as the Python code. Also, the conversion from decimal to string and back isn't quadratic, so we don't have the same worries. Lastly, really large fractions happen naturally as they interoperate, but oversized decimals are uncommon.

msg394177

Author: Sergey B Kirpichev (Sergey.Kirpichev) *

Date: 2021-05-22 04:43

On Thu, May 20, 2021 at 12:03:38AM +0000, Raymond Hettinger wrote:

> You're right that this won't work for decimal because its constructor takes a string. A fancier __reduce__ might do the trick but it would involve modifying the C code (no fun) as well as the Python code.

Yes, it would be harder, but I think it's possible.

E.g. with this trivial patch:

$ git diff
diff --git a/Lib/_pydecimal.py b/Lib/_pydecimal.py
index ff23322ed5..473fb86770 100644
--- a/Lib/_pydecimal.py
+++ b/Lib/_pydecimal.py
@@ -627,6 +627,9 @@ def __new__(cls, value="0", context=None):
                 self._exp = value[2]
                 self._is_special = True
             else:
@@ -3731,7 +3734,7 @@ def shift(self, other, context=None):

     # Support for pickling, copy, and deepcopy
     def __reduce__(self):

A simple test suggests that a 2x size difference is possible:

>>> import pickle
>>> from test.support.import_helper import import_fresh_module
>>> P = import_fresh_module('decimal', blocked=['_decimal'])
>>> P.getcontext().prec = 1000
>>> d = P.Decimal('101').exp()
>>> len(pickle.dumps(d))
1045

vs

>>> len(pickle.dumps(d))
468

with the above diff. (Some size reduction is possible even if we don't convert self._int back and forth, due to the size of self._exp. That's a less interesting case, but it comes for free: no speed penalty.)

> Also, the conversion from decimal to string and back isn't quadratic, so we don't have the same worries.

Yes, for a speed bonus we'd need to do something more clever.

> Lastly, really large fractions happen naturally as they interoperate, but oversized decimals are uncommon.

For financial calculations this is probably true. But a perfectly legal usage of this module is to compute mathematical functions with arbitrary precision (like mpmath does with mpmath.mpf).

Let me know if it's worth opening an issue for the above improvement.

msg394231

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2021-05-24 01:36

> Let me know if it's worth opening an issue for the above improvement

I don't think so.

History

Date                 User              Action  Args
-------------------  ----------------  ------  ----
2022-04-11 14:59:45  admin             set     github: 88320
2021-05-24 01:36:28  rhettinger        set     messages: +
2021-05-22 04:43:30  Sergey.Kirpichev  set     messages: +
2021-05-20 00:03:38  rhettinger        set     messages: +
2021-05-17 09:53:44  Sergey.Kirpichev  set     status: open -> closed; resolution: fixed; messages: +; stage: patch review -> resolved
2021-05-17 04:56:31  Sergey.Kirpichev  set     messages: +
2021-05-17 04:38:31  tim.peters        set     nosy: + tim.peters; messages: +
2021-05-17 04:26:24  Sergey.Kirpichev  set     stage: patch review; pull_requests: + pull_request24803
2021-05-17 04:20:24  rhettinger        set     nosy: + rhettinger; messages: +; assignee: rhettinger; type: performance
2021-05-17 03:58:34  Sergey.Kirpichev  create