msg46829 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2004-09-02 00:08 |
The attached patch implements a collection of performance enhancements for decimal.py. They are currently focused on the performance of the telco benchmark in the sandbox (i.e. they aim to speed up instantiation, addition, multiplication and quantization). The main strategy used (a subclass for NaN and infinite values) can be extended to encompass more methods in order to improve performance in other situations. There is one non-performance related change included - the '_sign' attribute is renamed to '_isneg' to better reflect the boolean nature of the property. |
|
|
msg46830 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2004-09-02 00:11 |
Logged In: YES user_id=1038590 I forgot to add that this is NOT yet a production quality patch. The doc strings, in particular, need work. I'm mainly posting it so Raymond can have a look at it. |
|
|
msg46831 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2004-09-02 05:18 |
Logged In: YES user_id=80475 Nice job. The timings do improve and the approach is reasonable. Please go ahead to the next step and polish it up. Also, please change _isneg back to _sign. That matches the terminology in the spec, the machine centric sign bit viewpoint of 754R, and the components of the user visible as_tuple() method. The goal is to make the patch as minimal as possible, to keep the API unchanged, and improve performance. Using the __new__ method was smart. That more closely corresponds to a potential C implementation of an immutable type. |
|
|
msg46832 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2004-09-07 05:05 |
Logged In: YES user_id=80475 Facundo, do you have time to look at this one? We need to: * revert the _isneg variable name change * find out why it gives a 2:1 gain on telco but only 10% on test_decimal.py * make sure the API has not changed in a way that will make it more difficult someday to convert this into C and make it a built-in. |
|
|
msg46833 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2004-09-07 21:06 |
Logged In: YES user_id=1038590 With the _sign/_isneg thing, I figured out the reason it was hurting my brain - I was failing to think of the value as representing a two's complement sign bit. Once I thought of it that way, it made sense with the original terminology. I suspect the lesser speedup on test_decimal.py may be due to the fact that the current version of the patch only optimises the operations used by the telco benchmark (instantiation, addition, multiplication, quantisation). As we speed up other operations, the performance on the unit tests should improve. As far as the API itself goes, the major differences I am aware of are the existence of the new subclass, and the use of __new__ instead of __init__. That will presumably be a bit more work to implement in C since another class is needed. Do we need to document somewhere that Decimal(x) may return an instance of a subclass of Decimal() rather than an instance of Decimal itself? My current plan is to produce a second version of this patch that: 1. Starts again from current CVS 2. Leaves '_sign' alone 3. Leaves existing doc strings alone (I broke a couple of them in this version of patch - I hadn't really thought about how to handle them properly) 4. For overridden methods in the _Special_Value class, duplicate the docstring from the baseclass with "f.__doc__ = Decimal.f.__doc__". This should avoid giving different answers for "help(Decimal(1).__add__)" and "help(Decimal('NaN').__add__)". 5. Optimises all operations, not just the four used by the telco benchmark (with the new strategy in place, this is mostly legwork - cutting and pasting the special-value related stuff to the new subclass, and rearranging appropriately. Unfortunately, it's legwork that I can't see a sensible way to automate, so it's a matter of taking the time to do it, and then checking that the tests all still pass). |
|
|
msg46834 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2004-09-07 21:59 |
Logged In: YES user_id=80475 Glad to hear you've wrapped your head around the sign bit. FWIW, it bugged me too. But it is so closely tied to the spec that it is better to adjust our mental model than change the code. Do extend your optimizations to the other functions. Don't just optimize the benchmark -- that defeats the purpose. Can you work together with Facundo on this one? |
|
|
msg46835 - (view) |
Author: Facundo Batista (facundobatista) *  |
Date: 2004-09-07 23:47 |
Logged In: YES user_id=752496 Nick: Submit the second version of the patch and I'll review it. Also feel free to contact me by mail anytime. |
|
|
msg46836 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2004-09-08 23:45 |
Logged In: YES user_id=1038590 2nd version of patch attached. It does NOT use a subclassing strategy for optimisation due to the following charming little problem I found with that approach: >>> import decimal >>> class foo(decimal.Decimal): pass >>> x = foo("NaN") >>> isinstance(x, foo) False I couldn't find any sane way to get around this, and it seemed to be a showstopper given the amount of work that has gone into making all the builtin types easily subclassable. The new patch instead relies mainly on the following techniques: - defer invocation of getcontext() as late as possible, so that we only call it if the context is going to be referenced - use issubclass(x, basestring) to identify if there are special cases that need handling, then use _is_nan and _is_infinity to determine exactly which special case applies (this slows the special cases down slightly, but speeds up the normal cases a lot) - make _convert_other a helper function instead of a method There are also some other minor changes that were present in the original patch (use sum() in __nonzero__, reduce the number of calls to list.reversed() in __add__, extract digits directly from integer inputs with divmod, rearrange _fixexponents to minimise context method calls, cache results of method calls in various operations) Facundo, two particular changes I'd like you to OK are: - _fix now indents the second call to _fixexponents to be inside the if block (in CVS, _fix_exponents could get called again directly on the result of the previous call to _fix_exponents, which didn't seem to make any sense) - _fixexponents returns immediately for NaN as well as infinite values This version of patch appears to give around a 40% speedup on direct execution of test_decimal.py, and all the tests still pass for me. |
|
|
msg46837 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2004-09-16 12:17 |
Logged In: YES user_id=1038590 I've separated the performance enhancements out into four different potential patches now: - "_odds_and_ends" - "_is_special" - "_both" (combination of the above two patches) - "_workint" (also incorporates "_both") A single tar.gz file is attached which contains all those in .diff and .py format, as well as a text file detailing the changes made in each patch. All diff's are directly against CVS. test_decimal.py passes cleanly with all four patches. They're listed above in order of increasing degree of change from current CVS. Hopefully I've gotten rid of any doc string changes, and non-performance related code changes (some comments have changed, usually because associated code has changed) The _workint version is the only version which regularly achieves < 20 s wall clock time on my machine for the unit tests and < 60 s for "telco_nofiles.py 10000" (that is, one hundred thousand decimal values). The _both version seems to run around 5 seconds slower on both tests (I am assuming this is due to the fact that _workint speeds up division significantly more than it speeds up addition and multiplication, thus having a much more pronounced effect on the unit tests than on the division-free telco benchmark) Here are the details on the different patches: decimal_odds_and_ends: - cache the result of repeated method calls in: __cmp__ _rescale _check_nans _fix_exponents - use sum() in __nonzero__ - call list.reverse() on preliminary result, not operands in __add__ - avoid redundant _fix_exponents call in _fix - rearrange _fix_exponents to minimise method calls on any given path - remove dead code from end of __int__ decimal_is_special: - convert to use __new__ instead of __init__ - new attribute _is_special is True if value is infinite or NaN - defer invocation of getcontext() as late as possible - make _convert_other a function instead of a method - modify all methods to take advantage of _is_special decimal_both: - combines decimal_is_special & decimal_odds_and_ends decimal_workint: - based on decimal_both - uses a numeric value for _WorkRep.int instead of a list - modifies __new__, __add__ and _divide accordingly - modifies __mul__ to use a _WorkRep (Sorry Raymond, I couldn't quite resist experimenting with borrowing int/long's C level arithmetic implementation!) I'll see about getting some wall-clock numbers for the *actual* million-value telco benchmark with all 5 versions (CVS + the four patches). |
|
|
msg46838 - (view) |
Author: Facundo Batista (facundobatista) *  |
Date: 2004-09-17 01:50 |
Logged In: YES user_id=752496 Raymond, Nick: I only reviewed the "both" version. From the patch text, the changes were pretty logical, but they were a *lot*, so I didn't want to extend the change to "_workint" version. It'll always be time. The majority of changes are because of the added attribute "_is_special", which shortcuts in an obvious manner a lot of code, and also makes possibly some interesting conceptual changes (and some other changes were not simple and implied good knowledge of the code and decimal internals). There're also a lot of changes such as executing once a function, binding its result in a name and using that name several times later (which effectively reduced the quantity of function calls). And trimmed several "getcontext()" when the only thing that was done was passing context to other function (not used there). The changed decimal was OK in the tests, showing a 14% speed up. I'm +1 to commit the "both" version. Checking the code, I saw some other optimizations, but I don't want to introduce new changes without commiting these first. Nick, congratulations, you certainly made a good work! (btw, don't upload files with spaces in the name) Raymond, do you want to commit the code or should I do it? . Facundo |
|
|
msg46839 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2004-09-17 03:37 |
Logged In: YES user_id=80475 I'll look at it in the next few day. |
|
|
msg46840 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2004-09-19 01:54 |
Logged In: YES user_id=80475 Checking in Nick's best version as Lib/decimal.py 1.24 Made a minor modification to the workrep __init__ code. Facunodo, feel free to checkin your own optimizations. Make sure they are clear both in concept and code. Also, make sure they show measurable speedups in both telco and the test suite. |
|
|