[Python-Dev] Numerical robustness, IEEE etc.
Nick Maclaren nmm1 at cus.cam.ac.uk
Mon Jun 19 10:55:44 CEST 2006
Brett Cannon's and Neal Norwitz's replies appreciated and noted, but responses sent by mail.
Nick Coghlan <ncoghlan at gmail.com> wrote:
Python 2.4's decimal module is, in essence, a floating point emulator based on the General Decimal Arithmetic specification.
Grrk. Format and all? Because, in software, encoding, decoding and dealing with the special cases accounts for the vast majority of the time. Using a format and specification designed for implementation in software is a LOT faster (often 5-20 times).
If you want floating point mathematics that doesn't have insane platform dependent behaviour, the decimal module is the recommended approach. By the time Python 2.6 rolls around, we will hopefully have an optimized version implemented in C (that's being worked on already).
Yes. There is no point in building a wheel if someone else is doing it. Please pass my name on to the people doing the optimisation, as I have a lot of experience in this area and may be able to help. But it is a fairly straightforward (if tricky) task.
That said, I'm not clear on exactly what changes you'd like to make to the binary floating point type, so I don't know if I think they're a good idea or not :)
Now, here it is worth posting a response :-)
The current behaviour follows C99 (sic) with some extra checking (e.g. division by zero raises an exception). However, this means that a LOT of errors will give nonsense answers without comment, and there are a lot of ways to 'lose' NaN values quietly - e.g. int(NaN). That is NOT good software engineering. So:
Mode A: follow IEEE 754R slavishly, if and when it ever gets into print. There is no point in following C99, as it is too ill-defined, even if it were felt desirable. This should not be the default, because of the flaws I mention above (see Kahan on Java).
Mode B: all numerically ambiguous or invalid operations should raise an exception - including pow(0,0), int(NaN) etc. etc. There is a moot point over whether overflow is such a case in an arithmetic that has infinities, but let's skip over that one for now.
Mode C: all numerically ambiguous or invalid operations should return a NaN (or infinity, if appropriate). Anything that would lose the error indication would raise an exception. The selection between modes B and C could be done by a method on the class - with mode B being selected if any argument had it set, and mode C otherwise.
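As a rough illustration of modes B and C and of the selection rule, here is a minimal sketch. The class name CheckedFloat and its strict flag are hypothetical, invented for this example; nothing like it exists in Python. Only division and conversion to int are covered, and overflow is left alone, in line with the moot point noted under mode B.

```python
import math


class CheckedFloat:
    """Hypothetical wrapper around float; strict=True is mode B, False is mode C."""

    def __init__(self, value, strict=False):
        self.value = float(value)
        self.strict = strict

    def _invalid(self, strict, what):
        # Mode B: raise where the error happens. Mode C: return a NaN instead.
        if strict:
            raise ArithmeticError("invalid operation: " + what)
        return CheckedFloat(math.nan, strict)

    def __truediv__(self, other):
        # Mode B applies if any operand has it set, mode C otherwise.
        strict = self.strict or other.strict
        if math.isnan(self.value) or math.isnan(other.value):
            return self._invalid(strict, "NaN operand")
        if math.isinf(self.value) and math.isinf(other.value):
            return self._invalid(strict, "inf/inf")
        if other.value == 0.0:
            if self.value == 0.0:
                return self._invalid(strict, "0/0")
            if strict:
                raise ZeroDivisionError("division by zero")
            # Mode C: a nonzero value over zero gives a signed infinity.
            sign = math.copysign(1.0, self.value) * math.copysign(1.0, other.value)
            return CheckedFloat(math.copysign(math.inf, sign), strict)
        # Overflow to infinity is not checked in this sketch.
        return CheckedFloat(self.value / other.value, strict)

    def __int__(self):
        # An int cannot carry a NaN or an infinity, so both modes refuse here.
        if math.isnan(self.value) or math.isinf(self.value):
            raise ValueError("cannot convert NaN or infinity to int")
        return int(self.value)


quiet_nan = CheckedFloat(0.0) / CheckedFloat(0.0)   # mode C: a NaN comes back
strict_zero = CheckedFloat(0.0, strict=True)
# strict_zero / CheckedFloat(0.0) raises ArithmeticError (mode B, one strict operand)
# int(quiet_nan) raises ValueError (neither mode lets the NaN vanish)
```

Carrying the mode on the value rather than in global state is one possible reading of "a method on the class"; a per-thread context would be another.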
Now, both modes B and C are traditional approaches to numerical safety, and have the property that error indications can't be lost "by accident", though they make no guarantees that the answers make sense. I am agnostic about which is better, though mode B is a LOT better from the debugging point of view, as you discover an error closer to where it was made.
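The decimal module already offers something close to this choice through the traps on its contexts, which makes the debugging difference easy to see. The sketch below is only an analogy for decimal arithmetic, not the change proposed here for the binary float type; the variable names are illustrative.

```python
from decimal import Context, Decimal, DivisionByZero, InvalidOperation

zero = Decimal(0)

# Trapping context, analogous to mode B: the invalid operation raises at once.
trapping = Context(traps=[InvalidOperation, DivisionByZero])
try:
    trapping.divide(zero, zero)
    raised_early = False
except InvalidOperation:
    raised_early = True

# Non-trapping context, analogous to mode C: a NaN is returned and propagates
# through later arithmetic, so the error surfaces further from its cause.
quiet = Context(traps=[])
result = quiet.add(quiet.divide(zero, zero), Decimal(1))
print(result.is_nan())

# The signal is still recorded as a flag on the context.
print(bool(quiet.flags[InvalidOperation]))

# Converting the NaN to an int would lose the error indication; decimal refuses.
try:
    int(result)
    int_refused = False
except ValueError:
    int_refused = True
```

With the trapping context the failure is reported at the 0/0 itself; with the quiet one it only becomes visible when the NaN is inspected or converted.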
Heaven help us, there could be a mode D, which would be mode C but with trace buffers. They are another sadly neglected software engineering technique, but let's not add every bell and whistle on the shelf :-)
"tjreedy" <tjreedy at udel.edu> wrote:
> experience from times of yore is that emulated floating-point would
> be fast enough that few, if any, Python users would notice.

Perhaps you should enquire on the Python numerical and scientific computing lists to see how many feel differently. I don't see how someone crunching numbers hours per day could not notice a slowdown.
Oh, certainly, almost EVERYONE will "feel" differently! But that is not the point. Those few of us remaining (and there are damn few) who know how a fast emulated floating-point performs know that the common belief that it is very slow is wrong. I have both used and implemented it :-)
The point is, as I mention above, you MUST use a software-friendly format AND specification if you want performance. IEEE 754 and IBM's decimal pantechnicon are both extremely software-hostile.
Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: nmm1 at cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679