[Python-Dev] Re: Be Honest about LC_NUMERIC [REPOST]

Guido van Rossum guido at python.org
Mon Sep 1 18:03:29 EDT 2003


[Tim]

> In short, I can't be enthusiastic about the patch because it doesn't
> solve the only relevant locale problem I've actually run into. I
> understand that it may well solve many I haven't run into.

At this point in your life, Tim, is there any patch you could be truly enthusiastic about? :-)

I'm asking because I'd like to see the specific problem that started this thread solved, if necessary using a compromise that means the solution isn't perfect. I'm even willing to take a step back from the status quo, given that the status quo isn't perfect anyway, and that compromises mean something has to give.

Maybe the right solution is that we have to accept a hard-to-understand overcomplicated piece of code that we don't know how to maintain (but for which the author asserts that we won't have to do much maintenance in the foreseeable future). But maybe there's a simpler solution.

> OTOH, the specific problem I'm acutely worried about would be better
> addressed by changing the way Python marshals float values.

So solve it. The approach used by binary pickles seems entirely reasonable. All we need to do is change the .pyc magic number. (There's undoubtedly user code in the world that would break because it requires interoperability between Python versions. So let the marshal module grow a way to specify the format.)
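For reference, the binary-pickle representation mentioned above is simply an 8-byte big-endian IEEE-754 double (pickle's BINFLOAT opcode, 'G'), which avoids decimal text -- and hence the locale and the repr()/atof() round trip -- entirely. A minimal sketch of that idea, with hypothetical helper names (this is not CPython's actual marshal code):

```python
import struct

def dump_float(x):
    # Serialize the way pickle's BINFLOAT does: the opcode byte 'G'
    # followed by an 8-byte big-endian IEEE-754 double.  No decimal
    # string is ever produced, so the locale cannot interfere.
    return b'G' + struct.pack('>d', x)

def load_float(data):
    # Inverse: check the opcode, then unpack the 8 bytes after it.
    assert data[:1] == b'G'
    return struct.unpack('>d', data[1:9])[0]
```

The round trip is exact bit for bit, and the byte string is identical on every IEEE-754 platform regardless of locale -- which is why only the .pyc magic number (and a format-version knob in the marshal module) would need to change.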

> [Guido]
> > Maybe at least we can detect platforms for which we know there is a
> > native conversion in the library, and not use the hack on those?

> I rarely find that piles of conditionalized code are more comprehensible
> or reliable; they usually result in mysterious x-platform differences,
> and become messier over time as we stumble into more platform library
> bugs, quirks, and limitations.

Fair enough. So if we decide to use the donated conversion code, we should start by using it unconditionally. I predict that at some point in the future we'll find a platform whose quirks are not handled by the donated code, and where it's simpler to use a correct native equivalent than to try to fix the donated code; but I expect that point to be pretty far in the future, or the platform to be pretty far from the main stream.

> > ...
> > Here's yet another idea (which probably has flaws as well): instead of
> > substituting the locale's decimal separator, rewrite strings like
> > "3.14" as "314e-2" and rewrite strings like "3.14e5" as "314e3", then
> > pass to strtod(), which assigns the same meaning to such strings in
> > all locales.
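The rewrite described here is mechanical: fold the decimal point into the exponent so the string no longer contains one. A hypothetical sketch of the idea (the function name and regex are mine, not from the thread):

```python
import re

def drop_decimal_point(s):
    # Rewrite "3.14" -> "314e-2" and "3.14e5" -> "314e3": remove the
    # decimal point and compensate in the exponent.  strtod() assigns
    # the same meaning to point-free strings in every locale.
    m = re.match(r'([+-]?)(\d*)\.(\d+)(?:[eE]([+-]?\d+))?$', s)
    if m is None:
        return s  # no decimal point: nothing to do
    sign, intpart, frac, exp = m.groups()
    new_exp = (int(exp) if exp else 0) - len(frac)
    return "%s%s%se%d" % (sign, intpart, frac, new_exp)
```

Note that the transformation never changes the digit string itself, only the exponent, so the number of digits the platform routine sees is exactly the same as before the rewrite.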

> This is a harder transformation than s/./<locale decimal point>/. It
> does address the thread-safety issue. Numerically it's flaky, as only a
> perfectly-rounding string->float routine can guarantee to return
> bit-for-bit identical results given equivalent (viewed as infinite
> precision) decimal representations as inputs, and few platform
> string->float routines do perfect rounding.
>
> > This removes the question of what decimal separator is used by the
> > locale completely, and thus removes the last bit of thread-unsafety
> > from the code. However, I don't know if underflow can cause the result
> > to be different, e.g. perhaps 1.23eX could be computed but 123e(X-2)
> > could not??? (Sounds pretty unlikely on the face of it since I'd
> > expect any decent conversion algorithm to pretty much break its input
> > down into a string of digits and an exponent, but I've never actually
> > studied such algorithms in detail.)
>
> Each library is likely to fail in its own unique ways. Here's a cute one:
>
> """
> base = 1.2345678901234567
> digits = "12345678901234567"
> for exponent in range(-16, -15000, -1):
>     string = digits + "0" * (-16 - exponent)
>     string += "e%d" % exponent
>     derived = float(string)
>     assert base == derived, (string, derived)
> """
>
> On Windows, this first fails at exponent -5202, where float(string)
> delivers a result a factor of 10 too large. I was surprised it did that
> well! Under Cygwin Python 2.2.3, it consumed > 14 minutes of CPU time,
> but never failed. I believe they're using a derivative of David Gay's
> excruciatingly complex IEEE-754 perfect-rounding string<->float routines
> (which would explain both why it didn't fail and why it consumed
> enormous CPU time; the code is excruciatingly complex because it does
> perfect rounding quickly for "normal" inputs, via a large variety of
> delicate speed tricks; when those tricks don't apply, it has to simulate
> unbounded-precision arithmetic to guarantee perfect rounding).

I fail to see the relevance of the example to my proposed hack, except as a proof that the world isn't perfect -- but we already know that. Under my proposal, the number of digits converted would never change, so any sensitivity of the conversion algorithm to the number of digits would be irrelevant. I note that the strtod.c code that's currently in the Python source tree uses a similar (though opposite) trick: it converts the number to the form 0.<fraction>E<exponent> before handing it off to atof().

So my proposal still stands. I'm happy to entertain a proof that it's flawed, but not one where the flawed input has over 5000 digits and depends on a flaw in the platform routines.
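The opposite trick mentioned above -- normalizing to a "0." fraction form with a compensating exponent before calling atof() -- can be sketched the same way. Again a hypothetical illustration, not the bundled strtod.c code; as with the earlier rewrite, only the exponent changes, never the digit string:

```python
import re

def to_fraction_form(s):
    # Normalize "3.14e5" -> "0.314E6": move the decimal point in front
    # of the first digit and compensate in the exponent, roughly the
    # shape strtod.c produces before handing the string to atof().
    m = re.match(r'([+-]?)(\d*)\.?(\d*)(?:[eE]([+-]?\d+))?$', s)
    sign, intpart, frac, exp = m.groups()
    new_exp = (int(exp) if exp else 0) + len(intpart)
    return "%s0.%s%sE%d" % (sign, intpart, frac, new_exp)
```

For example, "3.14" and "314e-2" both normalize to "0.314E1" -- the same three digits either way, which is the point of the digit-count argument above.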

--Guido van Rossum (home page: http://www.python.org/~guido/)


