[Python-Dev] Re: Be Honest about LC_NUMERIC [REPOST]

James Henstridge james at daa.com.au
Mon Sep 1 15:59:32 EDT 2003


On 31/08/2003 9:25 AM, Tim Peters wrote:

> [James Henstridge]
>> As Christian said, there is code in glib (not to be confused with glibc: the GNU C library) that could act as a basis for locale independent float conversion functions in Python.
>
> [Martin v. Löwis]
>> I very much doubt that this statement is true. Are you sure this code supports all the platforms where Python runs? E.g. what about the three (!) different floating point formats on a VAX?
>
> Well, you should look at the patch: it doesn't know anything about internal fp formats -- all conversions are performed in the end by calling the platform C's strtod() or snprintf(). What it does do is:
>
> 1. For string to double, preprocess the input string to change it to use current-locale spelling before calling the platform C strtod().
>
> 2. For double to string, postprocess the result of the platform C snprintf() to replace current-locale spelling with a "standard" spelling.
>
> So this is much more string-munging code than it is floating-point code. Indeed, there doesn't appear to be a single floating-point operation in the entire patch (apart from calls to platform string<->float functions).
>
> OTOH, despite the claims, it doesn't look threadsafe to me: there's no guarantee, e.g., that the idea of current locale g_ascii_strtod() obtains from localeconv() at its start is still in effect by the time g_ascii_strtod() gets around to calling strtod().

This is true. However, in practice we found it fixed a number of thread safety issues in programs.
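The two steps Tim lists can be modelled in a few lines of Python. This is only a sketch, not the glib patch: `locale.atof()` and `locale.format_string()` stand in for the platform C `strtod()` and `snprintf()`, the helper names `ascii_to_locale`, `locale_to_ascii`, `ascii_atof` and `ascii_format` are hypothetical, and replacing the first `.` is a simplification of the scanning a careful implementation would do.

```python
import locale


def ascii_to_locale(text: str, decimal_point: str) -> str:
    # Step 1 (simplified): respell the "C" decimal point the way the current
    # locale expects it, e.g. "2.5" -> "2,5" when decimal_point is ",".
    return text.replace(".", decimal_point, 1)


def locale_to_ascii(text: str, decimal_point: str) -> str:
    # Step 2 (simplified): undo the locale's spelling in formatted output.
    return text.replace(decimal_point, ".", 1)


def ascii_atof(text: str) -> float:
    # Read the locale's decimal point first (the patch uses localeconv()).
    # If another thread calls setlocale() before the parse below runs,
    # the two can disagree: this is the race Tim describes.
    dp = locale.localeconv()["decimal_point"]
    # locale.atof() plays the role of the platform C strtod() here.
    return locale.atof(ascii_to_locale(text, dp))


def ascii_format(value: float, fmt: str = "%.6g") -> str:
    dp = locale.localeconv()["decimal_point"]
    # locale.format_string() plays the role of the platform C snprintf().
    return locale_to_ascii(locale.format_string(fmt, value), dp)
```

Note that the code never does floating-point arithmetic itself; like the patch, it only rewrites strings around the locale-aware conversions.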

Your average localised package usually switches to the user's preferred locale on startup, so that it can display strings and messages, and occasionally wants to read/write numbers in a locale independent format (usually when saving/loading files). The most common way of doing this is the setlocale/strtod/setlocale combo, which has thread safety problems (setlocale() changes process-wide state, so every other thread also runs in the "C" locale until it is switched back) and possible reentrancy problems if done wrong.
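For contrast, the setlocale/strtod/setlocale combo looks roughly like this when written with Python's `locale` module. It is a sketch: the function name is hypothetical and `locale.atof()` stands in for `strtod()`. The `try`/`finally` guards against one way of doing it wrong (an exception skipping the restore), but nothing stops other threads from observing the temporary "C" setting.

```python
import locale


def parse_c_float_by_switching(text: str) -> float:
    # Query the current LC_NUMERIC setting (no second argument = query only).
    saved = locale.setlocale(locale.LC_NUMERIC)
    try:
        # Switch the whole process to "C" just to parse one number.
        locale.setlocale(locale.LC_NUMERIC, "C")
        # Until the finally block runs, every thread sees LC_NUMERIC == "C".
        return locale.atof(text)
    finally:
        # Always restore, even if parsing raised.
        locale.setlocale(locale.LC_NUMERIC, saved)
```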

The method used by g_ascii_strtod() removes the need to switch locale when parsing the float, which means that an application using it may only need to call setlocale() once on startup and never again. This seems to be the best way to use setlocale w.r.t. thread safety.

The existing locale handling in Python shares this property, but makes it difficult for external libraries to format and parse floats in the locale's representation. From what I can see, leaving LC_NUMERIC set to the locale value rather than "C" leads to better interoperability with such libraries.
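In present-day Python 3 the set-once model is easy to follow, since `float()` and `repr()` never consult `LC_NUMERIC` while the `locale` module's helpers do. A minimal sketch, with hypothetical helper names: the locale is chosen once in `main()`, saved and loaded numbers use the locale-independent spelling, and only text shown to the user is localized.

```python
import locale


def save_number(value: float) -> str:
    # repr() always uses '.', whatever LC_NUMERIC is: safe for files.
    return repr(value)


def load_number(text: str) -> float:
    # float() parses the "C" spelling regardless of the current locale.
    return float(text)


def display_number(value: float) -> str:
    # locale.format_string() follows LC_NUMERIC: meant for the user's eyes.
    return locale.format_string("%.2f", value, grouping=True)


def main() -> None:
    # Switch to the user's preferred locale exactly once, at startup.
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        pass  # unknown locale in the environment; stay in "C"
    stored = save_number(1234.5)
    print(stored, display_number(load_number(stored)))


if __name__ == "__main__":
    main()
```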

> So at best it solves part of one relevant problem here (other relevant problems include that platform C libraries disagree about how to spell infinities, NaNs and signed zeroes; about how many digits to use for an exponent; and about how to round results (for example,
>
> >>> "%.1f" % 2.25
> '2.3'
>
> on Windows, but most (not all!) flavors of Unix produce the IEEE-754 to-nearest/even rounded '2.2' instead)).
>
> It's easy to write portable, perfectly-rounding string<->double conversion routines without calling any platform functions. The rub is that "fast" goes out the window then, unless you give up at least one of {portable, accurate}.

It would be great for Python to have consistent float parsing/formatting on every platform in the future. Making sure that every place where Python wants to parse or format a float in a locale independent fashion goes through a single set of functions should make it easier to drop in a new set of routines later. However, getting rid of the LC_NUMERIC=C requirement would have real benefits today.
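Tim's point that perfectly-rounding conversion needs no platform routine can be illustrated for the `%.Nf` case with Python's `decimal` module, which converts a double to decimal exactly and can then round half-to-even. This is a slow, illustrative sketch (the function name `format_fixed` is hypothetical), not a proposal for how Python should implement it; it deliberately leaves the spelling of infinities and NaNs, another of the listed disagreements, unhandled.

```python
import math
from decimal import ROUND_HALF_EVEN, Decimal, localcontext


def format_fixed(x: float, places: int) -> str:
    """Correctly rounded '%.<places>f' formatting without the platform printf."""
    if not math.isfinite(x):
        # How to spell inf/nan is a separate portability question.
        raise ValueError("non-finite values are not handled in this sketch")
    with localcontext() as ctx:
        # A double's integer part has at most 309 digits, so this precision
        # is enough for quantize() to keep every digit of the result.
        ctx.prec = 400 + places
        step = Decimal(1).scaleb(-places)  # 10**-places, e.g. Decimal('0.1')
        # Decimal(x) is the exact binary value, so 2.25 is a true tie and
        # ROUND_HALF_EVEN sends it to 2.2, as IEEE-754 to-nearest/even does.
        rounded = Decimal(x).quantize(step, rounding=ROUND_HALF_EVEN)
        # format(..., "f") avoids the exponent notation str() may choose.
        return format(rounded, "f")
```

On a current CPython, `format_fixed(2.25, 1)` returns `'2.2'`, which is also what `"%.1f" % 2.25` now produces on mainstream platforms.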

James.

--
Email: james at daa.com.au
WWW: http://www.daa.com.au/~james/


