[spambayes-dev] RE: [Python-Dev] RE: [Spambayes] Question (or possibly a bug report)

Tim Peters tim.one@comcast.net
Sun, 27 Jul 2003 16:54:42 -0400


[Mark Hammond]

This seems to be coming to a conclusion. Not a completely satisfactory one, but one nonetheless.

Short story for the python-dev crew:

* Some Windows programs are known to run with the CRT locale set to other than "C" - specifically, set to the locale of the user.

* If this happens, the marshal module will load floating point literals incorrectly.

Well, it depends on the locale, and on the fp literals in question, but it's often the case that damage occurs.

* Thus, once this happens, if a .pyc file is imported, the floating point literals in that .pyc are wrong. Confusion reigns.
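
A minimal sketch of the failure mode summarized above, assuming the damage comes from a C-style atof() that honors the locale's decimal point. The helper atof_like() is hypothetical: it is a pure-Python stand-in for the CRT routine and is not how marshal is implemented. It only shows why a literal written with "." is truncated when the active decimal point is ",".

```python
import re


def atof_like(text, decimal_point="."):
    """Hypothetical stand-in for C's atof().

    Parses the longest numeric prefix of `text`, treating `decimal_point`
    as the only valid radix character, and returns 0.0 if nothing parses.
    """
    dp = re.escape(decimal_point)
    pattern = r"\s*([+-]?(?:\d+(?:%s\d*)?|%s\d+)(?:[eE][+-]?\d+)?)" % (dp, dp)
    match = re.match(pattern, text)
    if match is None:
        return 0.0
    # Normalize the radix character so Python's float() can finish the job.
    return float(match.group(1).replace(decimal_point, "."))


# "C" locale: the literal survives intact.
print(atof_like("0.7", "."))
# A locale whose decimal point is ",": parsing stops at the ".".
print(atof_like("0.7", ","))
```

Under the "," setting the parse stops at the first character it does not recognize, so "0.7" comes back as 0.0 and "1.5" as 1.0, which is the kind of silent damage described for floating point literals loaded from a .pyc.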

Yup -- and it's an excellent to-the-point summary!

The "best" solution to this probably involves removing Python being dependent on the locale - there is even an existing patch for that.

Kinda.

To the SpamBayes specifics: ... I have a version working for the original bug reporter. While we can reproduce the locale being switched at MAPILogon time on our machines, my instrumented version also shows that, for some people at least, Outlook itself will change it back some time before delivering UI events to us.

There's potentially another dark side to this story: if MS code is going out of its way to switch locale, it's presumably because some of MS's code wants to change the behavior of CRT routines to work "as expected" for the current user. So if we switch LC_NUMERIC back to "C", we may be creating problems for Outlook. I'll never stumble into this, since "C" locale and my normal locale are so similar (and have identical behavior in the LC_NUMERIC category). At least Win32's native notions of locale are settable on a per-thread basis; C's notion is a global hammer; it's unclear to me why MS's code is even touching C's notion.
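
One way to limit that risk is to force LC_NUMERIC to "C" only around the code that needs it and then put back whatever was there. The following is a sketch using Python's standard locale module; the name c_numeric_locale is hypothetical, and nothing here is taken from the SpamBayes source. Because C's locale is process-wide, this still affects every thread while the block is active.

```python
import contextlib
import locale


@contextlib.contextmanager
def c_numeric_locale():
    """Temporarily force LC_NUMERIC to "C", then restore the prior setting."""
    # Calling setlocale() with no second argument only queries the setting.
    saved = locale.setlocale(locale.LC_NUMERIC)
    locale.setlocale(locale.LC_NUMERIC, "C")
    try:
        yield
    finally:
        # Hand the previous setting back to whoever chose it.
        locale.setlocale(locale.LC_NUMERIC, saved)


with c_numeric_locale():
    # Inside the block the decimal point is guaranteed to be ".".
    print(locale.localeconv()["decimal_point"])
```

Restoring on exit keeps the window small, but it does not make the switch invisible to other code running in the same process during that window.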

... We do still have the "social" problem of what locale conventions to use for Config files, but that has nothing to do with our tools...

To the extent that Config files use Python syntax, they're least surprising if they stick to Python syntax. The locale pit is deep. For example, Finnish uses a non-breaking space to separate thousands "although fullstop may be used in monetary context". We'll end up with more code to cater to gratuitous locale differences than to identify spam <0.7 wink>.
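
A small illustration of why Python syntax is the least surprising choice for such values, assuming option values are read as Python literals. The helper parse_option is hypothetical and is not the SpamBayes options code; ast.literal_eval is the standard-library call doing the work.

```python
import ast


def parse_option(text):
    """Hypothetical helper: read a config value as a Python literal."""
    return ast.literal_eval(text.strip())


# Python syntax: a float, regardless of the user's locale.
print(parse_option("0.7"))
# A comma-decimal spelling is still valid Python, but it is a tuple.
print(parse_option("0,7"))
```

The second call does not fail: it returns the tuple (0, 7) rather than a float, so a locale-style spelling in a Python-syntax file produces a value of the wrong type instead of an error.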