Issue 1040: Unicode problem with TZ

Created on 2007-08-28 06:05 by theller, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (5)
msg55351 - Author: Thomas Heller (theller) * (Python committer) Date: 2007-08-28 06:05
In my German version of WinXP SP2, python3 cannot import the time module:

    c:\svn\py3k\PCbuild>python_d
    Python 3.0x (py3k:57600M, Aug 28 2007, 07:58:23) [MSC v.1310 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import time
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 9-11: invalid data
    [36719 refs]
    >>> ^Z

The problem is that the libc '_tzname' variable contains umlauts. For comparison, here is what Python2.5 does:

    c:\svn\py3k\PCbuild>\python25\python
    Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import time
    >>> time.tzname
    ('Westeurop\xe4ische Normalzeit', 'Westeurop\xe4ische Normalzeit')
    >>>
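The decoding failure can be reproduced without a German Windows installation. The following is a minimal sketch, not the interpreter's actual code path: it takes the byte string shown in the Python 2.5 output and assumes it was encoded in Windows code page 1252 (the byte value 0xE4 for "ä" is consistent with that, but the code page is an assumption). The names `raw_tzname`, `failure_offset` and `decoded` are illustrative only.

```python
# Bytes as shown by time.tzname in the Python 2.5 session above.
raw_tzname = b"Westeurop\xe4ische Normalzeit"

try:
    raw_tzname.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xE4 announces a multi-byte UTF-8 sequence, but plain ASCII follows,
    # so strict UTF-8 decoding fails at the umlaut.
    failure_offset = exc.start
else:
    failure_offset = None

# Assumption: the C runtime produced the name in code page 1252.
decoded = raw_tzname.decode("cp1252")

print(failure_offset, decoded)
```

The offset of the failing byte matches the start of the range reported in the traceback (position 9); the exact wording of the error message differs between Python versions.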
msg55352 - Author: Thomas Heller (theller) * (Python committer) Date: 2007-08-28 06:06
BTW, setting the environment variable TZ to, say, 'GMT' makes the problem go away.
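The workaround can be tried without changing the current process by starting a fresh interpreter with TZ set. This is only a sketch: the helper name `tzname_with_tz` is hypothetical, and the explanation that the C runtime then takes its zone names from TZ instead of the localized system setting is an assumption drawn from the thread, not something the code verifies.

```python
import os
import subprocess
import sys


def tzname_with_tz(tz_value):
    """Start a fresh interpreter with TZ set and return the text it prints
    for time.tzname (hypothetical helper, for illustration only)."""
    env = dict(os.environ, TZ=tz_value)
    result = subprocess.run(
        [sys.executable, "-c", "import time; print(time.tzname)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(tzname_with_tz("GMT"))
```

A child process is used because TZ is normally read when the runtime initializes its time zone data; on POSIX systems `time.tzset()` can re-read it in the running process, but that function is not available on Windows.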
msg55426 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2007-08-29 17:06
I have a patch for this, which uses MBCS conversion instead of relying on the default utf-8 (here and in several other places). Tested on a French version of WinXP.

Which leads me to the question: should Windows use MBCS encoding by default when converting between char* and PyUnicode, and not utf-8? There are some other tracker items which would benefit from this.

After all, C strings can only come from 1) Python code, 2) system I/O and messages, and 3) constants in source code. IMO, 1) can use the representation it prefers, 2) would clearly lead to fewer errors if handled as MBCS, and 3) only uses 7-bit ASCII. There is very little need for utf-8 here.
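The idea of decoding system-provided strings with the platform encoding rather than UTF-8 can be illustrated at the Python level. This is a hedged sketch, not the C patch discussed in the thread: `system_encoding` and `decode_system_string` are hypothetical helper names, and the use of `errors="replace"` is a choice made for this example so that an unexpected byte does not raise.

```python
import locale
import sys


def system_encoding():
    """Pick the encoding used for strings coming from the OS or C runtime."""
    if sys.platform == "win32":
        # "mbcs" is Python's codec name for the active Windows ANSI code page.
        return "mbcs"
    # Elsewhere, fall back to the locale's preferred encoding.
    return locale.getpreferredencoding(False)


def decode_system_string(raw, encoding=None):
    """Decode bytes from the system; undecodable bytes become U+FFFD."""
    return raw.decode(encoding or system_encoding(), errors="replace")


# Explicit code page here so the result does not depend on the machine.
print(decode_system_string(b"Westeurop\xe4ische Normalzeit", "cp1252"))
```

Passing the encoding explicitly, as in the last line, keeps the example reproducible; with no argument the result depends on the machine's configuration, which is exactly the behavior the proposal relies on.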
msg55427 - Author: Thomas Heller (theller) * (Python committer) Date: 2007-08-29 18:01
IMO the very best would be to avoid as many conversions as possible by using the wide APIs on Windows. Not for _tzname maybe, but for env vars, sys.argv, sys.path, and so on. Not that I would have time to work on that...
msg55481 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-08-30 14:40
This is now fixed in r57720. Using wide APIs would be possible through GetTimeZoneInformation; however, TZ would then no longer be supported (unless the CRT code to parse TZ is duplicated).
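For reference, the wide-API route mentioned here can be sketched from Python with ctypes. As far as the message indicates, this is not what r57720 does; it only shows that GetTimeZoneInformation returns the zone names as wide strings, so no byte decoding is involved. The function name `windows_tz_names` is hypothetical, the structure layouts follow the Win32 definitions of SYSTEMTIME and TIME_ZONE_INFORMATION, and the TZ environment variable is ignored entirely, which is the drawback described above.

```python
import ctypes
import sys
from ctypes import wintypes


class SYSTEMTIME(ctypes.Structure):
    # Eight 16-bit fields, as in the Win32 SYSTEMTIME structure.
    _fields_ = [
        (name, wintypes.WORD)
        for name in (
            "wYear", "wMonth", "wDayOfWeek", "wDay",
            "wHour", "wMinute", "wSecond", "wMilliseconds",
        )
    ]


class TIME_ZONE_INFORMATION(ctypes.Structure):
    _fields_ = [
        ("Bias", wintypes.LONG),
        ("StandardName", wintypes.WCHAR * 32),
        ("StandardDate", SYSTEMTIME),
        ("StandardBias", wintypes.LONG),
        ("DaylightName", wintypes.WCHAR * 32),
        ("DaylightDate", SYSTEMTIME),
        ("DaylightBias", wintypes.LONG),
    ]


def windows_tz_names():
    """Return (standard_name, daylight_name) as str, or None off Windows."""
    if sys.platform != "win32":
        return None
    get_tz_info = ctypes.windll.kernel32.GetTimeZoneInformation
    get_tz_info.argtypes = [ctypes.POINTER(TIME_ZONE_INFORMATION)]
    get_tz_info.restype = wintypes.DWORD
    info = TIME_ZONE_INFORMATION()
    if get_tz_info(ctypes.byref(info)) == 0xFFFFFFFF:  # TIME_ZONE_ID_INVALID
        raise ctypes.WinError()
    # WCHAR arrays are exposed by ctypes as str, already in Unicode.
    return info.StandardName, info.DaylightName


print(windows_tz_names())
```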
History
Date                 User               Action  Args
2022-04-11 14:56:26  admin              set     github: 45381
2007-08-30 14:40:27  loewis             set     status: open -> closed; nosy: + loewis; resolution: fixed; messages: +
2007-08-29 18:01:21  theller            set     messages: +
2007-08-29 17:06:03  amaury.forgeotdarc set     nosy: + amaury.forgeotdarc; messages: +
2007-08-29 16:50:51  loewis             set     assignee: loewis
2007-08-28 06:06:52  theller            set     messages: +
2007-08-28 06:05:53  theller            create