[Python-Dev] EuroPython Language Summit report
Victor Stinner victor.stinner at haypocalc.com
Fri Jun 24 23:08:18 CEST 2011
On Friday, 24 June 2011 at 16:30 -0400, Terry Reedy wrote:
> I see two options to improve the situation.
> The third is to make utf-8 the default. I believe this is the proper long term solution and both options are contrary to this.
Oh yes, I also prefer this option, but I suspect that some people would prefer not to break backward compatibility.
Or should we consider this bad design choice as a bug?
The UTF-8 encoder (of Python 3) raises an error if the text contains a surrogate character. The surrogatepass error handler should be used to encode surrogates.
... The surrogateescape error handler can be used to encode undecodable bytes back (e.g. a filename decoded by Python using surrogateescape), but it is not a good idea to write illegal byte sequences (most programs will fail to open the file).
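To make the difference between the error handlers concrete, here is a minimal sketch. The sample strings and variable names are made up for illustration; only the codec and error handler names come from the text above.

```python
# A lone surrogate cannot be encoded by the strict UTF-8 codec.
text = "abc\udc80"

try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# surrogatepass encodes the surrogate code point itself.
# The result is three bytes that are not valid strict UTF-8.
passed = text.encode("utf-8", "surrogatepass")

# surrogateescape round-trips bytes that could not be decoded.
raw = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8
decoded = raw.decode("utf-8", "surrogateescape")       # 'caf\udce9'
restored = decoded.encode("utf-8", "surrogateescape")  # original bytes again
```

Here `restored` equals `raw`, which is exactly why writing such data to a text file is risky: the file then contains the original illegal byte sequence.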
> I believe that this is what I want for myself even on Windows.
Can you open UTF-8 files in all Windows programs (and is the text displayed correctly)? I remember that notepad.exe writes an evil UTF-8 BOM, but I don't know if it requires this BOM to detect the UTF-8 encoding.
Or do some programs expect text files encoded in the ANSI code page?
If you want to write files in the ANSI code page, you can use encoding="mbcs" (or use an explicit code page, like encoding="cp1252").
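As a rough sketch of what an explicit encoding looks like in practice (the file name and sample text are invented for this example; "mbcs" only exists on Windows, so the sketch uses "cp1252" to stay portable):

```python
import os
import tempfile

text = "caf\u00e9"  # "café"
path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file

# Write with an explicit code page instead of the locale default.
with open(path, "w", encoding="cp1252") as f:
    f.write(text)

with open(path, "rb") as f:
    cp1252_bytes = f.read()  # one byte per character: b'caf\xe9'

# For comparison: "utf-8-sig" prepends the UTF-8 BOM, "utf-8" does not.
with_bom = text.encode("utf-8-sig")
without_bom = text.encode("utf-8")
```

The same text ends up as three different byte sequences, which is the whole problem when the reader and the writer do not agree on the encoding.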
> (3) In 3.3, if the default is used and it is not utf-8, add a warning that the default will become utf-8 always in 3.4. Actually, I would like a PendingDeprecationWarning in 3.2.1 if possible.
I'm not sure that the "and it is not utf-8" condition is a good idea. If you develop on Linux but your users are on Windows, you will not get the warning (even with -Werror), and neither will your users (users don't want to see warnings)... Or maybe a user using both Windows and Linux will notice the bug without the warning :-)
It doesn't mean that it is not possible to check your program: you can change your locale encoding (e.g. use LANG=C).
At least, it will be possible to check test_distutils and test_packaging using LANG=C and -Werror :-)
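A rough sketch of such a check follows. The helper names are hypothetical and this is not an existing Python API; it only illustrates the proposed "default is used and it is not utf-8" condition, assuming the default text encoding is the one reported by locale.getpreferredencoding(False).

```python
import codecs
import locale
import warnings

def default_text_encoding():
    """Return the locale encoding assumed here to be the open() default."""
    return locale.getpreferredencoding(False)

def is_utf8(name):
    """Compare codec names after normalization ("UTF8", "utf_8", ...)."""
    return codecs.lookup(name).name == "utf-8"

def warn_if_default_not_utf8(encoding=None):
    """Hypothetical check: warn when no encoding is given and the
    locale default is not UTF-8. Returns True if a warning was issued."""
    if encoding is None and not is_utf8(default_text_encoding()):
        warnings.warn("relying on a non-UTF-8 default encoding",
                      PendingDeprecationWarning, stacklevel=2)
        return True
    return False
```

Running a test suite with LANG=C and -Werror would turn this warning into an error wherever the default is relied upon, whereas on a UTF-8 locale it stays silent, which is the portability concern raised above.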
--
A fourth option is to use ASCII by default! Your program will work and be portable until you write the first non-ASCII character... Oh wait, it reminds me of the Unicode nightmare of Python 2!
Victor