Issue 1721241: code that writes the PKG-INFO file doesnt handle unicode (original) (raw)

[forwarded from http://bugs.debian.org/422604 ]

http://www.mail-archive.com/distutils-sig@python.org/msg03007.html

Analysis from Phillip J. Eby:

That's not the problem, it's that the code that writes the PKG-INFO file doesn't handle Unicode. See distutils.dist.DistributionMetadata.write_pkg_info(). It needs to use a file with encoding support, if it's doing unicode

However, there's currently no standard, as far as I know, for what encoding the PKG-INFO file should use. Meanwhile, the 'register' command accepts Unicode, but is broken in handling it.

Essentially, the problem is that Python 2.5 broke this by adding a unicode requirement to the "register" command. Previously, register simply sent whatever you gave it, and the PKG-INFO writing code still does. Unfortunately, this means that there is no longer any one value that you can use for your name that will be accepted by both "register" and anything that writes a PKG-INFO file.

Both register and write_pkg_info() are arguably broken here, and should be able to work with either strings or unicode, and degrade gracefully in the event of non-ASCII characters in a string. (Because even though "register" is only run by the package's author, users may run other commands that require a PKG-INFO, so a package prepared using Python <2.5 must still be usable with Python 2.5 distutils, and Python <2.5 allows 8-bit maintainer names.)

Unfortunately, this isn't fixable until there's a new 2.5.x release. For previous Python versions, both register and write_pkg_info() accepted 8-bit strings and passed them on as-is, so the only workaround for this issue at the moment is to revert to Python 2.4 or less.