[Python-Dev] New PyPI broken package editing

"Martin v. Löwis" martin at v.loewis.de
Wed Mar 30 22:37:09 CEST 2005


Walter Dörwald wrote:

> The register command in 2.4 (and current CVS) simply does a value = str(value) in post_to_server(), so the encoded bytes sent depend on the default encoding. Would it be sufficient to change this to value = unicode(value).encode("utf-8")?

Indeed. I think this can go into 2.4.2.
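
A minimal sketch of the change discussed above, written for Python 3, where str() plays the role that unicode() played in 2.4. The helper name encode_field is hypothetical and is not part of distutils; it only illustrates that encoding explicitly makes the bytes on the wire independent of any interpreter default.

```python
def encode_field(value):
    """Return the bytes to send for one form field, always UTF-8.

    Hypothetical helper, not distutils code. In the 2.4-era source
    the equivalent expression is unicode(value).encode("utf-8").
    """
    # Convert to text first so non-string values (e.g. ints) work too,
    # then encode explicitly instead of relying on a default encoding.
    return str(value).encode("utf-8")


if __name__ == "__main__":
    print(encode_field("Walter Dörwald"))
    print(encode_field(42))
```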

> Another solution might be to include the encoding in the Content-type header of the request. IMHO the best solution would be to do both: Always use UTF-8 as the encoding and include this in the Content-type header in the request. PyPI should honor this encoding when it finds it and should fall back to whatever it used before if it doesn't.

Yeah, well :-) Content-type in form upload is a mess, as you certainly know. It should be honored, but commonly isn't. This, in turn, causes browsers to ignore it.
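
For illustration, the client side of the "do both" proposal (always encode as UTF-8 and say so in the Content-type header) could look roughly like the sketch below. The function name build_form_request is hypothetical, and declaring the charset both on the request header and on each part is an assumption made here, not something the thread specifies. As noted above, a server may well ignore these declarations.

```python
import uuid


def build_form_request(fields, boundary=None):
    """Encode text fields as multipart/form-data, UTF-8 throughout.

    Hypothetical sketch. Returns (headers, body), where `fields`
    maps field names to values and `body` is bytes.
    """
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append("--" + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        # Assumption: also declare the encoding on each part.
        lines.append("Content-Type: text/plain; charset=utf-8")
        lines.append("")
        lines.append(str(value))
    lines.append("--" + boundary + "--")
    lines.append("")
    # One explicit encode for the whole body, never a default encoding.
    body = "\r\n".join(lines).encode("utf-8")
    headers = {
        "Content-Type": "multipart/form-data; boundary=%s; charset=utf-8"
                        % boundary,
        "Content-Length": str(len(body)),
    }
    return headers, body


if __name__ == "__main__":
    headers, body = build_form_request({"author": "Walter Dörwald"})
    print(headers["Content-Type"])
```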

PyPI uses the cgi module. It currently decodes every field that doesn't have a filename attribute as UTF-8, and so rejects any submission that doesn't send UTF-8. This could be fixed/extended, but I think that would be best done in the cgi module, for consumption by any application that uses form upload. For example, doing

cgi.FieldStorage(..., encoding="UTF-8")

should cause

a) decoding of every field that has an encoding= in its content type
b) decoding, as UTF-8, of every field that is not a file. A field is a file if it
   I) has a filename, or
   II) cannot be decoded using the target encoding

For backwards compatibility, a) can only be enabled if the CGI application explicitly states which encoding it expects.
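
The rules above can be sketched as a standalone function operating on one field's raw bytes. This is only one possible reading of the proposal, not cgi module code: the name decode_field and its parameters (filename, declared, target) are hypothetical, and what should happen when a field's declared encoding is wrong is left open here (the decode error simply propagates).

```python
def decode_field(raw, filename=None, declared=None, target=None):
    """Apply the proposed decoding rules to one field's raw bytes.

    Hypothetical sketch. Returns str for decoded fields and the
    untouched bytes for fields treated as files.
      raw      -- the field's bytes as received
      filename -- the field's filename attribute, if any
      declared -- encoding named in the field's own content type
      target   -- encoding the application asked for, or None
    """
    if target is None:
        # Backwards compatibility: the application did not state
        # an encoding, so nothing is decoded.
        return raw
    if declared is not None:
        # Rule a): the field announces its own encoding.
        return raw.decode(declared)
    if filename is not None:
        # Rule b) I): a field with a filename is a file.
        return raw
    try:
        # Rule b): every other field is decoded with the target.
        return raw.decode(target)
    except UnicodeDecodeError:
        # Rule b) II): undecodable data is treated as a file.
        return raw


if __name__ == "__main__":
    print(decode_field(b"D\xc3\xb6rwald", target="utf-8"))
    print(decode_field(b"D\xf6rwald", declared="latin-1", target="utf-8"))
```

Returning the raw bytes for rule b) II) means a non-UTF-8 submission is no longer rejected outright; the application can tell the two cases apart by the type of the result.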

I'd like to state "contributions are welcome", although others may think differently.

Regards, Martin


