[Python-Dev] bytes / unicode
Terry Reedy tjreedy at udel.edu
Mon Jun 21 19:27:30 CEST 2010
On 6/20/2010 11:56 PM, Terry Reedy wrote:
The specific example is

>>> urllib.parse.parse_qsl('a=b%e0')
[('a', 'b�')]

where the character after 'b' is a white ? in a dark diamond, indicating an error. parse_qsl() splits that input on '=' and sends each piece to urllib.parse.unquote. unquote() attempts to "Replace %xx escapes by their single-character equivalent." unquote has an encoding parameter that defaults to 'utf-8' in its call to .decode. parse_qsl does not have an encoding parameter. If it did, and it passed that to unquote, then the above example would become (simulated interaction)

>>> urllib.parse.parse_qsl('a=b%e0', encoding='latin-1')
[('a', 'bà')]

I got that output by copying the file and adding "encoding='latin-1'" to the unquote call. Does this solve this problem? Has anything like this been added for 3.2? Should it be?
With a little searching, I found http://bugs.python.org/issue5468 with Miles Kaufmann's year-old comment "parse_qs and parse_qsl should also grow encoding and errors parameters to pass to the underlying unquote()". Patch review is needed.
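For anyone who wants to try the idea without patching the stdlib file, here is a minimal sketch of the pass-through described above. The helper name parse_qsl_sketch is hypothetical and is not the urllib.parse implementation; it only assumes that unquote() accepts encoding and errors arguments, and it skips the other options of the real function.

```python
from urllib.parse import unquote


def parse_qsl_sketch(qs, encoding='utf-8', errors='replace'):
    """Hypothetical helper (not the stdlib parse_qsl): split a query
    string into (name, value) pairs and forward encoding/errors to
    unquote()."""
    pairs = []
    for field in qs.split('&'):
        if not field:
            continue  # ignore empty fields such as a trailing '&'
        name, _, value = field.partition('=')
        # In form-encoded data '+' stands for a space.
        name = unquote(name.replace('+', ' '), encoding=encoding, errors=errors)
        value = unquote(value.replace('+', ' '), encoding=encoding, errors=errors)
        pairs.append((name, value))
    return pairs


default_result = parse_qsl_sketch('a=b%e0')
latin1_result = parse_qsl_sketch('a=b%e0', encoding='latin-1')
print(default_result)
print(latin1_result)
```

With the default 'utf-8' and errors='replace', the lone %e0 byte cannot be decoded and comes back as U+FFFD; with encoding='latin-1' it decodes to 'à', matching the simulated interaction.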
Terry Jan Reedy