msg263043 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2016-04-08 20:48 |
Patch attached with test. In summary: A request to the url b'/\x80' appears to the application as a request to b'\xc2\x80' -- The issue being the latin1 decoded PATH_INFO is re-encoded as UTF-8 and then decoded as latin1 (on the wire) b'\x80' -(decode latin1)-> u'\x80' -(encode utf-8)-> b'\xc2\x80' -(decode latin1)-> b'\xc2\x80' My patch cuts out the encode(utf-8)->decode(latin1) |
|
|
msg263044 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2016-04-08 20:53 |
A few typos in my previous comment, pressed enter too quickly, here's an updated comment: Patch attached with test. In summary: A request to the url b'/\x80' appears to the application as a request to b'/\xc2\x80' -- The issue being the latin1 decoded PATH_INFO is re-encoded as UTF-8 and then decoded as latin1 (on the wire) b'\x80' -(decode latin1)-> u'\x80' -(encode utf-8)-> b'\xc2\x80' -(decode latin1)-> u'\xc2\x80' My patch cuts out the encode(utf-8)->decode(latin1): (on the wire) b'\x80' -(decode latin1) -> u'\x80' |
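For readers following along, the two chains described above can be reproduced with plain str/bytes conversions. This is only a sketch: the variable names are made up for illustration and none of it is taken from the attached patch.

```python
# Bytes as they arrive on the wire (example path from the comment above).
wire_path = b'/\x80'

# The server decodes the request path as latin1, giving a str.
decoded = wire_path.decode('latin-1')

# Buggy chain: re-encode as UTF-8, then decode as latin1 again.
buggy_path_info = decoded.encode('utf-8').decode('latin-1')

# Chain without the extra encode(utf-8) -> decode(latin1) step.
fixed_path_info = decoded

# What an application gets when it turns PATH_INFO back into bytes.
buggy_bytes = buggy_path_info.encode('latin-1')
fixed_bytes = fixed_path_info.encode('latin-1')

print(buggy_bytes, fixed_bytes)
```

With the extra step, the application recovers a two-byte sequence after the slash instead of the single byte that was sent.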
|
|
msg263048 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2016-04-08 22:34 |
Oops, broke b'/%80'. Here's a better fix that now takes: (on the wire) b'\x80' -(decode latin1)-> u'\x80' -(encode utf-8)-> b'\xc2\x80' -(decode latin1)-> u'\xc2\x80' to: (on the wire) b'\x80' -(decode latin1)-> u'\x80' -(encode latin1) -> b'\x80' -(decode latin1)-> u'\x80' |
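A hedged sketch of why both the raw byte and the percent-escaped form need to be handled: `urllib.parse.unquote_to_bytes` encodes non-ASCII characters as UTF-8 when given a str, which is one way the UTF-8 step can creep in. The helper names below are hypothetical and are not the code from the patch.

```python
from urllib.parse import unquote_to_bytes


def path_info_via_utf8(path):
    """Hypothetical model of the problematic chain.

    Given a str, unquote_to_bytes() encodes non-ASCII characters as
    UTF-8 before returning bytes, so u'\x80' becomes b'\xc2\x80'.
    """
    return unquote_to_bytes(path).decode('latin-1')


def path_info_via_latin1(path):
    """Hypothetical model of the latin1 round trip.

    Encoding to latin1 first hands unquote_to_bytes() the original
    wire bytes, so only the percent-escapes are expanded.
    """
    return unquote_to_bytes(path.encode('latin-1')).decode('latin-1')


# Request paths as latin1-decoded str values.
raw_form = '/\x80'      # byte 0x80 sent unescaped
escaped_form = '/%80'   # byte 0x80 sent percent-escaped

print(repr(path_info_via_utf8(raw_form)))
print(repr(path_info_via_latin1(raw_form)))
print(repr(path_info_via_latin1(escaped_form)))
```

In this model, the latin1 round trip yields the same PATH_INFO for both request forms, while the UTF-8 variant only gets the percent-escaped form right.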
|
|
msg263050 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-04-08 23:50 |
I was going to say your original fix was the reverse of a change in r86146. But you seem to be fixing the problems before I express them :) For the fix, I think something like unquote(path, "latin-1") would be simpler. I left some other review comments about the tests. |
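A small illustration of the suggested call, using `urllib.parse.unquote` with an explicit encoding. The sample paths are assumptions chosen to match the earlier comments, not test cases from the patch.

```python
from urllib.parse import unquote

# Paths as latin1-decoded str values, as a server would hold them.
raw_form = '/\x80'          # byte 0x80 sent unescaped
escaped_form = '/%80'       # byte 0x80 sent percent-escaped
mixed_form = '/\x80%80'     # both forms in one path

# Percent-escapes are decoded as latin1; characters that are already
# non-ASCII pass through unchanged.
raw_result = unquote(raw_form, 'latin-1')
escaped_result = unquote(escaped_form, 'latin-1')
mixed_result = unquote(mixed_form, 'latin-1')

print(repr(raw_result), repr(escaped_result), repr(mixed_result))
```

This avoids spelling out the encode/decode round trip by hand while giving the same result for these inputs.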
|
|
msg263054 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2016-04-09 01:47 |
Updates after review. |
|
|
msg263055 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-04-09 02:41 |
Thanks, this version looks pretty good to me. |
|
|
msg263056 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2016-04-09 02:55 |
Forgot to remove the pyver code (leaning a bit too much on pre-commit) |
|
|
msg263596 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2016-04-17 03:04 |
New changeset 1f2cfcd5a83f by Martin Panter in branch '3.5': Issue #26717: Stop encoding Latin-1-ized WSGI paths with UTF-8 https://hg.python.org/cpython/rev/1f2cfcd5a83f New changeset 815a4ac67e68 by Martin Panter in branch 'default': Issue #26717: Merge wsgiref fix from 3.5 https://hg.python.org/cpython/rev/815a4ac67e68 |
|
|
msg263818 - (view) |
Author: Александр Эри (Александр Эри) |
Date: 2016-04-20 10:46 |
Why does wsgiref use latin1? It must use utf-8. |
|
|
msg263844 - (view) |
Author: Anthony Sottile (Anthony Sottile) * |
Date: 2016-04-20 14:34 |
PEP 3333 states that environ variables are str values decoded using latin1: https://www.python.org/dev/peps/pep-3333/#id19 Therefore, to get the original bytes, one must encode using latin1. |
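As a sketch of what this means for an application: the helper name and the sample environ below are hypothetical, and the assumption that the client sent UTF-8 is the application's own choice, not something PEP 3333 guarantees.

```python
def original_path_bytes(environ):
    """Hypothetical helper: recover the wire bytes of PATH_INFO.

    WSGI str values hold one code point per original byte (latin1),
    so encoding with latin1 gives back exactly what was sent.
    """
    return environ['PATH_INFO'].encode('latin-1')


# Simulated environ for a request whose path is '/café' sent as UTF-8.
wire_path = '/café'.encode('utf-8')
environ = {'PATH_INFO': wire_path.decode('latin-1')}

recovered = original_path_bytes(environ)

# Only now does the application apply the encoding it expects.
text_path = recovered.decode('utf-8')

print(recovered, text_path)
```

So an application that wants UTF-8 paths can still have them; it just decodes the recovered bytes itself.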
|
|