| msg97063 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2009-12-30 21:46 |
| I configured my buildbot to use a non-ascii path to the interpreter and test_xmlrpc fails as follows:
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 59091)
Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/xmlrpc/server.py", line 448, in do_POST
    size_remaining = int(self.headers["content-length"])
ValueError: invalid literal for int() with base 10: 'I am broken'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 281, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 307, in process_request
    self.finish_request(request, client_address)
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 320, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 614, in __init__
    self.handle()
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py", line 352, in handle
    self.handle_one_request()
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py", line 346, in handle_one_request
    method()
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/xmlrpc/server.py", line 472, in do_POST
    self.send_header("X-traceback", traceback.format_exc())
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py", line 410, in send_header
    self.wfile.write(("%s: %s\r\n" % (keyword, value)).encode('ASCII', 'strict'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 93: ordinal not in range(128)
----------------------------------------
======================================================================
FAIL: test_fail_with_info (test.test_xmlrpc.FailingServerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/test/test_xmlrpc.py", line 555, in test_fail_with_info
    p.pow(6,8)
xmlrpc.client.ProtocolError: <ProtocolError for 127.0.0.1:57828/RPC2: 500 Internal Server Error>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/test/test_xmlrpc.py", line 562, in test_fail_with_info
    self.assertTrue(e.headers.get("X-traceback") is not None)
AssertionError: False is not True
---------------------------------------------------------------------- |
|
|
| msg97064 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2009-12-30 22:03 |
| > self.send_header("X-traceback", traceback.format_exc())
That's fairly tricky. send_header expects two strings (bytes are not acceptable), and also requires these strings to be ASCII. This is why it breaks: format_exc returns a non-ASCII string. I see two options: a) allow non-Unicode values (bytes) for keyword and value in send_header, and have xmlrpc.server encode the header itself, or b) properly MIME-encode value if it contains non-ASCII characters (keyword really must be ASCII, I think). Not sure whether there is any precedent for UTF-8 in HTTP headers. |
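The failure described above can be reproduced in isolation. This is a minimal sketch, assuming only the formatting expression visible in the traceback; `format_header_line` is a hypothetical stand-in for that one line of `send_header`, and the path in the sample value is made up for illustration.

```python
def format_header_line(keyword, value):
    # Same expression as the send_header line shown in the traceback:
    # both strings are formatted, then strictly encoded as ASCII.
    return ("%s: %s\r\n" % (keyword, value)).encode('ASCII', 'strict')


# An ASCII-only value encodes without trouble.
ok_line = format_header_line("X-traceback", "Traceback (most recent call last):")

# A value containing a non-ASCII character (hypothetical path) fails.
error = None
try:
    format_header_line("X-traceback", 'File "/tmp/nonascii-€/server.py"')
except UnicodeEncodeError as exc:
    error = exc

print(ok_line)
print(type(error).__name__)
```

Because the exception is raised while the handler is already reporting an earlier error, the X-traceback header is never sent, which is what the test then complains about.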
|
|
| msg97068 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2009-12-30 23:30 |
| A little googling came up with this page: http://publib.boulder.ibm.com/infocenter/tivihelp/v2r1/topic/com.ibm.itame.doc/am61_webseal_admin570.htm Their solution is to URI-encode the UTF-8 encoded data. However, this article references the RFCs, which look like they call for RFC 2047 (MIME) encoded words: http://stackoverflow.com/questions/324470/http-headers-encoding-decoding-in-java |
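To make the two candidate encodings concrete, here is a small sketch using only the standard library. The sample value is hypothetical, and nothing here is taken from a patch on this issue; it only shows what each approach produces for the same string.

```python
from email.header import Header, decode_header, make_header
from urllib.parse import quote, unquote

value = "nonascii-€"  # hypothetical header value

# Approach 1: percent-encode (URI-encode) the UTF-8 bytes.
uri_encoded = quote(value.encode("utf-8"))

# Approach 2: RFC 2047 encoded word, as produced by the email package.
mime_encoded = Header(value, "utf-8").encode()

# Both results are pure ASCII and both can be reversed.
uri_decoded = unquote(uri_encoded)
mime_decoded = str(make_header(decode_header(mime_encoded)))

print(uri_encoded)
print(mime_encoded)
```

Either form is safe to pass through an ASCII-only header writer; the difference is in what the receiving side has to know in order to decode it.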
|
|
| msg97069 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2009-12-30 23:38 |
| If it's only about transmitting the string representation of the traceback, perhaps we can simply use "replace" or "ignore" as the error handler? |
|
|
| msg97071 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2009-12-30 23:49 |
| David: I think it's a little bit more complicated. RFC 2616 says that the value of a header is *TEXT, which is defined as:
The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 only when encoded according to the rules of RFC 2047
So I think send_header should change in the following way:
a) if isinstance(value, bytes): send value as-is
b) if value can be encoded in latin-1: encode in latin-1, then send as-is
c) otherwise: MIME-encode as UTF-8, using the following algorithm
1. count the number of non-ASCII characters, by encoding with ascii, ignore, and comparing result lengths
2. if there are fewer than 10% non-ASCII characters, use the Q encoding
3. otherwise, use the B encoding
The purpose of the algorithm in c) would be that text containing a few non-latin characters still comes out right even if the receiver fails to decode the header. The same change would also apply to the client-side of sending headers. On the receiving side, we should offer an option to decode headers (both for client and server); this should be an option because senders may not comply with RFC 2616. Reading should then proceed as follows:
1. check whether there are MIME markers in the text
2. if so, MIME-decode
3. if not, decode as latin-1 |
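The proposal above could be sketched roughly as follows. This is only an illustration of the described steps, not code from any patch on this issue: `encode_header_value` and `decode_header_value` are hypothetical helper names, and using the `email` package for the Q and B encodings is an assumption about how one might implement step c).

```python
from email.charset import BASE64, QP, Charset
from email.header import Header, decode_header, make_header


def encode_header_value(value):
    """Hypothetical helper: turn a header value into bytes for the wire."""
    if isinstance(value, bytes):
        return value  # a) bytes are sent as-is
    try:
        return value.encode("latin-1")  # b) Latin-1 if possible
    except UnicodeEncodeError:
        pass
    # c) count non-ASCII characters by dropping them and comparing lengths
    non_ascii = len(value) - len(value.encode("ascii", "ignore"))
    charset = Charset("utf-8")
    if non_ascii * 10 < len(value):
        charset.header_encoding = QP      # fewer than 10%: Q encoding
    else:
        charset.header_encoding = BASE64  # otherwise: B encoding
    # maxlinelen=0 asks Header.encode() not to fold the value
    return Header(value, charset).encode(maxlinelen=0).encode("ascii")


def decode_header_value(raw):
    """Hypothetical helper: decode received header bytes to str."""
    text = raw.decode("latin-1")
    if "=?" in text and "?=" in text:  # 1. look for MIME markers
        return str(make_header(decode_header(text)))  # 2. MIME-decode
    return text  # 3. otherwise it is plain Latin-1


mostly_ascii = "a" * 20 + "€"
mostly_other = "€uro"
print(encode_header_value(mostly_ascii))
print(encode_header_value(mostly_other))
```

With Q encoding the ASCII part of a mostly-ASCII value stays readable in the raw header, which matches the stated purpose of step c).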
|
|
| msg97072 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2009-12-30 23:51 |
| Antoine: sure, to fix the issue at hand, we can work-around. However, the issue of sending non-ASCII headers in HTTP remains, and should also be fixed. |
|
|
| msg98593 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-01-31 02:31 |
| #7608 was a duplicate issue. Copy of my message:
-----
SimpleXMLRPCRequestHandler.do_POST() writes the traceback in the HTTP header "X-traceback". But an HTTP header value is ASCII only, whereas a traceback can contain any character (e.g. a non-ASCII character from a directory name for this issue). A simple fix would be to use the ASCII charset with the backslashreplace error handler. Attached patch uses: trace = str(trace.encode('ASCII', 'backslashreplace'), 'ASCII') Is there an easier method to escape non-ASCII characters without double conversion (unicode->bytes and bytes->unicode)?
-----
I also copied my patch to this issue. |
|
|
| msg98594 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-01-31 02:39 |
| pitrou> If it's only about transmitting the string representation of the
pitrou> traceback, perhaps we can simply use "replace" or "ignore" as the error
pitrou> handler?
Both replace and ignore lose information. My patch keeps all information by using backslashreplace. It's consistent with Python behaviour: Python writes a backtrace to stderr which also uses the backslashreplace error handler. |
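The difference between the three error handlers is easy to see side by side. A minimal sketch, using a made-up traceback line; the `escaped` line uses the same expression as the patch quoted earlier in this thread.

```python
# Hypothetical traceback fragment containing a non-ASCII directory name.
trace = 'File "/tmp/nonascii-€/server.py", line 1'

# "replace" substitutes a question mark; the original character is lost.
replaced = trace.encode("ascii", "replace").decode("ascii")

# "ignore" drops the character entirely.
ignored = trace.encode("ascii", "ignore").decode("ascii")

# "backslashreplace" keeps the code point as an ASCII escape sequence.
escaped = str(trace.encode("ASCII", "backslashreplace"), "ASCII")

# The escaped form now survives the strict ASCII encoding of the header.
header_line = ("%s: %s\r\n" % ("X-traceback", escaped)).encode("ASCII", "strict")

print(replaced)
print(ignored)
print(escaped)
```

Only the backslashreplace result lets a reader recover which character was in the path.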
|
|
| msg103275 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-04-15 23:20 |
| What do you think about my solution (convert the traceback to ASCII to avoid the encoding issue)? If you would like to support non-ASCII characters in HTTP headers, you should open a new issue. For compatibility, I prefer to use pure ASCII headers because I fear that third-party programs don't support non-ASCII headers. |
|
|
| msg103322 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2010-04-16 13:27 |
| > What do you think about my solution (convert the traceback to ASCII to
> avoid the encoding issue)?
It's fine for me. Perhaps you should add a comment to explain why this is necessary. |
|
|
| msg103323 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-04-16 13:28 |
| Committed: r80112 (py3k). Waiting for the buildbots before the backport to 3.1. |
|
|
| msg103335 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-04-16 15:48 |
| > Committed: r80112 (py3k)
Looks good: r80118 (3.1). |
|
|
| msg103382 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-04-17 00:35 |
| If anyone would like to work on non-ASCII HTTP headers, please open a new issue with a pointer to this one. |
|
|