Issue 7606: test_xmlrpc fails with non-ascii path


Created on 2009-12-30 21:46 by pitrou, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
xmlrpc_server_ascii_traceback.patch | vstinner, 2010-01-31 02:31 |
Messages (13)
msg97063 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-12-30 21:46
I configured my buildbot to use a non-ascii path to the interpreter and test_xmlrpc fails as follows:

----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 59091)
Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/xmlrpc/server.py", line 448, in do_POST
    size_remaining = int(self.headers["content-length"])
ValueError: invalid literal for int() with base 10: 'I am broken'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 281, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 307, in process_request
    self.finish_request(request, client_address)
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 320, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/socketserver.py", line 614, in __init__
    self.handle()
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py", line 352, in handle
    self.handle_one_request()
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py", line 346, in handle_one_request
    method()
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/xmlrpc/server.py", line 472, in do_POST
    self.send_header("X-traceback", traceback.format_exc())
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/http/server.py", line 410, in send_header
    self.wfile.write(("%s: %s\r\n" % (keyword, value)).encode('ASCII', 'strict'))
UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in position 93: ordinal not in range(128)
----------------------------------------
======================================================================
FAIL: test_fail_with_info (test.test_xmlrpc.FailingServerTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/test/test_xmlrpc.py", line 555, in test_fail_with_info
    p.pow(6,8)
xmlrpc.client.ProtocolError: <ProtocolError for 127.0.0.1:57828/RPC2: 500 Internal Server Error>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/buildbot/cpython-ucs4-nonascii-€/3.1.pitrou-ubuntu-wide/build/Lib/test/test_xmlrpc.py", line 562, in test_fail_with_info
    self.assertTrue(e.headers.get("X-traceback") is not None)
AssertionError: False is not True
----------------------------------------------------------------------
msg97064 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2009-12-30 22:03
> self.send_header("X-traceback", traceback.format_exc())

That's fairly tricky. send_header expects two strings (bytes are not acceptable), and also requires these strings to be ASCII. This is why it breaks: format_exc returns a non-ASCII string. I see two options:

a) allow non-Unicode (bytes) values for keyword and value in send_header, and have xmlrpc.server encode the header itself, or
b) properly MIME-encode value if it contains non-ASCII characters (keyword really must be ASCII, I think).

Not sure whether there is any precedent for UTF-8 in HTTP headers.
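The failing step can be reproduced outside the server. The sketch below copies the encoding expression from the send_header line of the traceback above; the header value is a made-up fragment of a traceback containing the '€' path, not the real traceback.format_exc() output.

```python
# Mirrors the expression on the send_header line of the traceback above.
keyword = "X-traceback"
# Made-up fragment of a traceback; the real value came from traceback.format_exc()
value = 'File "/home/buildbot/cpython-ucs4-nonascii-\u20ac/build/Lib/xmlrpc/server.py"'

try:
    header_line = ("%s: %s\r\n" % (keyword, value)).encode('ASCII', 'strict')
except UnicodeEncodeError as exc:
    # exc.object is the string being encoded; exc.start indexes the first bad character
    bad_char = exc.object[exc.start]
    print("cannot encode %r at position %d" % (bad_char, exc.start))
```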
msg97068 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2009-12-30 23:30
A little googling came up with this page: http://publib.boulder.ibm.com/infocenter/tivihelp/v2r1/topic/com.ibm.itame.doc/am61_webseal_admin570.htm

Their solution is to URI-encode the UTF-8-encoded data. However, this article references the RFCs, which appear to call for RFC 2047 (MIME) encoded words: http://stackoverflow.com/questions/324470/http-headers-encoding-decoding-in-java
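Both encodings mentioned above can be tried with the standard library. This is only a comparison sketch on a made-up string taken from the buildbot path; which form a given HTTP peer actually understands is not settled by this thread.

```python
import urllib.parse
from email.header import Header, decode_header, make_header

text = "cpython-ucs4-nonascii-\u20ac"

# URI encoding of the UTF-8 bytes (the approach on the IBM page)
uri_encoded = urllib.parse.quote(text, safe="")   # 'cpython-ucs4-nonascii-%E2%82%AC'

# RFC 2047 encoded word; for UTF-8 the email package picks whichever of
# the Q and B encodings is shorter
mime_encoded = Header(text, "utf-8").encode()     # '=?utf-8?...?='

# Both forms are pure ASCII and decode back to the original text
print(urllib.parse.unquote(uri_encoded) == text)
print(str(make_header(decode_header(mime_encoded))) == text)
```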
msg97069 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-12-30 23:38
If it's only about transmitting the string representation of the traceback, perhaps we can simply use "replace" or "ignore" as the error handler?
msg97071 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2009-12-30 23:49
David: I think it's a little bit more complicated. RFC 2616 says that the value of a header is *TEXT, which is defined as

  The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 only when encoded according to the rules of RFC 2047

So I think send_header should change in the following way:

a) if isinstance(value, bytes): send value as-is
b) if value can be encoded in latin-1: encode in latin-1, then send as-is
c) otherwise: MIME-encode as UTF-8, using the following algorithm:
   1. count the number of non-ASCII characters, by encoding with ascii/ignore and comparing result lengths
   2. if fewer than 10% of the characters are non-ASCII, use the Q encoding
   3. otherwise, use the B encoding

The purpose of the algorithm in c) would be that text containing a few non-Latin characters still comes out right even if the receiver fails to decode the header.

The same change would also apply to the client side of sending headers. On the receiving side, we should offer an option to decode headers (both for client and server); this should be an option because senders may not comply with RFC 2616. Reading should then proceed as follows:

1. check whether there are MIME markers in the text
2. if so, MIME-decode
3. if not, decode as latin-1
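The proposed send_header behaviour and the optional decoding on the receiving side can be sketched as follows. This only illustrates the steps as described in the message; it is not code from http.server, the names encode_header_value and decode_header_value are hypothetical, the Q encoding is hand-written following the RFC 2047 rules for phrases, and the sketch does not split long values into several encoded words (RFC 2047 limits a single encoded word to 75 characters). The fix eventually committed for this issue took a different route.

```python
import base64
from email.header import decode_header, make_header

# Bytes RFC 2047 allows unencoded inside a Q-encoded word (phrase rules)
_Q_SAFE = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!*+-/")


def _q_encode(data: bytes) -> str:
    """RFC 2047 Q encoding: space -> '_', other unsafe bytes -> '=XX'."""
    out = []
    for byte in data:
        if byte in _Q_SAFE:
            out.append(chr(byte))
        elif byte == 0x20:
            out.append("_")
        else:
            out.append("=%02X" % byte)
    return "".join(out)


def encode_header_value(value) -> bytes:
    """Hypothetical sketch of the proposed send_header steps a) to c)."""
    # a) bytes are sent as-is
    if isinstance(value, bytes):
        return value
    # b) anything representable in latin-1 is sent as latin-1
    try:
        return value.encode("latin-1")
    except UnicodeEncodeError:
        pass
    # c) MIME-encode as UTF-8; count non-ASCII chars via ascii/ignore lengths
    non_ascii = len(value) - len(value.encode("ascii", "ignore"))
    data = value.encode("utf-8")
    if non_ascii < 0.1 * len(value):
        word = "=?utf-8?Q?" + _q_encode(data) + "?="
    else:
        word = "=?utf-8?B?" + base64.b64encode(data).decode("ascii") + "?="
    return word.encode("ascii")


def decode_header_value(raw: bytes) -> str:
    """Hypothetical sketch of the optional decoding on the receiving side."""
    text = raw.decode("latin-1")
    # 1./2. if MIME markers are present, MIME-decode
    if "=?" in text and "?=" in text:
        return str(make_header(decode_header(text)))
    # 3. otherwise the latin-1 decoding is the result
    return text
```

For example, encode_header_value applied to the buildbot path from the traceback returns a value starting with b'=?utf-8?Q?', since only one of its characters is non-ASCII; most of the path then stays readable even to a receiver that does not decode the header, which is the stated purpose of step c).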
msg97072 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2009-12-30 23:51
Antoine: sure, to fix the issue at hand, we can work around it. However, the issue of sending non-ASCII headers in HTTP remains, and should also be fixed.
msg98593 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-01-31 02:31
#7608 was a duplicate issue. Copy of my message:
-----
SimpleXMLRPCRequestHandler.do_POST() writes the traceback in the HTTP header "X-traceback". But an HTTP header value is ASCII only, whereas a traceback can contain any character (e.g. a non-ASCII character from a directory name, as in this issue). A simple fix would be to use the ASCII charset with the backslashreplace error handler. The attached patch uses:

    trace = str(trace.encode('ASCII', 'backslashreplace'), 'ASCII')

Is there an easier method to escape non-ASCII characters without double conversion (unicode->bytes and bytes->unicode)?
-----
I also copied my patch to this issue.
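The patch expression can be exercised on a traceback that contains the '€' character. In this sketch the helper name ascii_traceback and the raised ValueError are made up for illustration; only the conversion line comes from the patch description above.

```python
import traceback


def ascii_traceback() -> str:
    """Hypothetical helper: format the exception being handled as pure ASCII."""
    trace = traceback.format_exc()
    # Same double conversion as in the patch: str -> ASCII bytes -> str.
    # backslashreplace turns '€' into the six characters \u20ac.
    return str(trace.encode('ASCII', 'backslashreplace'), 'ASCII')


try:
    # Made-up error mentioning the non-ASCII buildbot path
    raise ValueError("bad path: /home/buildbot/cpython-ucs4-nonascii-\u20ac")
except ValueError:
    header_value = ascii_traceback()

# The strict ASCII encoding done in send_header now succeeds
line = ("%s: %s\r\n" % ("X-traceback", header_value)).encode('ASCII', 'strict')
```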
msg98594 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-01-31 02:39
pitrou> If it's only about transmitting the string representation of the
pitrou> traceback, perhaps we can simply use "replace" or "ignore" as the error
pitrou> handler?

Both replace and ignore lose information. My patch keeps all information by using backslashreplace. It's consistent with Python's behaviour: Python writes tracebacks to stderr, which also uses the backslashreplace error handler.
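To make the difference between the three error handlers concrete, the sketch below encodes a short made-up string with each of them:

```python
# How the error handlers discussed in this thread treat the same text
text = "nonascii-\u20ac"

results = {h: text.encode("ascii", h) for h in ("replace", "ignore", "backslashreplace")}
# results["replace"]          == b"nonascii-?"        the character is replaced by '?'
# results["ignore"]           == b"nonascii-"         the character is silently dropped
# results["backslashreplace"] == b"nonascii-\\u20ac"  the code point stays visible
for handler, encoded in results.items():
    print(handler, encoded)
```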
msg103275 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-04-15 23:20
What do you think about my solution (convert the traceback to ASCII to avoid the encoding issue)? If you would like to support non-ASCII characters in HTTP headers, you should open a new issue. For compatibility, I prefer to use pure ASCII headers because I fear that third-party programs don't support non-ASCII headers.
msg103322 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-04-16 13:27
> What do you think about my solution (convert the traceback to ASCII to
> avoid the encoding issue)?

It's fine with me. Perhaps you should add a comment to explain why this is necessary.
msg103323 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-04-16 13:28
Committed: r80112 (py3k). Waiting for the buildbots before the backport to 3.1.
msg103335 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-04-16 15:48
> Committed: r80112 (py3k)

Looks good: r80118 (3.1).
msg103382 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-04-17 00:35
If anyone would like to work on non-ASCII HTTP headers, please open a new issue with a pointer to this one.
History
Date | User | Action | Args
2022-04-11 14:56:55 | admin | set | github: 51855
2010-04-17 00:35:53 | vstinner | set | messages: +
2010-04-16 15:48:46 | vstinner | set | status: open -> closed, resolution: fixed, messages: +
2010-04-16 13:28:37 | vstinner | set | messages: +
2010-04-16 13:27:50 | pitrou | set | messages: +
2010-04-15 23:20:24 | vstinner | set | messages: +
2010-04-13 23:37:47 | vstinner | link | issue8242 dependencies
2010-02-27 14:43:50 | flox | set | nosy: + flox
2010-01-31 02:39:27 | vstinner | set | messages: +
2010-01-31 02:31:06 | vstinner | set | files: + xmlrpc_server_ascii_traceback.patch, nosy: + vstinner, messages: +, keywords: + patch
2009-12-30 23:51:05 | loewis | set | messages: +
2009-12-30 23:49:24 | loewis | set | messages: +
2009-12-30 23:38:03 | pitrou | set | messages: +
2009-12-30 23:30:32 | r.david.murray | set | nosy: + r.david.murray, messages: +
2009-12-30 22:03:05 | loewis | set | messages: +
2009-12-30 21:46:35 | pitrou | create |