[Python-Dev] Python 3.0 urllib fails with chunked HTTP responses

Guido van Rossum guido at python.org
Thu Dec 18 18:27:42 CET 2008


It sounds like the self-closing is an implementation detail, meant to make sure the socket is closed as early as possible (which I suppose is a good thing if there's a server waiting for the final ACK on the other side). Perhaps it should not use close() but something slightly lower level that affects the socket directly?
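A rough sketch of that idea, under stated assumptions: QuietlyReleasingResponse is a hypothetical class, not http.client.HTTPResponse, and a BytesIO stands in for the socket file. Once the body is drained it closes only the lower-level object, so the io object itself is never marked closed and later reads simply report EOF.

```python
import io


class QuietlyReleasingResponse(io.RawIOBase):
    """Hypothetical response object (an illustration, not HTTPResponse)."""

    def __init__(self, fp, length):
        self._fp = fp        # lower-level file-like object (stands in for the socket)
        self._left = length  # body bytes still expected

    def readable(self):
        return True

    def _release(self):
        # Close only the lower-level object; self.closed stays False.
        if self._fp is not None:
            self._fp.close()
            self._fp = None

    def readinto(self, b):
        if self._fp is None:
            return 0  # already drained: report EOF instead of raising
        data = self._fp.read(min(len(b), self._left))
        n = len(data)
        b[:n] = data
        self._left -= n
        if self._left == 0 or n == 0:
            self._release()  # early release, as soon as the last byte is read
        return n

    def close(self):
        self._release()
        super().close()


raw = io.BytesIO(b"a\nb\n")
resp = QuietlyReleasingResponse(raw, 4)
lines = list(resp)  # iteration ends via StopIteration
```

After the loop, raw is closed but resp is not, so a further resp.readline() returns an empty bytes object rather than raising.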

--Guido van Rossum (home page: http://www.python.org/~guido/)

On Thu, Dec 18, 2008 at 5:22 AM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:

On Wed, Dec 17, 2008 at 1:05 PM, Guido van Rossum <guido at python.org> wrote:

The inheritance from io.RawIOBase seems fine. There is a small problem with the interaction between HTTPResponse and RawIOBase, but I think the problem is more on the http side.

You may recall that the HTTP code has a habit of closing the connection for you. In a variety of cases, once you've read the last bytes of the response, the HTTPResponse object calls its own close() method. This interacts poorly with RawIOBase, because it raises a ValueError for any operation on a closed io object. This prevents iterators from working correctly. The iterator implementation expects the final call to readline() to return an empty string and converts that to a StopIteration. Instead, it's seeing a ValueError that propagates out.

It's always been odd to me that the connection closed itself. It's going to be tricky to fix the current bug (chunked responses) and keep the self-closing behavior, but I worry that changing the self-closing behavior too dramatically isn't appropriate for a bug fix.

Will look some more at this tomorrow.

Jeremy
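A minimal sketch of the failure mode described in this message, under stated assumptions: SelfClosingResponse and collect_lines are hypothetical names, not the actual http.client code, and the explicit closed check in readinto() stands in for the ValueError that the io layer raises on a closed object.

```python
import io


class SelfClosingResponse(io.RawIOBase):
    """Hypothetical stand-in that closes itself after the last body byte."""

    def __init__(self, payload):
        self._buf = io.BytesIO(payload)
        self._left = len(payload)

    def readable(self):
        return True

    def readinto(self, b):
        # Assumption: mimic the io-layer rule that a closed object raises.
        if self.closed:
            raise ValueError("I/O operation on closed file")
        n = self._buf.readinto(b)
        self._left -= n
        if self._left == 0:
            self.close()  # the self-closing habit
        return n


def collect_lines(resp):
    """Iterate over resp, returning the lines read and any ValueError."""
    lines, error = [], None
    try:
        for line in resp:
            lines.append(line)
    except ValueError as exc:
        error = exc
    return lines, error


lines, error = collect_lines(SelfClosingResponse(b"a\nb\n"))
```

Both lines are read, but the loop ends with a ValueError instead of a clean StopIteration, because the readline() call that should return an empty result runs against an object that has already closed itself.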

--Guido van Rossum (home page: http://www.python.org/~guido/)

On Mon, Dec 15, 2008 at 11:19 AM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:

I have a patch that appears to fix this bug http://bugs.python.org/file12361/urllib-chunked.diff but I'm not sure about its interaction with the io module and RawIOBase. Is there a new IO expert who could take a look at it for me?

Jeremy

On Sun, Dec 14, 2008 at 11:06 PM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:

This bug is pretty serious, because urllib will insert garbage into the application-visible data for a chunked response. It simply ignores the fact that it's reading a chunked response and includes the chunked header data as payload data. The original bug was reported in September, but no one noticed it. It was reported again recently.

http://bugs.python.org/issue3761
http://bugs.python.org/issue4631

I suspect we'd want to get a 3.0.1 out as soon as this is fixed, but that's not my call.

Jeremy
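To make the "garbage" concrete, here is a small sketch of decoding a chunked body. It is an illustration of the HTTP/1.1 chunked framing only, not the patch linked above; decode_chunked and the sample wire bytes are assumptions made up for this example.

```python
import io


def decode_chunked(fp):
    """Decode a chunked body from a binary file-like object (sketch)."""
    body = bytearray()
    while True:
        size_line = fp.readline()
        if not size_line:
            raise ValueError("truncated body: missing chunk size line")
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size_field = size_line.split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            break  # last-chunk marker
        chunk = fp.read(size)
        if len(chunk) != size:
            raise ValueError("truncated chunk data")
        body += chunk
        fp.read(2)  # consume the CRLF that follows the chunk data
    # Skip optional trailer lines up to the terminating blank line.
    while fp.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(body)


# Hypothetical wire data: two chunks followed by the terminating chunk.
wire = b"5\r\nhello\r\n7\r\n, world\r\n0\r\n\r\n"
payload = decode_chunked(io.BytesIO(wire))
```

Reading wire without decoding hands the application the size lines ("5", "7", "0") and the CRLF separators mixed into the data, which is the behavior the bug reports describe; decoding recovers only the payload bytes.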

