Issue 9655: urllib2 fails to retrieve a url which is handled correctly by urllib

Created on 2010-08-21 10:27 by Albert.Weichselbraun, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg114482 - Author: Albert Weichselbraun (Albert.Weichselbraun) Date: 2010-08-21 10:27
urllib2 fails to retrieve the content of http://www.mfsa.com.mt/insguide/english/glossarysearch.jsp?letter=all

>>> urllib2.urlopen("http://www.mfsa.com.mt/insguide/english/glossarysearch.jsp?letter=all").read()
''

urllib handles the same link correctly:

>>> len( urllib.urlopen("http://www.mfsa.com.mt/insguide/english/glossarysearch.jsp?letter=all").read() )
56105
msg114483 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-08-21 10:53
It's funny: I confirmed the problem in the trunk.
msg114486 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-08-21 11:56
Hmm, this looks like a web server problem to me. urllib2 uses the HTTP/1.1 protocol and sends a "Connection: close" header. I hacked urllib2 so that it does not send this header, and the content was then retrieved normally. This page: http://www.mail-archive.com/users@tomcat.apache.org/msg28684.html describes the same problem. The web site above does use Tomcat (as the response headers show); perhaps it runs a buggy version?
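The diagnosis can be reproduced offline. The sketch below is written for Python 3, where urllib2 became urllib.request. It assumes a local server that imitates the reported Tomcat behaviour (headers but no body whenever the client sends "Connection: close") instead of contacting www.mfsa.com.mt; the handler `BuggyHandler`, the page content `BODY` and the helpers `fetch_both` and `run_demo` are hypothetical. Under that assumption, urllib.request, which always adds "Connection: close", gets an empty body, while http.client, which sends no Connection header unless asked to, gets the full page.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html>glossary</html>"  # hypothetical page content


class BuggyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the misbehaving server: it drops the
    response body whenever the client sends "Connection: close"."""

    def do_GET(self):
        wants_close = self.headers.get("Connection", "").lower() == "close"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        if wants_close:
            # Mimic the observed bug: headers only, then the socket is closed.
            self.send_header("Connection", "close")
            self.end_headers()
            return
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_both(port):
    url = f"http://127.0.0.1:{port}/"
    # urllib.request adds "Connection: close" to every request.
    # ProxyHandler({}) disables any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as resp:
        via_urllib = resp.read()
    # http.client sends no Connection header by default.
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", "/")
    via_http_client = conn.getresponse().read()
    conn.close()
    return via_urllib, via_http_client


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), BuggyHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch_both(server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

Running it should print `(b'', b'<html>glossary</html>')`, the same pattern as the empty string returned by urllib2 in the original report.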
msg114487 - Author: Florent Xicluna (flox) * (Python committer) Date: 2010-08-21 12:05
Confirmed with telnet sessions:

== Simulate "urllib2" ==
$ telnet www.mfsa.com.mt 80
GET /insguide/english/glossarysearch.jsp?letter=all HTTP/1.1
Accept-Encoding: identity
Host: www.mfsa.com.mt
Connection: close
User-Agent: Python-urllib/2.7

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=D34D395A7654B6532F6F6DFF81FC91C3; Path=/insguide
Content-Type: text/html
Date: Sat, 21 Aug 2010 11:54:25 GMT
Connection: close

Connection closed by foreign host.
$

== Simulate "urllib" ==
GET /insguide/english/glossarysearch.jsp?letter=all HTTP/1.0
Host: www.mfsa.com.mt
User-Agent: Python-urllib/1.17

HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Set-Cookie: JSESSIONID=84D9D8DF76546751908F388D8889BB47; Path=/insguide
Content-Type: text/html
Transfer-Encoding: chunked
Date: Sat, 21 Aug 2010 11:54:06 GMT

400
...
$
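In the second session the body uses chunked transfer coding: each chunk is preceded by its size in hexadecimal, so the `400` line announces a 1024-byte chunk, and a zero-size chunk ends the body. The minimal decoder below is a sketch of that framing in Python 3 (the name `decode_chunked` is hypothetical); it ignores chunk extensions and skips trailer fields, so it is not a complete implementation of the standard.

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked message body (RFC 9112, section 7.1).

    Sketch only: chunk extensions are ignored and trailer fields are
    skipped rather than returned.
    """
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        size_field = raw[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field, 16)  # sizes are hexadecimal: b"400" -> 1024
        pos = line_end + 2
        if size == 0:
            break  # last chunk; optional trailer fields follow
        body += raw[pos:pos + size]
        pos += size
        if raw[pos:pos + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF (truncated?)")
        pos += 2
    return bytes(body)


if __name__ == "__main__":
    # A 0x400-byte chunk, a 5-byte chunk, then the terminating zero-size chunk.
    wire = b"400\r\n" + b"x" * 1024 + b"\r\n" + b"5\r\nhello\r\n" + b"0\r\n\r\n"
    print(len(decode_chunked(wire)))  # 1029
```

By contrast, the "urllib2" response above carries neither Content-Length nor Transfer-Encoding, so its body is delimited only by the connection closing, and the server closes it before sending any data.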
msg114502 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-08-21 15:33
Thanks, Amaury, that was nice debugging. The problem lies with the Apache Tomcat server at the remote end, which misbehaves when urllib2 sends the "Connection: close" header. We can't do anything about it on our side; the bug reporter can take it up with the server's operators. However, if needed, the urllib2 documentation could mention that urllib2 sends "Connection: close" with HTTP/1.1 requests, whereas urllib uses HTTP/1.0. Closing this bug as invalid.
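To check which protocol version and Connection header a client actually sends, which is what the proposed documentation note would describe, one can point it at a local server that records them. This is a Python 3 sketch (urllib.request being urllib2's successor); `RecordingHandler` and `capture_urllib_request` are hypothetical names, and the result reflects recent CPython 3 releases rather than the 2010 urllib2 code.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

captured = {}  # filled in by the server thread


class RecordingHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that records the request version and Connection header."""

    def do_GET(self):
        captured["version"] = self.request_version
        captured["connection"] = self.headers.get("Connection")
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def capture_urllib_request():
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # ProxyHandler({}) makes sure the request really goes to 127.0.0.1.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{server.server_address[1]}/") as resp:
            resp.read()
    finally:
        server.shutdown()
        server.server_close()
    return dict(captured)


if __name__ == "__main__":
    print(capture_urllib_request())
```

On recent CPython 3 releases this should print `{'version': 'HTTP/1.1', 'connection': 'close'}`, i.e. urllib.request still pairs HTTP/1.1 with "Connection: close".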
History
Date User Action Args
2022-04-11 14:57:05 admin set github: 53864
2010-08-21 15:33:18 orsenthil set status: open -> closed; resolution: accepted -> not a bug; messages: +; stage: needs patch -> resolved
2010-08-21 12:05:14 flox set nosy: + flox; messages: +; versions: + Python 2.7
2010-08-21 11:56:36 amaury.forgeotdarc set nosy: + amaury.forgeotdarc; messages: +
2010-08-21 10:53:40 orsenthil set nosy: + orsenthil; messages: +; assignee: orsenthil; resolution: accepted; stage: needs patch
2010-08-21 10:27:32 Albert.Weichselbraun create