Issue 8732: Should urllib2.urlopen send an Accept-Encoding header?

Created on 2010-05-16 14:47 by dabrahams, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (6)
msg105870 - Author: Dave Abrahams (dabrahams) Date: 2010-05-16 14:47
According to the RFC, the server is allowed to send back any encoding it likes when no Accept-Encoding header is supplied, but all the examples I can find of urllib2.urlopen usage assume they're getting plain text back. I think it would be better to inject an Accept-Encoding header when none is explicitly supplied so that nobody else trips over this issue. See http://support.github.com/discussions/site/1510
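A minimal sketch of the workaround proposed above, written against Python 3's urllib.request (the successor to urllib2) rather than urllib2 itself. It sends an explicit Accept-Encoding header and, because urlopen() returns the body bytes exactly as the server sent them, also undoes a Content-Encoding the server applied anyway. build_request and decode_body are hypothetical helper names, not standard-library functions, and only the identity, gzip and deflate codings are handled.

```python
import gzip
import urllib.request
import zlib
from typing import Optional


def build_request(url: str) -> urllib.request.Request:
    """Hypothetical helper: build a request that states which codings we accept."""
    # "identity" asks for the untransformed body; list "gzip" here only if the
    # caller is prepared to decompress it.
    return urllib.request.Request(url, headers={"Accept-Encoding": "identity"})


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Hypothetical helper: undo a Content-Encoding the server applied anyway."""
    coding = (content_encoding or "identity").strip().lower()
    if coding == "identity":
        return body
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if coding == "deflate":
        # Servers disagree on whether "deflate" means zlib-wrapped or raw
        # deflate data, so try the zlib format first and fall back to raw.
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")
```

In use, open build_request(url) with urllib.request.urlopen() and pass resp.read() together with resp.headers.get("Content-Encoding") to decode_body. Note that Request normalizes header names with str.capitalize(), so the header is stored and looked up as "Accept-encoding".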
msg105937 - Author: Senthil Kumaran (orsenthil) (Python committer) Date: 2010-05-17 20:30
The HTTP spec says that the server can send any encoding if the client does not specify an Accept-Encoding header. But if 'identity' is one of the encodings the server supports (?), then it should use identity, which indicates untransformed content. I also see that httplib adds 'Accept-Encoding: identity' to the headers at the request level. I shall see what is missing here, if it is not being sent for all requests. BTW, I could not figure out the problem you are facing from the URL mentioned. Specifically, I do not see any alternation between gzip and non-gzip responses at different points.
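The rule paraphrased above is RFC 2616 section 14.3: with no Accept-Encoding header the server MAY assume any content-coding is acceptable, but SHOULD use "identity" when that coding is available. The following is a hypothetical server-side chooser (choose_coding and parse_accept_encoding are illustrative names, not a library API) that follows that wording, including q-values and the "*" wildcard. Ranking explicitly listed codings above the implicitly acceptable "identity" is a design choice of this sketch. Later revisions of the spec (RFC 7231, RFC 9110) phrase the no-header case differently.

```python
from typing import Dict, Optional, Sequence


def parse_accept_encoding(header: str) -> Dict[str, float]:
    """Map each content-coding in an Accept-Encoding value to its q-value."""
    prefs: Dict[str, float] = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding:
            continue  # tolerate empty list elements such as "gzip,,deflate"
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[coding] = q
    return prefs


def choose_coding(accept_encoding: Optional[str], available: Sequence[str]) -> Optional[str]:
    """Pick a response coding per RFC 2616 section 14.3; None means nothing is acceptable."""
    if accept_encoding is None:
        # No header: any coding MAY be assumed acceptable, but "identity"
        # SHOULD be used when the server has it.
        if "identity" in available:
            return "identity"
        return available[0] if available else None

    prefs = parse_accept_encoding(accept_encoding)
    star = prefs.get("*")
    best, best_q = None, 0.0
    for coding in available:  # ties go to the server's own preference order
        q = prefs.get(coding, star if star is not None else 0.0)
        if q > best_q:
            best, best_q = coding, q
    if best is not None:
        return best
    # "identity" stays acceptable unless refused via "identity;q=0", or via
    # "*;q=0" when "identity" is not listed explicitly.
    identity_refused = prefs.get("identity") == 0.0 or (
        "identity" not in prefs and star == 0.0
    )
    if "identity" in available and not identity_refused:
        return "identity"
    return None
```

A server following this sketch would answer a request without the header using the untransformed body, which is exactly the behaviour the original report could not count on.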
msg105959 - Author: Dave Abrahams (dabrahams) Date: 2010-05-18 10:02
How many tests did you run? My two tests were minutes apart. I have the feeling that this has something to do with caching behavior on the server.
msg183573 - Author: karl (karlcow) Date: 2013-03-06 02:32
What was the content of http://support.github.com/discussions/site/1510? I can't find it. Is the issue still going on?
msg239926 - Author: Demian Brecht (demian.brecht) (Python triager) Date: 2015-04-02 15:32
This doesn't seem to be an issue in 3.4+; the following headers are injected in a call to urlopen():

    GET / HTTP/1.1
    Accept-Encoding: identity
    Host: example.com
    User-Agent: Python-urllib/3.4
    Connection: close

However, the behaviour is different in 2.7:

    GET / HTTP/1.0
    Host: example.com
    User-Agent: Python-urllib/1.17

That said, I wouldn't see this as a bug but as a feature request, so it would be invalid for 2.7. Setting this to pending to close unless anyone has any objections or further details.
msg265526 - Author: Martin Panter (martin.panter) (Python committer) Date: 2016-05-14 12:46
I suspect that for Demian’s 2.7 experiment, he used the older urllib.urlopen(), rather than urllib2.urlopen() as given in the original description. When I use urllib2.urlopen("http://localhost/"), I see:

    GET / HTTP/1.1
    Accept-Encoding: identity
    Host: localhost
    Connection: close
    User-Agent: Python-urllib/2.7

Even in the urllib (no 2) case, since it is using HTTP 1.0, I suspect not having Accept-Encoding is not such a problem. The underlying HTTP library has always added “Accept-Encoding: identity” for HTTP 1.1 by default (https://hg.python.org/cpython/annotate/4a3e9871b41b/Lib/httplib.py#l444), so I am closing this.
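The default described above can be observed without a server. This is a minimal sketch using Python 3's http.client (httplib's successor); CapturingConnection is a hypothetical subclass that overrides send() to record the outgoing bytes, so connect() is never called and no network I/O happens. It also shows that supplying your own Accept-Encoding suppresses the default, since request() then passes skip_accept_encoding=True to putrequest().

```python
import http.client


class CapturingConnection(http.client.HTTPConnection):
    """Hypothetical HTTPConnection that records what it would send."""

    def __init__(self, host: str) -> None:
        super().__init__(host)
        self.sent = b""

    def send(self, data) -> None:
        # Overriding send() means connect() is never triggered.
        self.sent += data


def request_bytes(headers: dict) -> bytes:
    """Return the raw HTTP/1.1 request http.client would emit for GET /."""
    conn = CapturingConnection("example.com")
    conn.request("GET", "/", headers=headers)
    return conn.sent


default = request_bytes({})  # library adds "Accept-Encoding: identity"
explicit = request_bytes({"Accept-Encoding": "gzip"})  # caller's value wins
print(default.decode("latin-1"))
print(explicit.decode("latin-1"))
```

Overriding send() rather than reading the connection's internal buffer keeps the sketch on public method names, at the cost of relying on request() routing all header bytes through send().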
History
Date | User | Action | Args
2022-04-11 14:57:01 | admin | set | github: 52978
2016-05-14 12:46:10 | martin.panter | set | status: pending -> closed; title: Should urrllib2.urlopen send an Accept-Encoding header? -> Should urllib2.urlopen send an Accept-Encoding header?; nosy: + martin.panter; messages: +; resolution: works for me
2015-04-02 15:32:23 | demian.brecht | set | status: open -> pending; nosy: + demian.brecht; messages: +
2013-03-06 02:32:12 | karlcow | set | nosy: + karlcow; messages: +
2010-12-22 07:48:02 | eric.araujo | set | nosy: + eric.araujo; versions: - Python 2.6
2010-05-18 10:02:40 | dabrahams | set | messages: +
2010-05-17 20:30:17 | orsenthil | set | messages: +
2010-05-16 18:24:46 | pitrou | set | assignee: orsenthil; type: behavior; nosy: + orsenthil; versions: + Python 3.1, Python 2.7, Python 3.2
2010-05-16 14:47:09 | dabrahams | create |