Issue 8843: urllib2 Digest Authorization uri must match request URI
When using Digest authentication to authenticate with a web server, according to rfc2617 (section 3.2.2.5) the uri in the Authorization header MUST match the request URI.
urllib2.AbstractDigestAuthHandler doesn't honour this when we request a url of the form 'http://hostname' without the trailing slash and we end up with request headers of the form:
GET / HTTP/1.1
...
Authorization: Digest ... uri="" <- should be uri="/"!
A web server will return a 400 Bad Request error.
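As a rough illustration of why the uri value cannot be treated loosely, here is a minimal sketch of the simplest RFC 2617 response calculation (MD5, no qop). The uri is hashed into the response through A2 = method ":" uri, so client and server have to work from the same string. The credentials, realm and nonce below are made up, and this does not reproduce the server's 400 check itself; it only shows that uri="" and uri="/" are different inputs.

```python
import hashlib


def md5_hex(text):
    """Return the hex MD5 digest of a text string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the RFC 2617 request-digest for the no-qop case."""
    ha1 = md5_hex("%s:%s:%s" % (username, realm, password))  # H(A1)
    ha2 = md5_hex("%s:%s" % (method, uri))                   # H(A2)
    return md5_hex("%s:%s:%s" % (ha1, nonce, ha2))


# Hypothetical values, for illustration only.
with_empty_uri = digest_response("user", "example-realm", "secret",
                                 "GET", "", "abc123")
with_root_uri = digest_response("user", "example-realm", "secret",
                                "GET", "/", "abc123")

# The two digests differ because the uri is part of the hashed input.
print(with_empty_uri != with_root_uri)
```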
I attach a patch to fix urllib2.AbstractDigestAuthHandler.get_authorization that simply checks for the empty uri and uses '/' instead. It's the same thing that httplib.HTTPConnection does when it builds the GET line.
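The idea behind the patch can be sketched roughly as follows. This is not the attached patch: `digest_uri` is a hypothetical standalone helper, written here only to show the "empty path becomes '/'" rule applied to a full URL. The import fallback is an assumption so the sketch runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2


def digest_uri(url):
    """Return the path (plus query) to use as the Digest uri value.

    An empty path is replaced by "/", mirroring what is sent on the
    request line when the URL has no trailing slash.
    """
    parts = urlsplit(url)
    uri = parts.path or "/"
    if parts.query:
        uri += "?" + parts.query
    return uri


for url in ("http://myserver", "http://myserver/", "http://myserver/foo?x=1"):
    print(url, "->", digest_uri(url))
```

Whether this belongs in get_authorization or in the code that produces the selector is the open question raised in this report; the sketch takes no position on that.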
However I do wonder if this uri normalisation should be part of Request.get_selector?
Following is a script to demonstrate the behaviour, if you call it as:
./do_digest_request.py http://myserver username password
(and assuming myserver is using Digest authentication) there will be a 400 response instead of it working.
--- do_digest_request.py
#!/usr/bin/env python

import sys
import urllib2
import urlparse


def request( url, username, password ):
    p = urlparse.urlparse( url )

    password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
    password_manager.add_password( None, p.hostname, username, password )

    handlers = [
        urllib2.HTTPDigestAuthHandler( password_manager ),
    ]

    opener = urllib2.build_opener( *handlers )

    request = urllib2.Request( url )
    response = opener.open( request )
    response.read()


if __name__ == '__main__':
    request( sys.argv[1], sys.argv[2], sys.argv[3] )
FWIW, here's my take on this:
RFC 2617 (3.2.2.5) states: This may be "*", an "absoluteURL" or an "abs_path" as specified in section 5.1.2 of [2], but it MUST agree with the Request-URI.
Note: It must AGREE.
RFC 3986 (6.2.3) states: In general, a URI that uses the generic syntax for authority with an empty path should be normalized to a path of "/".
In my mind, this normalization should actually happen server-side, not on the client as the patch suggests.
Additionally, if the logic in the supplied patch were applied, the empty path would be special-cased, which is inconsistent with how any non-empty path is handled:
http://example.com -> /
http://example.com/foo -> /foo
I would close this as won't fix.
Side note: get_selector was deprecated in 3.3 and removed in 3.4 in favour of the Request.selector attribute.
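For reference, a short sketch of the attribute form mentioned in the side note, assuming Python 3.4 or later and the standard `urllib.request.Request` class. The URL is just an example.

```python
from urllib.request import Request

# Request.selector replaces the removed Request.get_selector() method.
req = Request("http://example.com/foo")
print(req.selector)
```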