Issue 8843: urllib2 Digest Authorization uri must match request URI

When using Digest authentication to authenticate with a web server, according to RFC 2617 (section 3.2.2.5) the uri in the Authorization header MUST match the request URI.

urllib2.AbstractDigestAuthHandler doesn't honour this when we request a URL of the form 'http://hostname' without the trailing slash, and we end up with request headers of the form:

GET / HTTP/1.1
...
Authorization: Digest ... uri="" <- should be uri="/"!

A web server will return a 400 Bad Request error.
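As a rough illustration of why the uri value matters beyond the header comparison, here is a minimal sketch of the RFC 2617 response computation in its simplest form (MD5, no qop). The function names and the sample credentials are made up for illustration; this is not the urllib2 implementation. Because the uri is hashed into A2, an empty uri and "/" produce different response values.

```python
import hashlib


def md5_hex(text):
    """Hex MD5 digest of a text string, as used by RFC 2617 Digest auth."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the Digest 'response' value without qop (hypothetical helper).

    A1 = username:realm:password
    A2 = method:uri
    response = MD5(MD5(A1):nonce:MD5(A2))
    """
    ha1 = md5_hex("%s:%s:%s" % (username, realm, password))
    ha2 = md5_hex("%s:%s" % (method, uri))
    return md5_hex("%s:%s:%s" % (ha1, nonce, ha2))


# Sample values are placeholders, not taken from a real server.
empty_uri = digest_response("user", "realm", "secret", "GET", "", "abc123")
slash_uri = digest_response("user", "realm", "secret", "GET", "/", "abc123")
print(empty_uri != slash_uri)
```

So a server that recomputes the hash from the Request-URI it actually received ("/") would not reproduce a response that the client computed over an empty uri.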

I attach a patch to fix urllib2.AbstractDigestAuthHandler.get_authorization that simply checks for the empty uri and uses '/' instead. It's the same thing that httplib.HTTPConnection does when it builds the GET line.
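The check described above amounts to an "empty means root" fallback. The following is a standalone sketch of that idea, not the attached patch itself: digest_uri is a hypothetical helper, and it uses Python 3's urllib.parse (the successor of the urlparse module) so it can be run on its own.

```python
from urllib.parse import urlsplit


def digest_uri(url):
    """Return the path (plus query) to use as the Digest uri (hypothetical helper).

    An empty path is replaced with "/", mirroring the request line that is
    sent for a URL such as 'http://hostname'.
    """
    parts = urlsplit(url)
    uri = parts.path or "/"  # the fallback the patch describes
    if parts.query:
        uri += "?" + parts.query
    return uri


print(digest_uri("http://hostname"))
print(digest_uri("http://hostname/foo?x=1"))
```

Non-empty paths pass through unchanged, so only the 'http://hostname' case is affected.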

However I do wonder if this uri normalisation should be part of Request.get_selector?

Following is a script to demonstrate the behaviour, if you call it as:

./do_digest_request.py http://myserver username password

(and assuming myserver is using Digest authentication) there will be a 400 response instead of it working.

--- do_digest_request.py
#!/usr/bin/env python

import sys
import urllib2
import urlparse

def request( url, username, password ):
    p = urlparse.urlparse( url )
    password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
    password_manager.add_password( None, p.hostname, username, password )

    handlers = [
        urllib2.HTTPDigestAuthHandler( password_manager ),
    ]

    opener = urllib2.build_opener( *handlers )
    request = urllib2.Request( url )
    response = opener.open( request )
    response.read()

if __name__ == '__main__':
    request( sys.argv[1], sys.argv[2], sys.argv[3] )

FWIW, here's my take on this:

RFC 2617 (3.2.2.5) states: This may be "*", an "absoluteURL" or an "abs_path" as specified in section 5.1.2 of [2], but it MUST agree with the Request-URI.

Note: It must AGREE.

RFC 3986 (6.2.3) states: In general, a URI that uses the generic syntax for authority with an empty path should be normalized to a path of "/".
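For reference, the RFC 3986 normalization quoted above can be sketched in a few lines. This is only an illustration under the assumption that Python 3's urllib.parse is available; normalise_empty_path is a hypothetical name, and nothing here says which side (client or server) ought to perform the step.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_empty_path(url):
    """Give a URL with an authority and an empty path the path "/".

    Hypothetical helper; URLs that already have a path are returned as-is.
    """
    parts = urlsplit(url)
    if parts.netloc and not parts.path:
        parts = parts._replace(path="/")
    return urlunsplit(parts)


print(normalise_empty_path("http://example.com"))
print(normalise_empty_path("http://example.com/foo"))
```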

In my mind, this normalization should actually happen server-side, not on the client as the patch suggests.

Additionally, if the logic in the supplied patch were applied, the empty path would be handled inconsistently with every other path:

http://example.com -> /
http://example.com/foo -> /foo

I would close this as won't fix.

Side note: get_selector was deprecated in 3.3 and removed in 3.4 in favour of the Request.selector attribute.