[Python-Dev] Possible bug in urllib.urljoin
Andrew Edmondson a.edmondson at eris.qinetiq.com
Fri Sep 23 09:35:06 CEST 2005
Dear all,
We've found a problem using urllib.urljoin when upgrading from Python 2.3 to 2.4. It no longer joins a particular corner case of URLs correctly (we think!).
The code appears to follow the algorithm for resolving relative URLs (from http://www.ietf.org/rfc/rfc1808.txt) almost exactly...
I believe the problem occurs on reaching "step 5" (approx line 160), which happens if the embedded URL has no scheme, netloc or path (and is nonempty).
Following the algorithm, the resulting URL should now be returned using the base URL's scheme, netloc and path, but the embedded URL's params and query (if present, else set to the base ones). This is what 2.3 does:
    if not path:
        if not params:
            params = bparams
            if not query:
                query = bquery
        return urlunparse((scheme, netloc, bpath,
                           params, query, fragment))
However, in 2.4, even if the embedded URL's path is empty, flow passes to step 6 unless the params and query segments are empty too:
    if not (path or params or query):
        return urlunparse((scheme, netloc, bpath,
                           bparams, bquery, fragment))
Thus the last segment of the base path is removed in order to append the embedded URL's path, but that path is empty! So the resulting path is returned incorrectly.
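To make the difference concrete, here is a minimal sketch that isolates the two versions of step 5 quoted above. The function names join_23_style and join_24_style, and the example URLs, are made up for illustration; this is not the library code. The step 6 part is deliberately simplified (no "." or ".." handling), just enough to show what happens to the base path when the embedded URL is only a query.

```python
try:
    from urllib.parse import urlparse, urlunparse  # Python 3
except ImportError:
    from urlparse import urlparse, urlunparse      # Python 2


def _split(base, url):
    """Parse both URLs; the embedded URL defaults to the base scheme."""
    b = urlparse(base)
    u = urlparse(url, b[0])
    return b, u


def join_23_style(base, url):
    """Hypothetical helper: step 5 as quoted from 2.3 (path empty only)."""
    (bscheme, bnetloc, bpath, bparams, bquery, _), \
        (scheme, netloc, path, params, query, fragment) = _split(base, url)
    if not path:
        # Keep the whole base path; inherit params/query only if missing.
        if not params:
            params = bparams
            if not query:
                query = bquery
        return urlunparse((scheme, bnetloc, bpath, params, query, fragment))
    return None  # other steps are out of scope for this sketch


def join_24_style(base, url):
    """Hypothetical helper: step 5 as quoted from 2.4, then a simplified step 6."""
    (bscheme, bnetloc, bpath, bparams, bquery, _), \
        (scheme, netloc, path, params, query, fragment) = _split(base, url)
    if not (path or params or query):
        return urlunparse((scheme, bnetloc, bpath, bparams, bquery, fragment))
    # Simplified step 6: drop the last base segment, append the embedded path.
    segments = bpath.split('/')[:-1] + path.split('/')
    return urlunparse((scheme, bnetloc, '/'.join(segments),
                       params, query, fragment))


base = 'http://a/b/c/d;p?q'
print(join_23_style(base, '?y'))  # http://a/b/c/d;p?y
print(join_24_style(base, '?y'))  # http://a/b/c/?y
```

With a query-only embedded URL, the 2.3-style logic keeps the last segment "d" (and the base params), while the 2.4-style logic falls through and drops it.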
Can you tell me if this was a deliberate decision to move away from following the algorithm? If so, then we'll work around it.
##############################################################################
Andrew Edmondson
PGP Key: http://search.keyserver.net:11371/pks/lookup?op=get&search=0xCEE814DC
PGP Fingerprint: 7B32 4D1E AC4F 29E2 9EAA 9550 1A3D BBA4 CEE8 14DC