Issue 1591035: update urlparse to RFC 3986

urlparse implements RFC 1808, which is badly out of date. The current specification is RFC 3986.

Here is some text from 4Suite:

# Reasons to avoid using urllib.basejoin() and urlparse.urljoin():
# - Both are partial implementations of long-obsolete specs.
# - Both accept relative URLs as the base, which no spec allows.
# - urllib.basejoin() mishandles the '' and '..' references.
# - If the base URL uses a non-hierarchical or relative path,
#   or if the URL scheme is unrecognized, the result is not
#   always as expected (partly due to issues in RFC 1808).
# - If the authority component of a 'file' URI is empty,
#   the authority component is removed altogether. If it was
#   not present, an empty authority component is in the result.
# - '.' and '..' segments are not always collapsed as well as they
#   should be (partly due to issues in RFC 1808).
# - Effective Python 2.4, urllib.basejoin() is urlparse.urljoin(),
#   but urlparse.urljoin() is still based on RFC 1808.
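To make two of these complaints concrete, here is a minimal sketch of the kind of check one might run against the worked examples in RFC 3986, section 5.4. It assumes Python 3, where the urlparse module lives on as urllib.parse, and a release (3.5 or later) in which urljoin was updated to RFC 3986 semantics. The wrapper name strict_urljoin is hypothetical and only illustrates rejecting a relative base; it is not part of the standard library or of 4Suite.

```python
from urllib.parse import urljoin, urlsplit

# Base URI used in the worked examples of RFC 3986, section 5.4.
RFC3986_BASE = "http://a/b/c/d;p?q"

# Reference -> expected target, copied from RFC 3986, section 5.4.
# The last entry is one of the "abnormal" cases: surplus '..' segments
# must not climb above the root of the base path.
RFC3986_CASES = {
    "g": "http://a/b/c/g",
    "./g": "http://a/b/c/g",
    "../g": "http://a/b/g",
    "../../../g": "http://a/g",
}


def strict_urljoin(base, ref):
    """Hypothetical wrapper: refuse a base that has no scheme.

    urljoin() itself accepts a relative base; this sketch rejects one,
    in line with the complaint that no spec allows a relative base.
    """
    if not urlsplit(base).scheme:
        raise ValueError("base must be an absolute URI: %r" % (base,))
    return urljoin(base, ref)


if __name__ == "__main__":
    for ref, expected in RFC3986_CASES.items():
        result = strict_urljoin(RFC3986_BASE, ref)
        status = "ok" if result == expected else "MISMATCH"
        print("%-12s %-20s %s" % (ref, result, status))
```

An implementation still following RFC 1808 would be expected to differ on the last case, since RFC 1808 keeps the surplus '..' segment in the result.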

See also the past python-dev discussions on "urlparse" for examples of people wanting a better, more up-to-date urlparse/urljoin.