[Python-Dev] urlparse brokenness
Paul Jimenez pj at place.org
Wed Nov 23 06:04:55 CET 2005
It is my assertion that urlparse is currently broken. Specifically, I think that urlparse breaks an abstraction boundary with ill effect.
In writing a mail client, I wished to allow my users to specify their IMAP server as a URL, such as 'imap://user:password@host:port/', which worked fine. I then thought that the natural extension to support configuration of IMAP over SSL would be 'imaps://user:password@host:port/', which failed: user:password@host:port got parsed as the path of the URL instead of the network location. It turns out that urlparse keeps a table of URL schemes that 'use netloc', that is to say, that have a 'user:password@host:port' part to their URL. I think this 'special knowledge' about particular schemes 1) breaks an abstraction boundary by having a function whose charter is to pull apart a particularly-formatted string behave differently based on the meaning of the string instead of the structure of it and 2) fails to be extensible or forward compatible due to hardcoded 'magic' strings. If schemes were somehow 'registerable' as 'netloc using' or not, then the second objection might be nullified, but the first would still stand.
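To make the failure concrete, here is a simplified sketch of table-driven splitting. It is not the actual urlparse source: the names USES_NETLOC and split_with_table, and the contents of the table, are assumptions made purely for illustration.

```python
# Simplified illustration only; this is NOT the real urlparse source.
# USES_NETLOC and split_with_table are hypothetical names.
USES_NETLOC = ['ftp', 'http', 'https', 'imap']


def split_with_table(url):
    """Return (scheme, netloc, rest), consulting a hardcoded scheme table."""
    scheme, sep, rest = url.partition(':')
    if not sep:
        # No colon at all: treat the whole string as the remainder.
        return '', '', url
    netloc = ''
    # The netloc is only extracted when the scheme appears in the table.
    if scheme in USES_NETLOC and rest.startswith('//'):
        end = rest.find('/', 2)
        if end == -1:
            end = len(rest)
        netloc, rest = rest[2:end], rest[end:]
    return scheme, netloc, rest


print(split_with_table('imap://user:password@host:993/'))
# → ('imap', 'user:password@host:993', '/')
print(split_with_table('imaps://user:password@host:993/'))
# → ('imaps', '', '//user:password@host:993/')
```

The two URLs have identical structure, yet only the one whose scheme happens to be listed gets a network location.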
So I propose that urlsplit, the main offender, be replaced with something that looks like:
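_parse_cache = {}  # module-level cache, as already present in urlparse

def urlsplit(url, scheme='', allow_fragments=1, default=('','','','','')):
    """Parse a URL into 5 components:
    <scheme>://<netloc>/<path>?<query>#<fragment>
    Return a 5-tuple: (scheme, netloc, path, query, fragment)."""
    key = url, scheme, allow_fragments, default
    cached = _parse_cache.get(key)
    if cached:
        return cached
    if "://" in url:
        uscheme, npqf = url.split("://", 1)
    else:
        uscheme = scheme
        if not uscheme:
            uscheme = default[0]
        npqf = url
    pathidx = npqf.find('/')
    if pathidx == -1:  # no path: the rest is all netloc
        netloc = npqf
        path, query, fragment = default[2:5]
    else:
        netloc = npqf[:pathidx]
        pqf = npqf[pathidx:]
        if '?' in pqf:
            path, qf = pqf.split('?', 1)
        else:
            path, qf = pqf, default[3]
        if ('#' in qf) and allow_fragments:
            query, fragment = qf.split('#', 1)
        else:
            # keep the query; only the fragment falls back to its default
            query, fragment = qf, default[4]
    result = (uscheme, netloc, path, query, fragment)
    _parse_cache[key] = result
    return result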
Note that I'm not sold on the _parse_cache, but I'm assuming it was there for a reason so I'm leaving that functionality as-is.
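As a rough check of the idea, here is a compact, self-contained sketch of splitting on structure alone. It is an assumption-laden miniature, not the proposed function itself: split_by_structure is a hypothetical name, there is no cache and there are no default arguments. The point is only that the scheme name is never inspected.

```python
def split_by_structure(url):
    """Split on delimiters only; the scheme name is never inspected."""
    scheme, sep, rest = url.partition('://')
    if not sep:
        # No '://' present: there is no scheme to report.
        scheme, rest = '', url
    slash = rest.find('/')
    if slash == -1:
        # No path: everything remaining is the network location.
        netloc, rest = rest, ''
    else:
        netloc, rest = rest[:slash], rest[slash:]
    # Peel off the fragment first, then the query, leaving the path.
    rest, _, fragment = rest.partition('#')
    path, _, query = rest.partition('?')
    return scheme, netloc, path, query, fragment


for url in ('imap://user:password@host:993/INBOX?x=1#top',
            'imaps://user:password@host:993/INBOX?x=1#top'):
    print(split_by_structure(url))
# → ('imap', 'user:password@host:993', '/INBOX', 'x=1', 'top')
# → ('imaps', 'user:password@host:993', '/INBOX', 'x=1', 'top')
```

Both schemes yield the same netloc, since only the delimiters drive the split.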
If this isn't the right forum for this discussion, or the right place to submit code, please let me know. Also, please cc: me directly on responses as I'm not subscribed to the firehose that is python-dev.
--pj