[Python-Dev] urlparse.urlunsplit should be smarter about +

Senthil Kumaran orsenthil at gmail.com
Tue May 11 06:55:30 CEST 2010


On Mon, May 10, 2010 at 05:56:29PM +0900, Stephen J. Turnbull wrote:

Senthil Kumaran writes:

> I should have said, 'treatment of urls with authority' and 'treatment
> of urls without authority' in terms of parsing and joining is as per
> RFC. How it is doing practically is by maintaining a list of urls
> with known scheme names which use netloc.

Why do that if you can get better behavior based purely on syntactic analysis?

For plain parsing and splitting, the syntactic behaviour is good enough. I agree with your comments and reinstatement of the RFC rules in the previous emails.

The problem, as we know, comes while unparsing and joining (I have also not yet looked at the relative URL joining behaviour, where redundant /'s can be ignored).

As you may already know, when the data is

ParseResult(scheme='file', netloc='', path='/tmp/junk.txt', params='', query='', fragment='')

You might expect the output to be file:///tmp/junk.txt. The original might be the same too.

But for: ParseResult(scheme='x', netloc='', path='/y', params='', query='', fragment='')

One can expect a valid output to be: x:/y
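To make the two cases concrete, here is a small sketch. It assumes Python 3, where the module lives at urllib.parse (this thread is about the Python 2 urlparse module), and it only reports what the current unparsing code returns for the two results above, along with whether each scheme is in the uses_netloc list.

```python
from urllib.parse import ParseResult, urlunparse, uses_netloc

# The two parse results discussed above; both have an empty netloc.
file_parts = ParseResult(scheme='file', netloc='', path='/tmp/junk.txt',
                         params='', query='', fragment='')
x_parts = ParseResult(scheme='x', netloc='', path='/y',
                      params='', query='', fragment='')

file_url = urlunparse(file_parts)
x_url = urlunparse(x_parts)

# Show the unparsed URL next to the scheme's uses_netloc membership.
for parts, url in ((file_parts, file_url), (x_parts, x_url)):
    print(parts.scheme, parts.scheme in uses_netloc, url)
```

The difference in output comes from the scheme list, not from anything in the parse result itself, since netloc is '' in both cases.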

Your suggestion that netloc/authority be differentiated by '' and None seems a good one to analyze.
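A rough sketch of how I read that idea, purely for analysis: the function name unsplit_syntactic is hypothetical and this is not the urlparse API. It assumes None means "no authority component" and '' means "authority present but empty", so the '//' is emitted based on the data alone, with no scheme list.

```python
def unsplit_syntactic(scheme, netloc, path, query=None, fragment=None):
    """Hypothetical unsplit: None = component absent, '' = present but empty."""
    url = path
    if netloc is not None:
        # An authority is present (possibly empty), so always emit '//'.
        if url and not url.startswith('/'):
            url = '/' + url
        url = '//' + netloc + url
    if scheme:
        url = scheme + ':' + url
    if query is not None:
        url += '?' + query
    if fragment is not None:
        url += '#' + fragment
    return url


print(unsplit_syntactic('file', '', '/tmp/junk.txt'))
print(unsplit_syntactic('x', None, '/y'))
print(unsplit_syntactic('x', '', '/y'))
```

With this reading, the caller (or the parser) decides between x:/y and x:///y by passing None or '', which is also why the parser would have to start returning None for a missing authority.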

Also, by keeping a registry of valid schemes, are you not proposing something very similar to uses_netloc? But with a different API to handle parsing based on registry values. Is my understanding of your proposal correct?

FWIW, I looked at the history of the uses_netloc list, and it seems that it has been there since the first version, when the urlparse module followed different RFC specs for different protocols (telnet, sip etc.), so any changes should be incorporated carefully so as not to break existing solutions.

-- Senthil


