[Python-Dev] Some more comments re new uriparse module, patch 1462525

Mike Brown mike at skew.org
Fri Jun 9 06:54:18 CEST 2006


John J Lee wrote:

> http://python.org/sf/1500504 [...]

> At first glance, looks good. I hope to review it properly later. One point: I don't think there should be any mention of "URL" in the module -- we should use "URI" everywhere (see my comments on Paul's original version for a bit more on this).

Agreed.

Although you've added the test cases from 4Suite and credited me for them, only a few of the test cases were invented by me. I'd rather you credited them to their original sources, as I did.

Also, I believe Graham Klyne has been adding some new cases to his Haskell tools, but hasn't been updating the other spreadsheet and RDF files in which he publishes them in a more usable form. My tests only use what's in the spreadsheet, so I've only got 88 out of 99 "testRelative" cases from
http://cvs.haskell.org/cgi-bin/cvsweb.cgi/fptools/libraries/network/tests/URITest.hs
So if you really want to be thorough, grab the missing cases from there.
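For anyone unfamiliar with what a relative-resolution test case looks like, here is a rough sketch of a table-driven check. It is not taken from 4Suite or from the Haskell test file; the base URI and the expected results are a handful of the reference-resolution examples published in STD 66 (RFC 3986, section 5.4), and urllib.parse.urljoin from a current Python 3 merely stands in for whatever resolver is under test. The names BASE, CASES and run_cases are made up for illustration.

```python
from urllib.parse import urljoin

# Base URI used by the reference-resolution examples in RFC 3986, section 5.4.
BASE = "http://a/b/c/d;p?q"

# (relative reference, expected absolute URI) pairs; a small subset only.
CASES = [
    ("g", "http://a/b/c/g"),
    ("./g", "http://a/b/c/g"),
    ("../g", "http://a/b/g"),
    ("../../g", "http://a/g"),
    ("/g", "http://a/g"),
    ("g?y", "http://a/b/c/g?y"),
    ("#s", "http://a/b/c/d;p?q#s"),
]


def run_cases(base, cases, resolve=urljoin):
    """Return a list of (ref, expected, actual) for every failing case."""
    failures = []
    for ref, expected in cases:
        actual = resolve(base, ref)
        if actual != expected:
            failures.append((ref, expected, actual))
    return failures


if __name__ == "__main__":
    for ref, expected, actual in run_cases(BASE, CASES):
        print("FAIL %r: expected %r, got %r" % (ref, expected, actual))
```

Passing a different callable as resolve would let the same table exercise another implementation.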

It appears that Paul uploaded a new version of his library on June 3:
http://python.org/sf/1462525
I'm unclear on the relationship between the two now. Are they both up for consideration?

One thing I forgot to mention in private email is that I'm concerned that the inclusion of URI reference resolution functionality has exceeded the scope of this 'urischemes' module that purports to be for 'extensible URI parsing'. It is becoming a scheme-aware and general-purpose syntactic processing library for URIs, and should be characterized as such in its name as well as in its documentation.

Even without a new name and more accurately documented scope, people are going to see no reason not to add the rest of STD 66's functionality to it (percent-encoding, normalization for testing equivalence, syntax validation...). As you can see in Ft.Lib.Uri, the latter two are not at all hard to implement, especially if you use regular expressions. These all fall under syntactic operations on URIs, just like reference-resolution.
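To give a feel for how little code the regular-expression approach needs, here is a minimal sketch. It is not the Ft.Lib.Uri code. The pattern is the component-splitting expression from Appendix B of RFC 3986, and the normalization shown covers only the case-related steps (lowercase scheme and host, uppercase hex digits in percent-encoded triplets), not dot-segment removal or anything scheme-specific. The names URI_REF, split_uri_ref and normalize_case are hypothetical.

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
URI_REF = re.compile(
    r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'
)
PCT_TRIPLET = re.compile(r'%[0-9a-fA-F]{2}')


def split_uri_ref(ref):
    """Split a URI reference into (scheme, authority, path, query, fragment).

    Components that are absent come back as None; the path is always a
    string, possibly empty.
    """
    m = URI_REF.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def normalize_case(ref):
    """Apply only the case normalization steps to a URI reference."""
    scheme, authority, path, query, fragment = split_uri_ref(ref)
    out = ''
    if scheme is not None:
        out += scheme.lower() + ':'
    if authority is not None:
        # Only the host is case-insensitive; leave any userinfo untouched.
        userinfo, at, hostport = authority.rpartition('@')
        out += '//' + userinfo + at + hostport.lower()
    out += path
    if query is not None:
        out += '?' + query
    if fragment is not None:
        out += '#' + fragment
    # Uppercase the hex digits of every percent-encoded triplet.
    return PCT_TRIPLET.sub(lambda m: m.group(0).upper(), out)
```

Two references could then be compared for this limited notion of equivalence with normalize_case(a) == normalize_case(b).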

Percent-encoding gets very hairy with its API details due to application-level uses that don't jibe with STD 66 (e.g. the fuzzy specs and convoluted history governing application/x-www-form-urlencoded), the nuances of character encoding and Python string types, and widely varying expectations of users.
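A small illustration of two of those wrinkles, using urllib.parse from a current Python 3 rather than anything in the patches under discussion (an assumption on my part, chosen only because its behavior is easy to check). The same input comes out differently depending on whether the caller wants generic percent-encoding or the form-urlencoded convention, and a non-ASCII character comes out differently depending on which character encoding is applied first.

```python
from urllib.parse import quote, quote_plus

text = "a b&c/d"

# Generic percent-encoding: space becomes %20 and "/" is left alone by default.
generic = quote(text)

# Form-style encoding: space becomes "+" and "/" is escaped as well.
form_style = quote_plus(text)

# The character encoding chosen before escaping changes the octets produced.
as_utf8 = quote("\u00e9")                       # U+00E9, encoded as UTF-8
as_latin1 = quote("\u00e9", encoding="latin-1")  # same character, one octet

print(generic, form_style, as_utf8, as_latin1)
```

None of these results is wrong in itself; which one is wanted depends on the application, and that is where the API design gets difficult.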
