[Python-Dev] Fwd: RFC - GoogleSOC proposal -cleanupurllib

Mike Brown mike at skew.org
Sat Mar 24 22:48:04 CET 2007


Senthil Kumaran wrote:

I have written a proposal to cleanup urllib as part of Google SoC. I am attaching the file 'soc1' with this email. Requesting you to go through the proposal and provide any feedback which I can incorporate in my submission.

From your proposal:

2) In all modules, Follow the new RFC 2396 in favour of RFC 1738 and RFC 1808. [...] In all modules, follow the new RFC 2396 in favor of RFC 1738, RFC 1808. The standards for URI described in RFC 2396 is different from older RFCs and urllib, urllib2 modules implement the URL specifications based on the older URL specification. This will need changes in urlparse and other parse modules to handle URLS as specified in the RFC2396.

The "new" RFC 2396 was superseded by STD 66 (RFC 3986) two years ago. Your failure to notice this development doesn't bode well :) j/k, although it does undermine confidence somewhat.

I think the bugfixes sound great, but major enhancements and API refactorings need to be undertaken more cautiously.

In any case, I have a few suggestions:

Read http://en.wikipedia.org/wiki/Uniform_Resource_Identifier. (I wrote the majority of it, and got peer review from the URI WG a while back).
Read http://en.wikipedia.org/wiki/Percent_encoding. (I wrote most of this too).
Familiarize yourself with STD 66. (i.e., don't trust anything I wrote ;)) Especially note its differences from RFC 2396 (summarized in an appendix).
Seek peer review for any changes that you attribute to changing standards.

In my experience implementing a general-purpose URI processing library (http://cvs.4suite.org/viewcvs/4Suite/Ft/Lib/Uri.py?view=markup ), there were times when I thought the standard was saying a bit more than it really was, especially when it came to percent-encoding, which has several somewhat-conflicting conventions and standards governing it. I tried to cover these in the Wikipedia article.

Anticipate real-world use cases. If you go down the road of doing what the standards recommend (be aware of "should" vs "must" and whether it's directed at URI producers or consumers), you might lose sight of the fact that there's a reason, for example, people use encodings other than the recommended UTF-8 as the basis for percent-encoding. Similarly, expectations surrounding the behavior of 'file' URIs and path-portions thereof are sometimes less than optimal in the real world. If you're designing an API, be flexible, and seek review for any incompatibilities you intend to introduce.
Be aware of the fact that people might have different expectations when they use different string types (unicode, str) in URI processing, and different levels of awareness of the levels of abstraction at which URI processing operates. It can be difficult to uniformly handle unicode and str. And then there's IRIs (RFC 3987)...
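
As a rough illustration of the encoding point above, here is a minimal sketch of how the same text percent-encodes differently depending on the character encoding chosen as the basis. It assumes Python 3's urllib.parse module, which is not the urllib under discussion in this thread, and the sample string is made up for the example.

```python
from urllib.parse import quote, unquote

# Hypothetical sample text containing one non-ASCII character (e with acute).
text = "caf\u00e9"

# The same characters yield different octets, and therefore different
# percent-encoded forms, depending on the encoding used as the basis.
utf8_form = quote(text, safe="", encoding="utf-8")
latin1_form = quote(text, safe="", encoding="latin-1")

# A consumer that assumes UTF-8 cannot recover Latin-1-based data;
# the invalid byte is replaced with U+FFFD instead of the original character.
misread = unquote(latin1_form, encoding="utf-8", errors="replace")
recovered = unquote(latin1_form, encoding="latin-1")

# Byte strings skip the character-encoding step: each octet is escaped as-is.
from_bytes = quote(b"caf\xe9", safe="")

print(utf8_form, latin1_form, repr(misread), recovered, from_bytes)
```

Nothing in a percent-encoded string records which encoding produced it, so an API that hard-codes UTF-8 will misread data from producers that chose something else. Letting the caller pass the encoding is one way to stay flexible.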

For additional background, you might also check the python-dev discussion of urllib in Sep 2004, urlparse in Nov 2005, and the competing uriparse.py proposals (Apr, Jun 2006).

Mike
