Issue 1333: merge urllib and urlparse functionality

Created on 2007-10-26 12:16 by techtonik, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (7)
msg56787 - (view)	Author: anatoly techtonik (techtonik)	Date: 2007-10-26 12:16
The purpose is to encapsulate all URL handling functions in one module. At the moment there are three modules that dissect URLs for various bits of information: urlparse to split a URL into components, urllib to decode the split data, and cgi to parse the query component.

To outline the API of the proposed module I'll start with urlparse: http://docs.python.org/lib/module-urlparse.html

1. There are two nearly identical functions, urlparse and urlsplit, that perform the same parsing operation but differ in the format of the returned value. They could be replaced with one - let's call it urlsplitex - that returns the result as a class with attributes: not a subclass of list but rather a dictionary subclass, because positional arguments are evil and you always have to look into the reference to find the correct order when you read or debug the code.

2. The returned class should not be immutable. It must be possible to modify the results to unset extra parts (like fragments) or edit required parts as needed, and to get the target URL via urlunsplitex or an embedded method of the same class. Thus the arguments "default_scheme" and "allow_fragments" will be useless, as will the function urldefrag.

3. urlparsex, a replacement for the "parsing" function of the new library, should be a high-level function that dissects URL information into a tree-like structure with atomic leaves. This includes decoding URL entities and splitting parameters into child structures. The proposed structure of the url class attributes is:

    scheme          string
    netloc          class
        username    string
        password    string
        server      string
        port        integer
    path            list with objects of class
        part        string
        param       list with objects of class
            name    string
            value   string
    query           list with objects of class
        name        string
        value       string
    fragment        string

4. urlunparsex will be provided to reassemble the class back into a URL. This will deprecate a series of functions from urllib like quote, unquote and urlencode, and also the functions parse_qs and parse_qsl from the cgi module.
References: http://mail.python.org/pipermail/patches/2005-February/016972.html http://bugs.python.org/issue1722348 http://bugs.python.org/issue1462525
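To make the proposal above concrete, here is a minimal sketch of what a mutable, tree-like URL object might look like. It is not code from this issue or from the urilib sandbox. The function names urlparsex and urlunparsex come from the proposal; the classes NetLoc, Pair and URL are hypothetical names chosen for illustration. The sketch is built on Python 3's urllib.parse rather than the 2007-era modules, uses plain attribute classes instead of the dictionary subclass suggested in point 1, omits the per-segment param list, and ignores details such as IPv6 host brackets.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.parse import parse_qsl, quote, unquote, urlencode, urlsplit, urlunsplit


@dataclass
class NetLoc:  # hypothetical name, mirrors the "netloc" node of the proposal
    username: Optional[str] = None
    password: Optional[str] = None
    server: str = ""
    port: Optional[int] = None


@dataclass
class Pair:  # hypothetical name for the name/value leaves
    name: str
    value: str


@dataclass
class URL:  # hypothetical name; mutable by design (point 2 of the proposal)
    scheme: str = ""
    netloc: NetLoc = field(default_factory=NetLoc)
    path: List[str] = field(default_factory=list)  # decoded path segments
    query: List[Pair] = field(default_factory=list)
    fragment: str = ""


def _decode(text: Optional[str]) -> Optional[str]:
    """Percent-decode a component, passing None through unchanged."""
    return unquote(text) if text is not None else None


def urlparsex(url: str) -> URL:
    """Split a URL and decode every component into the tree above."""
    parts = urlsplit(url)
    netloc = NetLoc(
        username=_decode(parts.username),
        password=_decode(parts.password),
        server=parts.hostname or "",
        port=parts.port,
    )
    path = [unquote(segment) for segment in parts.path.split("/")]
    query = [Pair(n, v) for n, v in parse_qsl(parts.query, keep_blank_values=True)]
    return URL(parts.scheme, netloc, path, query, unquote(parts.fragment))


def urlunparsex(url: URL) -> str:
    """Re-encode the tree and reassemble it into a URL string."""
    loc = url.netloc
    host = loc.server
    if loc.port is not None:
        host = f"{host}:{loc.port}"
    if loc.username is not None:
        userinfo = quote(loc.username, safe="")
        if loc.password is not None:
            userinfo += ":" + quote(loc.password, safe="")
        host = f"{userinfo}@{host}"
    path = "/".join(quote(segment, safe="") for segment in url.path)
    query = urlencode([(pair.name, pair.value) for pair in url.query])
    return urlunsplit((url.scheme, host, path, query, quote(url.fragment, safe="")))


# Usage: drop the fragment and add a parameter without a urldefrag-style helper.
u = urlparsex("http://user:pw@example.com:8080/a%20b/c?x=1&y=two%20words#frag")
u.fragment = ""
u.query.append(Pair("z", "3"))
print(urlunparsex(u))
# http://user:pw@example.com:8080/a%20b/c?x=1&y=two+words&z=3
```

Because the object is mutable, removing the fragment is a plain attribute assignment, which is the reason point 2 considers urldefrag and allow_fragments redundant. Note that the round trip is not byte-for-byte: the space in the query comes back as "+" because urlencode re-encodes it that way.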
msg56804 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-26 17:55
You missed urllib2 I think. :-) I agree it's a mess. I'm sure it all started out with backwards compatibility in mind. I find myself often importing cgi only to use the tiny function escape() that is defined there... I wonder if web-sig wouldn't be a good place to get some kindred spirits together to redesign these APIs for Py3k?
msg56954 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2007-10-30 03:55
I have started this work at http://svn.python.org/projects/sandbox/trunk/urilib/ as a part of G-SoC. Yes, taking it to web-sig would be appropriate and I shall do so. techtonik, you might want to review urilib and we can discuss it further. Thanks,
msg58600 - (view)	Author: Christian Heimes (christian.heimes) *	Date: 2007-12-14 00:31
Please contact Brett Cannon. He organized the stdlib cleanup.
msg58601 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2007-12-14 00:38
Yes, the modules should probably all get merged somehow. But discussing it in the web-sig is fine with me and I am happy to look at what the web-sig ends up recommending.
msg67140 - (view)	Author: Facundo Batista (facundobatista) *	Date: 2008-05-21 00:05
Brett, in consideration of PEP 3108... shouldn't we close this issue? The urilib module in the sandbox hasn't been updated in the last seven months. Or do we just keep this open as a reminder? (of what?) Thanks!
msg67164 - (view)	Author: Brett Cannon (brett.cannon) *	Date: 2008-05-21 17:56
While the work is appreciated, PEP 3108 is taking this in a different direction.
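For orientation, the following sketch shows how the functions named in this issue can be reached in current Python 3, where the urllib package reorganized under PEP 3108 exposes them through urllib.parse. The example URL is made up for illustration, and html.escape is shown as the present-day home of the escape() helper mentioned earlier in the thread. This illustrates today's standard library, not anything decided in this issue.

```python
from html import escape
from urllib.parse import parse_qs, unquote, urldefrag, urlparse, urlsplit

url = "http://example.com/a%20b;type=x?q=1&q=2#top"  # made-up example URL

split = urlsplit(url)    # 5 fields: params stay inside the path
parsed = urlparse(url)   # 6 fields: params of the last segment split out

print(split.path)                  # /a%20b;type=x
print(parsed.path, parsed.params)  # /a%20b type=x
print(unquote(parsed.path))        # /a b
print(parse_qs(parsed.query))      # {'q': ['1', '2']}
print(urldefrag(url).url)          # http://example.com/a%20b;type=x?q=1&q=2
print(escape("<a & b>"))           # &lt;a &amp; b&gt;
```

Both result types are named tuples, so fields can be read by attribute, but they remain immutable and the splitting, decoding and query parsing are still separate calls.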

History
Date	User	Action	Args
2022-04-11 14:56:27	admin	set	github: 45674
2008-05-21 17:56:02	brett.cannon	set	status: open -> closed; resolution: out of date; messages: +
2008-05-21 00:05:27	facundobatista	set	nosy: + facundobatista; messages: +
2008-01-06 22:29:45	admin	set	keywords: - py3k; versions: Python 3.0
2007-12-14 00:38:33	brett.cannon	set	messages: +
2007-12-14 00:31:55	christian.heimes	set	assignee: brett.cannon; messages: +; nosy: + brett.cannon, christian.heimes
2007-12-13 18:26:16	alexandre.vassalotti	set	resolution: accepted -> (no value)
2007-11-08 14:54:39	christian.heimes	set	priority: normal; keywords: + py3k; resolution: accepted
2007-10-30 03:55:36	orsenthil	set	nosy: + orsenthil; messages: +
2007-10-26 17:55:29	gvanrossum	set	nosy: + gvanrossum; messages: +
2007-10-26 12:16:28	techtonik	create