msg111511 - (view) |
Author: Nick Welch (Nick.Welch) |
Date: 2010-07-24 22:58 |
While the netloc/path parts of URLs are scheme-specific, and urlparse can be forgiven for refusing to parse them for unknown schemes, the query and fragment parts are standardized, and should be parsed for unrecognized schemes. According to Wikipedia:
------------------
Internet standard STD 66 (also RFC 3986) defines the generic syntax to be used in all URI schemes. Every URI is defined as consisting of four parts, as follows:
<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
------------------
http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax
Here is a demonstration of what urlparse currently does:
>>> urlparse.urlsplit('myscheme://netloc/path?a=b#frag')
SplitResult(scheme='myscheme', netloc='', path='//netloc/path?a=b#frag', query='', fragment='')
>>> urlparse.urlsplit('http://netloc/path?a=b#frag')
SplitResult(scheme='http', netloc='netloc', path='/path', query='a=b', fragment='frag') |
|
|
msg161087 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2012-05-19 00:13 |
New changeset 79e6ff3d9afd by Senthil Kumaran in branch '2.7': Issue9374 - Generic parsing of query and fragment portion of urls for any scheme
http://hg.python.org/cpython/rev/79e6ff3d9afd
New changeset a9d43e21f7d8 by Senthil Kumaran in branch '3.2': Issue9374 - Generic parsing of query and fragment portion of urls for any scheme
http://hg.python.org/cpython/rev/a9d43e21f7d8
New changeset 152c78b94e41 by Senthil Kumaran in branch 'default': Issue9374 - Generic parsing of query and fragment portion of urls for any scheme
http://hg.python.org/cpython/rev/152c78b94e41 |
|
|
msg161088 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2012-05-19 00:16 |
Thanks for raising this issue, Nick. Yes, I verified in both RFC 3986 and RFC 2396 and realized we can safely adopt generic parsing of the query and fragment portions of URLs for any scheme. Since it was supported in earlier versions too, I felt it was a good move to backport as well. Fixed in all versions. Thanks! |
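For illustration, the behaviour after this change can be checked with a short sketch. It assumes Python 3, where the `urlparse` module discussed here is available as `urllib.parse`; `myscheme` is the made-up scheme from the original report, and the variable names are only for this example.

```python
from urllib.parse import urlsplit

# 'myscheme' is an unregistered scheme, taken from the report above.
custom = urlsplit('myscheme://netloc/path?a=b#frag')
known = urlsplit('http://netloc/path?a=b#frag')

# With generic parsing, query and fragment are split off for both schemes.
print(custom.query, custom.fragment)
print(known.query, known.fragment)
```

On a Python version that includes the fix, both lines should print the same query and fragment.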
|
|
msg165546 - (view) |
Author: Éric Araujo (eric.araujo) *  |
Date: 2012-07-15 20:06 |
Removing the module attributes causes third-party code to break. See one example here: http://lists.idyll.org/pipermail/testing-in-python/2012-July/005082.html |
|
|
msg165547 - (view) |
Author: Éric Araujo (eric.araujo) *  |
Date: 2012-07-15 20:07 |
Better link: https://github.com/pypa/pip/issues/552 |
|
|
msg168899 - (view) |
Author: Matthias Klose (doko) *  |
Date: 2012-08-22 17:29 |
This breaks the following upstream builds: createrepo, linkchecker, gwibber, pegasus-wm. There is no need to remove is_hierarchical on the branches; it's not used by urlparse at all. Is it safe to just keep the uses_query and uses_fragment lists on the branches as well? Raising to release blocker; I consider this a regression for the 2.7 and 3.2 release series. |
|
|
msg169039 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2012-08-24 16:12 |
Senthil, either the module globals should be re-added for compatibility, or the commits should be reverted, IMO. |
|
|
msg169040 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2012-08-24 16:17 |
New changeset a0b3cb52816e by Georg Brandl in branch '3.2': Closes #9374: add back now-unused module attributes; removing them is a backward compatibility issue, since they have a public-seeming name.
http://hg.python.org/cpython/rev/a0b3cb52816e
New changeset c93fbc2caba5 by Georg Brandl in branch 'default': Closes #9374: merge with 3.2
http://hg.python.org/cpython/rev/c93fbc2caba5
New changeset a43481210964 by Georg Brandl in branch '2.7': Closes #9374: add back now-unused module attributes; removing them is a backward compatibility issue, since they have a public-seeming name.
http://hg.python.org/cpython/rev/a43481210964 |
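Third-party code that reads these undocumented lists can guard against their absence instead of assuming they exist. The following is a minimal sketch, assuming Python 3 (`urllib.parse`); `LEGACY_NAMES` and `legacy_scheme_lists` are hypothetical names introduced only for this example, and the attribute names are the two mentioned in this thread.

```python
import urllib.parse

# Attribute names mentioned in this issue. They are undocumented, so this
# sketch treats them as optional rather than guaranteed.
LEGACY_NAMES = ("uses_query", "uses_fragment")


def legacy_scheme_lists(module=urllib.parse):
    """Return {name: list of schemes}, using an empty list when missing."""
    return {name: list(getattr(module, name, [])) for name in LEGACY_NAMES}


lists = legacy_scheme_lists()
for name, schemes in lists.items():
    print(name, len(schemes))
```

Reading the lists through `getattr` with a default keeps such code importable whether or not a given release defines the attributes.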
|
|
msg169052 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2012-08-24 17:23 |
Oops. I had not seen Éric's and Matthias's comments on this issue, which pointed out the problem. Sorry for not acting on this. Thanks Georg for adding those module attributes back. |
|
|
msg171448 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2012-09-28 12:28 |
After encountering an instance of people relying on fragment not being parsed for "irc://" URLs, with resulting breakage, I don't think we should change this in point releases. IOW, it's fine for 3.3.0, but not for 2.7.x or 3.2.x. It may be fixing a bug, but the bug is not obvious and the fix is not backward compatible. I therefore suggest to roll back the commits to 3.2 and 2.7. |
|
|
msg171452 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2012-09-28 12:47 |
If there is a list of known protocols that don't use the fragment, can't we include it in urlparse as we already do in Lib/urlparse.py:34? If #channel in irc://example.com/#channel should not be parsed as a fragment, then this can be considered a regression. This doesn't necessarily mean that the whole change is a regression though. |
|
|
msg171465 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2012-09-28 13:40 |
People make up URL schemes all the time, irc:// is not a special case. This change will mean breakage for them, unwarranted. |
|
|
msg171469 - (view) |
Author: Éric Araujo (eric.araujo) *  |
Date: 2012-09-28 14:05 |
One would hope that people making up URI schemes would follow the generic syntax (and thus irc would be an exception), but as the risk exists I agree we should not break code in bugfix releases. |
|
|
msg171557 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2012-09-29 07:27 |
New changeset 950320c70fb4 by Georg Brandl in branch 'default': Add a versionchanged note for #9374 changes. http://hg.python.org/cpython/rev/950320c70fb4 |
|
|
msg179270 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-01-07 16:46 |
> It may be fixing a bug, but the bug is not obvious and the fix is not
> backward compatible. I therefore suggest to roll back the commits to
> 3.2 and 2.7.
Well, the bug is quite obvious to me :-) (just hit it here)
The fix for those who want the old behaviour is obvious: just pass `allow_fragments=False` to urlparse(). OTOH, if you revert the fix, patching things manually is quite cumbersome. |
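The workaround mentioned above can be sketched as follows, using the `irc://` URL from earlier in this thread. This assumes Python 3 (`urllib.parse`) with the generic fragment parsing in place; the variable names are illustrative only.

```python
from urllib.parse import urlsplit

url = 'irc://example.com/#channel'

# Default: '#channel' is split off as the fragment.
default = urlsplit(url)

# Old behaviour on request: keep '#channel' as part of the path.
legacy = urlsplit(url, allow_fragments=False)

print(repr(default.path), repr(default.fragment))
print(repr(legacy.path), repr(legacy.fragment))
```

Callers that depend on the channel name staying in the path only need the extra keyword argument, which is the point made in the message above.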
|
|
msg216244 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2014-04-14 22:48 |
Reviewed the issue and correct rollbacks and commits were applied. This ticket should be closed. Thanks! |
|
|