msg102569 - (view) |
Author: Michael Glassford (Michael Glassford) |
Date: 2010-04-07 21:42 |
An unfortunate side-effect of this change: http://svn.python.org/view/python/branches/release26-maint/Lib/urlparse.py?r1=66717&r2=78235 which was made to fix this issue: http://bugs.python.org/issue7904 is that urlparse.urlunparse(urlparse.urlparse('x://')) now returns 'x:' instead of 'x://', and urlparse.urlunparse(urlparse.urlparse('x:///y')) now returns 'x:/y' instead of 'x:///y'. This behavior exists in at least Python 2.6 and 3.1, but not in 2.5. |
|
|
msg102580 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2010-04-08 00:19 |
To fix this, urlparse would have to differentiate between a null netloc and no netloc characters specified at all. This could be done by using None for one an '' for the other. I'm not sure that behavior change could be backported to 2.6, though. Did this issue actually cause a program failure for you in 2.6? If so the original patch might wind up getting classed as a regression for the 2.6 line. |
|
|
msg102589 - (view) |
Author: Michael Glassford (Michael Glassford) |
Date: 2010-04-08 02:35 |
It caused a minor issue with the Schemes extension for Mercurial: the output changed, which caused a unit test to fail. I'm pretty sure I have a Mercurial patch to fix that issue, however. |
|
|
msg102737 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2010-04-09 18:00 |
Hello Michael, Looking a bit deeper into this issue, I don't see that 'x://' and 'x:///y' qualifies as valid URLS as per RFC 3986. (Well, urlparse has been not strictly conforming to it, but that is a different issue) If you look at the section 3. it states the following for validity. hier-part = "//" authority path-abempty / path-absolute / path-rootless / path-empty In those cases, I would assume that 'x://y', x:/y','x:/','/' as valid URLS, but not the two examples you mentioned. For the , we had just gone by the definition of RFC to make that minor change and it has resulted in this issue. I looked at the code to see if this can be addressed, but I see that your examples did not fit in as valid urls. Do you have any opinions on this? We can just the test_urlparse.py a little like below, and you might fix the break your code. def test_unparse_parse(self): - for u in ['Python', './Python','x-newscheme://foo.com/stuff']: + for u in ['Python', './Python','x-newscheme://foo.com/stuff','x://y','x:/y','x:/','/',]: self.assertEqual(urlparse.urlunsplit(urlparse.urlsplit(u)), u) self.assertEqual(urlparse.urlunparse(urlparse.urlparse(u)), u) |
|
|
msg102922 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2010-04-12 07:08 |
Added additional examples of valid urls in r79988 and branches under the roundtrip test cases. Michael, RDM: If you have any comments on , let me know. Otherwise we can close this issue as wont-fix. |
|
|
msg102932 - (view) |
Author: Éric Araujo (eric.araujo) *  |
Date: 2010-04-12 09:28 |
By the way, I’ve been meaning to file a bug against the Mercurial schemes extension since I saw how it misused URIs (i.e. “py://trunk/” should be “py:trunk/”); I hope it’s not too late now that this extension is shipped with the program. Regards |
|
|
msg102939 - (view) |
Author: Michael Glassford (Michael Glassford) |
Date: 2010-04-12 11:50 |
> In those cases, I would assume that 'x://y', x:/y','x:/','/' > as valid URLS, but not the two examples you mentioned. Only 2 comments about this: 1) Although the urlparse documentation does mention the relevant RFCs, on a quick read-through I don't see that it actually requires its input to be a valid URL. 2) Obviously, some code is using it for invalid URLs. > Michael, RDM: If you have any comments on , > let me know. Otherwise we can close this issue as wont-fix. I don't have a problem with this. I reported the issue to bring attention to the fact that the change affected real code and to see if anyone had suggestions for dealing with the problem in a better way (at the moment, I don't). > By the way, I’ve been meaning to file a bug against the > Mercurial schemes extension... Please do. I don't have anything to do with the Schemes extension except that I found this issue while investigating a unit test failure, but if you file an issue probably I will look into it--or if I don't, then someone else probably will. |
|
|
msg102940 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2010-04-12 12:25 |
On Mon, Apr 12, 2010 at 11:50:57AM +0000, Michael Glassford wrote: > 1) Although the urlparse documentation does mention the relevant RFCs, on a quick read-through I don't see that it actually requires its input to be a valid URL. > 2) Obviously, some code is using it for invalid URLs. Which is true, because in some cases like urlparse.urljoin(base, rel), the rel is a relative url it undergoes the same parsing mechanism applicable to any url and thus there is no way a 'rigorous' check for authorized url is happening. Checks are done for valid chars in scheme and other parsing behaviours. > I don't have a problem with this. I reported the issue to bring attention to the fact that the change affected real code and to see if anyone had suggestions for dealing with the problem in a better way (at the moment, I don't). > Yeah, I understand the situation with 'changes which break the existing code', which we try to avoid many times. I shall if this can be addressed without reverting any recent changes and still being complaint. |
|
|
msg103141 - (view) |
Author: Senthil Kumaran (orsenthil) *  |
Date: 2010-04-14 19:09 |
It looks like any change for this specific case would make tests like handling anyscheme fail. I am less inclined to do that as opposed to handling this case for invalid url, I suggest this be handled at user end to fix the proper parsing and constructing correct url. I am closing this as wont-fix. |
|
|