Issue 1722348: urlparse.urlunparse forms file urls incorrectly

Created on 2007-05-20 22:35 by eigenlambda, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (5)

msg32097

Author: Thomas Folz-Donahue (eigenlambda)

Date: 2007-05-20 22:35

Here is a session with the current Python interpreter:

    >>> import urlparse
    >>> urlparse.urlparse(urlparse.urlunparse(urlparse.urlparse("file:////usr/bin/python")))
    ('file', 'usr', '/bin/python', '', '', '')

As you can see, the results are incorrect: after the round trip, 'usr' has moved out of the path and become the network location. The problem is in the urlunsplit function:

    def urlunsplit((scheme, netloc, url, query, fragment)):
        if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
            if url and url[:1] != '/': url = '/' + url
            url = '//' + (netloc or '') + url
        if scheme:
            url = scheme + ':' + url
        if query:
            url = url + '?' + query
        if fragment:
            url = url + '#' + fragment
        return url
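To reproduce the report without a Python 2 interpreter, here is a minimal Python 3 sketch that ports the quoted logic verbatim rather than calling the installed urlunsplit, whose behavior may differ between versions. It assumes urllib.parse.urlsplit (the Python 3 home of urlparse.urlsplit) as the parser; `USES_NETLOC` is a hypothetical subset of the module's real uses_netloc list and `old_urlunsplit` is an illustrative name. Tuple-parameter unpacking is no longer valid syntax, so the tuple is unpacked in the body.

```python
from urllib.parse import urlsplit

# Hypothetical subset of urlparse.uses_netloc, enough for this example.
USES_NETLOC = {"file", "ftp", "http", "https"}


def old_urlunsplit(parts):
    """Port of the Python 2.5 urlunsplit logic quoted above."""
    scheme, netloc, url, query, fragment = parts
    # The questioned test: a path that already starts with '//' gets no
    # '//' + netloc prefix, so its own leading slashes end up acting as
    # the marker that introduces a network location.
    if netloc or (scheme and scheme in USES_NETLOC and url[:2] != "//"):
        if url and url[:1] != "/":
            url = "/" + url
        url = "//" + (netloc or "") + url
    if scheme:
        url = scheme + ":" + url
    if query:
        url = url + "?" + query
    if fragment:
        url = url + "#" + fragment
    return url


original = "file:////usr/bin/python"
parts = urlsplit(original)        # netloc '', path '//usr/bin/python'
rebuilt = old_urlunsplit(parts)   # 'file://usr/bin/python'
reparsed = urlsplit(rebuilt)      # netloc 'usr', path '/bin/python'
print(rebuilt, tuple(reparsed))
```

The reparsed tuple matches the five-field equivalent of the output in the session above.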

RFC 1808 (see http://www.ietf.org/rfc/rfc1808.txt) specifies that a URL shall have the following syntax: <scheme>://<net_loc>/<path>;<params>?<query>#<fragment>
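Python's parser returns the same six components as a named tuple, and urlunparse reassembles them. A short sketch using urllib.parse.urlparse; the example URL is made up for illustration:

```python
from urllib.parse import urlparse

# Each field corresponds to one piece of
# <scheme>://<net_loc>/<path>;<params>?<query>#<fragment>.
# Unlike the RFC grammar, Python keeps the leading '/' inside `path`.
result = urlparse("http://example.com/a/b;p?q=1#frag")
for name in result._fields:
    print(f"{name:9} {getattr(result, name)!r}")
```

Because the leading slash belongs to `path`, any extra slashes at the start of a path are indistinguishable from the '//' that introduces <net_loc> once the pieces are concatenated, which is why urlunsplit has to decide when to emit '//'.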

The problem with the current version of urlunsplit is that it checks whether the path ('url') already begins with two slashes and, if so, does not emit the '//' that introduces the network location. This is incorrect because (1) RFC 1808 clearly specifies at least three slashes between the end of the scheme portion and the beginning of the path portion, and (2) the check effectively strips the first two slashes from an arbitrary path, which may need them: when the result is parsed again, those slashes are read as the start of a network location. Removing the url[:2] != '//' test makes urlunsplit behave correctly for URLs such as file:////usr/bin/python.
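A minimal sketch of the proposed change, under the same assumptions as before (urllib.parse.urlsplit as the parser, `USES_NETLOC` as a hypothetical subset of uses_netloc, `patched_urlunsplit` as an illustrative name): the quoted function with only the url[:2] != '//' test removed, checked for a stable round trip.

```python
from urllib.parse import urlsplit

# Hypothetical subset of urlparse.uses_netloc, enough for this example.
USES_NETLOC = {"file", "ftp", "http", "https"}


def patched_urlunsplit(parts):
    """The quoted urlunsplit with the url[:2] != '//' test removed."""
    scheme, netloc, url, query, fragment = parts
    if netloc or (scheme and scheme in USES_NETLOC):
        if url and url[:1] != "/":
            url = "/" + url
        # Always emit '//' + netloc, even when netloc is empty, so any
        # slashes that belong to the path stay in the path.
        url = "//" + (netloc or "") + url
    if scheme:
        url = scheme + ":" + url
    if query:
        url = url + "?" + query
    if fragment:
        url = url + "#" + fragment
    return url


samples = (
    "file:///home/ors",
    "file:////usr/bin/python",
    "file://///home/ors",
    "file://////home/ors",
)
for original in samples:
    once = urlsplit(original)
    twice = urlsplit(patched_urlunsplit(once))
    # True when parse -> unparse -> parse gives back the same components.
    print(original, tuple(once) == tuple(twice))
```

Each sample reassembles to exactly the string it was parsed from, so repeated parse/unparse cycles no longer lose slashes.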

msg32098

Author: Thomas Folz-Donahue (eigenlambda)

Date: 2007-05-20 23:12

Some other issues with the urlparse module: several constant lists defined at the beginning of the module should be sets, because they are only used for membership tests. Also, urlunsplit() uses the + operator heavily, creating intermediate strings that are immediately thrown away. IMO, the patched version is actually more readable. Attaching a patch (diff -u urlparse.py urlparse.py.new > urlparse.diff). File Added: urlparse.diff
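The attached patch is not reproduced in this thread, so the following is only a hypothetical sketch of the two suggestions: a frozenset for a constant used only in membership tests, and one str.join over collected pieces instead of chained +. `USES_NETLOC` and `urlunsplit_join` are illustrative names, and the '//' check is already removed as proposed in msg32097.

```python
# Hypothetical frozenset standing in for a list constant: membership
# tests are average O(1) hash lookups instead of a linear scan.
USES_NETLOC = frozenset({"file", "ftp", "http", "https"})


def urlunsplit_join(parts):
    """urlunsplit (without the '//' check) built with a single str.join."""
    scheme, netloc, url, query, fragment = parts
    pieces = []
    if scheme:
        pieces += [scheme, ":"]
    if netloc or (scheme and scheme in USES_NETLOC):
        pieces += ["//", netloc or ""]
        # Same rule as the original: a relative path gets a '/' separator.
        if url and url[:1] != "/":
            pieces.append("/")
    pieces.append(url)
    if query:
        pieces += ["?", query]
    if fragment:
        pieces += ["#", fragment]
    return "".join(pieces)


print(urlunsplit_join(("file", "", "//usr/bin/python", "", "")))
print(urlunsplit_join(("http", "example.com", "a/b", "q=1", "x")))
```

The pieces are appended in the same order the original builds its string (scheme, ':', '//', netloc, path, '?', query, '#', fragment), so the output is the same as the + version with the check removed; for five short strings any speed difference is likely small, and readability is the main argument.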

msg32099

Author: Senthil Kumaran (orsenthil) * (Python committer)

Date: 2007-05-23 21:27

Hi Thomas, I verified the bug with Python 2.5 and verified the fix as well. It works fine:

    >>> urlparse(urlunparse(urlparse('file:////home/ors')))
    ('file', '', '//home/ors', '', '', '')
    >>> urlparse(urlunparse(urlparse('file://///home/ors')))
    ('file', '', '///home/ors', '', '', '')
    >>> urlparse(urlunparse(urlparse('file://////home/ors')))
    ('file', '', '////home/ors', '', '', '')
    >>> urlparse(urlunparse(urlparse(urlunparse(urlparse('file://////home/ors')))))
    ('file', '', '////home/ors', '', '', '')

msg67937

Author: Senthil Kumaran (orsenthil) * (Python committer)

Date: 2008-06-11 02:55

This issue no longer exists. I verified the bug report on the trunk, and urlparse() and urlunparse() behave properly when applied repeatedly to the same URL:

    >>> urlparse.urlparse(urlparse.urlunparse(urlparse.urlparse('file:///home/ors/Letter.txt')))
    ParseResult(scheme='file', netloc='', path='/home/ors/Letter.txt', params='', query='', fragment='')

Can be closed as Invalid.

msg237833

Author: Martin Panter (martin.panter) * (Python committer)

Date: 2015-03-11 01:46

I believe this was closed incorrectly. The original test case includes four slashes, not three, and still fails in 3.5. However, Issue 23505 has been opened in the meantime, which should cover this.
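Why the three-slash re-test could not catch the bug can be seen from the parsed path alone: the quoted urlunsplit skips the '//' prefix only when the path itself starts with '//', which requires four or more slashes after 'file:'. A small sketch using urllib.parse.urlsplit:

```python
from urllib.parse import urlsplit

for url in ("file:///home/ors/Letter.txt", "file:////usr/bin/python"):
    path = urlsplit(url).path
    # The old urlunsplit dropped the '//' prefix only when this is True.
    print(url, repr(path), path[:2] == "//")
```

With three slashes the parsed path is '/home/ors/Letter.txt', so the check never fires; with four it is '//usr/bin/python', which triggers the faulty branch.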

History

Date                | User          | Action | Args
2022-04-11 14:56:24 | admin         | set    | github: 44982
2019-08-15 03:25:46 | epicfaace     | set    | nosy: + epicfaace
2015-03-11 01:46:49 | martin.panter | set    | nosy: + martin.panter; messages: +; resolution: works for me -> duplicate; superseder: [CVE-2015-2104] Urlparse insufficient validation leads to open redirect
2008-06-11 06:02:39 | georg.brandl  | set    | status: open -> closed; resolution: works for me
2008-06-11 02:55:50 | orsenthil     | set    | messages: +
2007-05-20 22:35:20 | eigenlambda   | create |