Issue 18119: urllib.FancyURLopener does not treat URL fragments correctly (original) (raw)

Created on 2013-06-02 10:22 by takahashi.shuhei, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description Edit
issue18119-code-only.patch karlcow,2014-10-07 07:45 review
Messages (7)
msg190480 - (view) Author: Shuhei Takahashi (takahashi.shuhei) Date: 2013-06-02 10:22
When urllib.FancyURLopener encounters 302 redirection to a URL with fragments, it sends wrong URL to servers. For example, if we run: urllib.urlopen('http://example.com/foo') and the server responds like following. HTTP/1.1 302 Found Location: /bar#test Then urllib tries next to fetch http://example.com/bar#test, instead of http://example.com/bar. Correctly, urllib should strip fragment part of URL before issuing requests.
msg228417 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-10-04 00:26
Slipped under the radar?
msg228660 - (view) Author: karl (karlcow) * Date: 2014-10-06 11:29
This is the correct behavior GET http://example.com/foo with a response containing 302 and Location: /bar#test must trigger http://example.com/bar#test Location is defined in http://tools.ietf.org/html/rfc7231#section-7.1.2 7.1.2. Location The "Location" header field is used in some responses to refer to a specific resource in relation to the response. The type of relationship is defined by the combination of request method and status code semantics. Location = URI-reference The field value consists of a single URI-reference. When it has the form of a relative reference ([RFC3986], Section 4.2), the final value is computed by resolving it against the effective request URI ([RFC3986], Section 5). A bit after in the spec. If the Location value provided in a 3xx (Redirection) response does not have a fragment component, a user agent MUST process the redirection as if the value inherits the fragment component of the URI reference used to generate the request target (i.e., the redirection inherits the original reference's fragment, if any). For example, a GET request generated for the URI reference "http://www.example.org/~tim" might result in a 303 (See Other) response containing the header field: Location: /People.html#tim which suggests that the user agent redirect to "http://www.example.org/People.html#tim" Likewise, a GET request generated for the URI reference "http://www.example.org/index.html#larry" might result in a 301 (Moved Permanently) response containing the header field: Location: http://www.example.net/index.html which suggests that the user agent redirect to "http://www.example.net/index.html#larry", preserving the original fragment identifier.
msg228720 - (view) Author: Shuhei Takahashi (takahashi.shuhei) Date: 2014-10-06 16:29
Hi karl, Of course it is correct that the user agent is redirected to http://example.com/bar#test when it got such response. However, it never means UA can send an HTTP request containing fragment part. In RFC7230 section 3.1.1, HTTP request line is defined as: request-line = method SP request-target SP HTTP-version CRLF where request-target is defined in section 5.3. In usual case, it is origin-form: origin-form = absolute-path [ "?" query ] which never includes fragment part. This means current urllib behavior violates HTTP/1.1 spec.
msg228744 - (view) Author: karl (karlcow) * Date: 2014-10-06 23:03
Takahashi-san, Ah sorry misunderstood which part your were talking about. I assume wrongly you were talking about navigation. Yes for the request which is sent to the server it should be http://tools.ietf.org/html/rfc7230#section-5.3.1 So refactoring your example. 1st request: GET /foo HTTP/1.1 Accept: text/html Host: example.com server response HTTP/1.1 302 Found Location: /bar#test second request must be GET /bar HTTP/1.1 Accept: text/html Host: example.com As for the navigation context is indeed part of the piece of code taking in charge the document after being parsed and not the one doing the HTTP request. (putting it here just that people understand) (to be tested) For server side receiving invalid Request-line http://tools.ietf.org/html/rfc7230#section-3.1.1 Recipients of an invalid request-line SHOULD respond with either a 400 (Bad Request) error or a 301 (Moved Permanently) redirect with the request-target properly encoded. A recipient SHOULD NOT attempt to autocorrect and then process the request without a redirect, since the invalid request-line might be deliberately crafted to bypass security filters along the request chain.
msg228748 - (view) Author: karl (karlcow) * Date: 2014-10-06 23:30
In class urlopen_HttpTests https://hg.python.org/cpython/file/4f314dedb84f/Lib/test/test_urllib.py#l191 there is a test for invalid redirects def test_invalid_redirect(self): https://hg.python.org/cpython/file/4f314dedb84f/Lib/test/test_urllib.py#l247 And one for fragments def test_url_fragment(self): https://hg.python.org/cpython/file/4f314dedb84f/Lib/test/test_urllib.py#l205 which refers to http://bugs.python.org/issue11703 code in https://hg.python.org/cpython/file/d5688a94a56c/Lib/urllib.py
msg228755 - (view) Author: karl (karlcow) * Date: 2014-10-07 07:45
OK I fixed the code. The issue is here https://hg.python.org/cpython/file/1e1c6e306eb4/Lib/urllib/request.py#l656 newurl = urlunparse(urlparts) Basically it reinjects the fragment in the new url. The fix is easy. if urlparts.fragment: urlparts = list(urlparts) urlparts[5] = "" newurl = urlunparse(urlparts) I was trying to make a test for it, but failed. Could someone help me for the test so I can complete the patch? Added the code patch only.
History
Date User Action Args
2022-04-11 14:57:46 admin set github: 62319
2019-04-26 17:49:04 BreamoreBoy set nosy: - BreamoreBoy
2014-10-07 07:45:19 karlcow set files: + issue18119-code-only.patchkeywords: + patchmessages: +
2014-10-06 23:30:55 karlcow set messages: +
2014-10-06 23:03:04 karlcow set messages: +
2014-10-06 16:29:00 takahashi.shuhei set messages: +
2014-10-06 11:29:02 karlcow set nosy: + karlcowmessages: +
2014-10-04 00:26:37 BreamoreBoy set nosy: + BreamoreBoymessages: +
2013-06-08 17:42:30 ezio.melotti set nosy: + orsenthil, ezio.melotti
2013-06-04 14:58:45 mher set nosy: + mher
2013-06-02 10:22:10 takahashi.shuhei create