Issue 549151: urllib2 POSTs on redirect (original) (raw)

Created on 2002-04-26 16:04 by jjlee, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Messages (41)

msg10552 - (view)

Author: John J Lee (jjlee)

Date: 2002-04-26 16:04

urllib2 (I'm using 1.13.22 with Python 2.0, but I assume the 2.2 branch does the same) uses the POST method on redirect, contrary to RFC1945 section 9.3:

9.3 Redirection 3xx

This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required may be carried out by the user agent without interaction with the user if and only if the method used in the subsequent request is GET or HEAD. A user agent should never automatically redirect a request more than 5 times, since such redirections usually indicate an infinite loop.

Can be fixed in HTTPRedirectHandler.http_error_302 by replacing new = Request(newurl, req.get_data())

with new = Request(newurl)

so that GET is done on redirect instead of POST.

I suppose the limit of 10 in the same function should be changed to 5, also.

msg10553 - (view)

Author: John J Lee (jjlee)

Date: 2002-04-28 16:21

Logged In: YES user_id=261020

  1. Bug is also in 2.2 branch

  2. The fix (in 2.1 and 2.2) should reflect the earlier bug fix in the 2.2 branch to add the old headers:

new = Request(newurl, headers=req.headers)

  1. I guess 10 should be replaced with 4, not 5.

John

msg10554 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2002-04-29 00:53

Logged In: YES user_id=6380

Hm, the way I interpret the text you quote, if the original request is a POST, it should probably not substitute a GET but report the error.

Assigning to Jeremy since it's his module.

msg10555 - (view)

Author: John J Lee (jjlee)

Date: 2002-04-29 16:29

Logged In: YES user_id=261020

I don't see why it shouldn't substitue a GET. That certainly seems to be the standard practice (well, at least that's what lynx does), and in the case of the only site where I've encountered redirects on POST, the redirect URL contains urlencoded stuff, so it clearly expects the user-agent to do a GET. The site would break if this didn't happen, so I guess Netscape and IE must do the same thing.

Clearly the RFC does allow this, though it doesn't require it (or specify what should happen here in any way other than to say that a POST is not allowed, in fact). Since it's standard practice and allowed by the RFC, I don't think it should be an error.

John

msg10556 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2002-04-29 18:10

Logged In: YES user_id=6380

Fair enough. I'll leve it to Jeremy to review the proposed fix. (Note that he's busy though.)

msg10557 - (view)

Author: Jeremy Hylton (jhylton) (Python triager)

Date: 2002-06-06 15:09

Logged In: YES user_id=31392

I think you need to review the current HTTP spec -- RFC 2616 -- and look at the section on redirection (10.3). I think urllib2 could improve its handling of redirection, but the behavior you describe from lynx sounds incorrect. I'd be happy to review a patch the implemented the current spec.

Also, RFC 2616 removes the recommendation of a 5 redirect limit.

msg10558 - (view)

Author: John J Lee (jjlee)

Date: 2002-06-06 21:25

Logged In: YES user_id=261020

Oh, I hadn't realised httplib uses HTTP/1.1 now -- and now I check, I see that even RFC 1945 (HTTP/1.0) has a whole set of restrictions on 30x for particular values of x which I missed completely.

I guess it should do what Guido suggested -- report an error if a POST is redirected -- until somebody gets time to do it properly. I won't have time for a couple of months, but if nobody has done it by then I'll upload a patch.

John

msg10559 - (view)

Author: John J Lee (jjlee)

Date: 2002-06-19 21:56

Logged In: YES user_id=261020

I've attached a patch -- was simpler than I thought.

I think this is in compliance with RFC 2616. It's also simple for clients to inherit from HTTPRedirectHandler and override redirect_request if, for example, they know they want all 302 POSTs to be redirected as a GET, which removes the need for mixing redirection code with normal client code even when the server doesn't follow the RFC (if the server expects a 302 POST to be redirected as a GET, for example). Note that I had to add a method, named 'method', to the Request class, for maintainability (it's possible that urllib2 may be able to do methods other than GET and POST in future).

Possibly redirect_request should take fewer parameters.

John

msg10560 - (view)

Author: John J Lee (jjlee)

Date: 2002-06-19 21:57

Logged In: YES user_id=261020

I've attached a patch -- was simpler than I thought.

I think this is in compliance with RFC 2616. It's also simple for clients to inherit from HTTPRedirectHandler and override redirect_request if, for example, they know they want all 302 POSTs to be redirected as a GET, which removes the need for mixing redirection code with normal client code even when the server doesn't follow the RFC (if the server expects a 302 POST to be redirected as a GET, for example). Note that I had to add a method, named 'method', to the Request class, for maintainability (it's possible that urllib2 may be able to do methods other than GET and POST in future).

Possibly redirect_request should take fewer parameters.

John

msg10561 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2002-06-20 00:57

Logged In: YES user_id=6380

Hm, I thought that 307 should be treated differently than 303. In particular, a 307 response to POST should cause the POST to be redirected. And I'm not sure about what a 301 response to POST should do.

(In the mean time I have been convinced that changing POST to GET on 302 is the best thing to do given that most browsers do that. The old urllib.py should be fixed too.)

msg10562 - (view)

Author: John J Lee (jjlee)

Date: 2002-06-23 19:23

Logged In: YES user_id=261020

First, can I underline the fact that there are two issues here:

  1. What method should POST be redirected as for a particular 30x response?

  2. Should the user be asked to confirm the redirection (which presumably maps onto raising an HTTPError in urllib2)?

I think current client implementations go against the RFCs on both these points for 302. Contrastingly, RFC 2616's note on 301 responses (section 10.3.2) implies that only a subset of HTTP/1.0 (not 1.1, note) clients violate the RFCs (1945, 2068, 2616) on point 1 for 301 responses, and I know of no clients that violate the RFCs on point 2 for 301 responses (but I haven't tested). Also, I guess the original intent (for both 301 and 302) was that POSTs represent an action, so it's dangerous in general to redirect them without asking the user: I presume this is why 307 was brought in: 307 POSTs on redirected POST, so the user must be asked:

10.3 Redirection 3xx [...] The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD.

(and 303 is just meant to have the 'erroneous' 302 behaviour: GET on redirected POST, don't ask the user)

So, I agree with Guido that 302 should probably now be treated the way most clients treat it: GET on redirected POST, don't ask the user. As for 301, if it is true that only a few HTTP/1.0 clients have misinterpreted the RFCs on 301, we should stick to the safe behaviour indicated by the RFC.

As for Guido's first question: A 307 should indeed be redirected as a POST, but it should ask the user first (see 10.3 quote, above). That's what my patch does (the 'return None' in the else clause in redirect_request causes an exception to be raised eventually -- maybe it should just raise it itself).

I've attached a revised patch that deals with 302 (and 303) in the 'erroneous' way (again: GET on redirected POST, don't ask the user).

I suppose the comment about HTTPRedirectHandler also needs to state that 301 and 307 POST requests are not automatically redirected -- HTTPError is raised in that case.

John

msg10563 - (view)

Author: John J Lee (jjlee)

Date: 2002-10-05 17:44

Logged In: YES user_id=261020

Here is a patch for the minor documentation changes required for the patch I uploaded earlier to fix this bug. I've also uploaded a slightly revised patch:

I renamed the method' method to get_method' for consistency with the other methods of the Request class. The reason this new method is there at all is because the logic that decides whether or not a redirection should take place is defined in terms of the method type (GET/HEAD/POST). urllib2 doesn't support HEAD ATM, but it might in future. OTOH, the Request class is supposed to be generic -- not specific to HTTP -- so perhaps get_method isn't really appropriate. OTOOH, add_data is already documented to only be meaningful for HTTP Requests. I suppose there could be a subclass HTTPRequest, but that's less simple. What do you think is best?

I also changed redirect_request to raise an exception instead of returning None (because no other Handler should handle the redirect), and changed its documented return value / exception interface to be consistent with the rest of urllib2.

John

msg10564 - (view)

Author: John J Lee (jjlee)

Date: 2002-10-08 16:53

Logged In: YES user_id=261020

Guido mentioned that urllib.py should be fixed too. I've attached another patch.

John

msg10565 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2002-10-08 18:23

Logged In: YES user_id=6380

Um, I don't see the patch for urllib.py. Perhaps you didn't check the box?

Jeremy, let's do this in 2.3, OK?

msg10566 - (view)

Author: John J Lee (jjlee)

Date: 2002-10-09 21:16

Logged In: YES user_id=261020

Ah. Here it is.

John

msg10567 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2002-10-10 01:27

Logged In: YES user_id=6380

OK, over to jeremy.

msg10568 - (view)

Author: Jeremy Hylton (jhylton) (Python triager)

Date: 2002-10-11 15:40

Logged In: YES user_id=31392

I'm still uncertain about what to do on 301 responses. As you say, there are two issues to sort out: 1) what method should a POST be redicted to and 2) whether the user should be asked to confirm.

There's discussion of this at: http://ppewww.ph.gla.ac.uk/~flavell/www/post-redirect.html The page is a bit old, but I don't know if it is out of date. It says that IE5, Netscape 4, and Mozilla 1.0 all redirect a 301 POST to a GET without user confirmation. That seems like a widespread disregard for the spec that we ought to emulate.

I agree with your interpretation that urllib2 raise an HTTPError to signal "request confirm" because an HTTPError is also a valid response that the user could interpret. But of urllib, an HTTP error doesn't contain a valid response. The change would make it impossible for the client to do anything if a 301 response is returned from a POST. That seems worse than doing the wrong.

msg10569 - (view)

Author: John J Lee (jjlee)

Date: 2002-10-15 15:22

Logged In: YES user_id=261020

Well, when I carefully read the RFC I came to the conclusion that 301 should be redirected to GET, not POST. I also came to the conclusion that the RFC was slightly ambiguous on this point, so I guess that's why A. Flavell came to the conclusion that 301 should redirect to POST rather than GET. Anyway, clearly the established practice is to redirect 301 as GET, so this is all academic.

Assuming he's right about Netscape / IE etc., I suppose I'm happy for 301 to redirect without an exception, since that's what we've agreed to do for 302. Obviously, the docs should say this is contrary to the RFC (as my doc patch says for 302 already).

As for urllib.py, I see the problem. 303 should still be added, though, since that poses no problem at all.

John

msg10570 - (view)

Author: John J Lee (jjlee)

Date: 2003-03-05 21:03

Logged In: YES user_id=261020

The fix for this was set to go into 2.3, but nothing has
happened yet.

There were a couple of minor details to be fixed in the patches, so I've done that, and uploaded new patches for
urllib2.py, urllib.py, and their documentation files (against
current CVS). Well, I'm just about to upload them...

Jeremy, can you apply the patches and finally close this one now? Let me know if anything else needs doing.

This should actually close another bug on SF, too (if it's still open), which deals with the urllib.py case. I can't find it,
though. I did attach a comment to it (before I lost it) that references this bug.

msg10571 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2003-03-07 20:57

Logged In: YES user_id=6380

I'll try to work on this.

msg10572 - (view)

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2003-04-16 20:27

Logged In: YES user_id=80475

Do you have time to take this one back? My head starts to explode when researching whether this is the right thing to do.

msg10573 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2003-04-16 20:35

Logged In: YES user_id=6380

I think this is the right thing to do. Can you check it in? (There are several files -- urllib, urllib2, and its docs).

msg10574 - (view)

Author: John J Lee (jjlee)

Date: 2003-04-20 20:44

Logged In: YES user_id=261020

Just noticed that I forgot to upload one of the doc patches on 2003-03-05. Here it is.

The patches that need to be applied are the three I uploaded on 2003-03-05 (urllib.py.patch, urllib2.py.patch and liburllib2.tex.patch), and the liburllib.tex.patch I just uploaded.

msg10575 - (view)

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2003-04-24 15:34

Logged In: YES user_id=80475

Committed as:

Lib/urllib.py 1.157 Lib/urllib2.py 1.39 Doc/lib/liburllib.tex 1.47 Doc/lib/liburllib2.tex 1.7

Guido, would you like this backported to 2.2.3?

msg10576 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2003-04-24 15:49

Logged In: YES user_id=6380

Yes, I think that would be a good idea.

msg10577 - (view)

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2003-04-25 08:21

Logged In: YES user_id=80475

Backported to 2.2.3. Closing bug.

msg10578 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2003-04-26 17:21

Logged In: YES user_id=6380

Blech! Reopening. I just stumbled upon the relevant part of RFC 2616, and it suggests to me the following rules when the original request was a POST. I should have made the time for this earlier, but I simply didn't have time. :-(

Reference:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.3

Summary:

This suggests to me that no automatic repeat of POST requests should ever be done, and that in the case of a 302 or 303 response, a POST should be replaced by a GET; this may also be done for a 301 response -- even though the standard calls that an error, it admits that it is done by old clients.

But the new code in urllib.py doesn't seem to do that: it treats 301, 302 and 303 all the same, doing a POST if the original request was a POST (POST is determined by 'data is not None'). And it doesn't redirect on a 307 at all, even though it should do that if the original request was GET. The updated documentation describes the desired behavior for 301,302,303.

I think the desired behavior can be obtained by always omitting the data argument in the call to self.open(newurl) in redirect_internal(). Then a handler for 307 could be handled that raises an error if the original request was a POST, and redirects otherwise. I'm attaching a suggested patch to urllib.py (guido.txt).

It appears urllib2.py was patched correctly.

msg10579 - (view)

Author: John J Lee (jjlee)

Date: 2003-04-27 01:31

Logged In: YES user_id=261020

The patch for urllib.py is indeed broken, and I think Guido's patch is correct. I agree with the summary Guido gives, which is also in agreement with our previous discussion.

A side issue that occurred to me on re-reading the RFC is whether 301 redirection should be automatic. A 301 is supposed to indicate permanent redirection, so one is supposed to update one's URL when a 301 happens. Redirecting automatically doesn't allow the user of urllib / urllib2 to do that. However, I suppose that since 2.2 does redirect automatically (for both urllib and urllib2) it's a bit late to worry about this.

The patched urllib2.py is also broken in two ways:

  1. It tries to call req.method() -- which doesn't exist -- rather than req.get_method() as it should.

print "Must use pychecker.\n"*100

especially when there are no unit tests...

Anyway, I decided the new get_method method is unnecessary and the new patch I'll upload in a minute removes it again.

  1. 301 response to POST isn't redirected to GET. It should be, as we agreed earlier.

Just about to upload revised patches, against 2.3beta CVS. (urllib2.py.patch2, liburllib2.tex.patch2, liburllib.tex.patch2). They need backporting to 2.2 again, too.

Bother.

msg10580 - (view)

Author: John J Lee (jjlee)

Date: 2003-04-27 01:31

Logged In: YES user_id=261020

The patch for urllib.py is indeed broken, and I think Guido's patch is correct. I agree with the summary Guido gives, which is also in agreement with our previous discussion.

A side issue that occurred to me on re-reading the RFC is whether 301 redirection should be automatic. A 301 is supposed to indicate permanent redirection, so one is supposed to update one's URL when a 301 happens. Redirecting automatically doesn't allow the user of urllib / urllib2 to do that. However, I suppose that since 2.2 does redirect automatically (for both urllib and urllib2) it's a bit late to worry about this.

The patched urllib2.py is also broken in two ways:

  1. It tries to call req.method() -- which doesn't exist -- rather than req.get_method() as it should.

print "Must use pychecker.\n"*100

especially when there are no unit tests...

Anyway, I decided the new get_method method is unnecessary and the new patch I'll upload in a minute removes it again.

  1. 301 response to POST isn't redirected to GET. It should be, as we agreed earlier.

Just about to upload revised patches, against 2.3beta CVS. (urllib2.py.patch2, liburllib2.tex.patch2, liburllib.tex.patch2). They need backporting to 2.2 again, too.

Bother.

msg10581 - (view)

Author: John J Lee (jjlee)

Date: 2003-04-27 01:33

Logged In: YES user_id=261020

Damn this SF bug system.

msg10582 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2003-05-05 00:27

Logged In: YES user_id=6380

Note that Jeremy found a bug in what's currently checked in for urllib2.py. I'll try to sort this out coming Friday.

msg10583 - (view)

Author: John J Lee (jjlee)

Date: 2003-05-06 20:18

Logged In: YES user_id=261020

What, a different urllib2 bug than the two that I uploaded patches for on 2003-04-27?

msg10584 - (view)

Author: John J Lee (jjlee)

Date: 2003-05-07 14:21

Logged In: YES user_id=261020

About 301: I tested Mozilla 1.0.0 and Konqueror 3.1.0 and they both redirect POST to GET without asking for confirmation. Ronald Tschalar tells me he tested Mozilla 1.1 and Netscape 4.7, and got the same result. So, all is as we had thought there (haven't checked IE, though).

Side note: After reading another comment of Ronald's (he's the author of the HTTPClient java library), I realise that it would be a good idea to have urllib / urllib2 cache the results of 301 redirections. This shouldn't break anything, since it's just an internal optimisation from one point of view -- but it's also what the RFC says SHOULD happen. I'll file a separate bug.

msg10585 - (view)

Author: John J Lee (jjlee)

Date: 2003-05-09 12:46

Logged In: YES user_id=261020

Another test result on browser behaviour with 301 response to POST:

lynx 2.8.4rel.1: offers user a choice between POST, GET and cancel

Still haven't checked IE.

msg10586 - (view)

Author: John J Lee (jjlee)

Date: 2003-05-10 00:14

Logged In: YES user_id=261020

OK, I can't test IE, because Windows doesn't have a proper loopback adapter, and my Windows machine is disconnected.

msg10587 - (view)

Author: John J Lee (jjlee)

Date: 2003-05-10 12:45

Logged In: YES user_id=261020

Sorry for all these tiny updates.

Summary of what remains to be done:

  1. Apply Guido's patch for urllib.py: guido.txt. The docs are already correct.

  2. 301 in response to POST is still not redirected in urllib2. urllib2.py.patch3 fixes that (note patch3, not patch2). It also removes my Request.get_method method (which now seems like overkill, as well as breaking the URL scheme-independence of Request). Also fixes an exception message.

  3. liburllib2.tex.patch2 updates the docs to reflect the changes in urllib2.py.patch3, rephrases a note and adds a missing method description.

  4. liburllib.tex.patch2: trivial rewording of docs for clarity.

  5. Backport above changes to 2.2 bugfix branch.

Jeremy in CVS:

The latest changes to the redirect handler couldn't possibly have been tested, because they did not compute a newurl and failed with a NameError. The name == "main": block has a test for redirects.

Ah. The tests are preceeded by a warning about only working from a particular set of hosts: I mis-remembered, thinking that applied to all the tests. Sorry.

msg10588 - (view)

Author: John J Lee (jjlee)

Date: 2003-05-22 18:33

Logged In: YES user_id=261020

I tested the 301 behaviour of MSIE 5.00 on wine, and Mozilla Firebird 0.6.

Both redirect POST to GET (without asking for user confirmation) on a 301 redirect, as expected, so it's probably justified to follow through with the 301 changes we discussed (summarised in my last message, along with a couple of related changes; item 1 has already been done).

msg10589 - (view)

Author: Guido van Rossum (gvanrossum) * (Python committer)

Date: 2003-07-09 18:24

Logged In: YES user_id=6380

I've asked on python-dev for help with this, as I expect I won't have time for it before July 17 (cut-off date for 2.3c1). Feel free to assign this to yourself if you want to grab it.

msg10590 - (view)

Author: John J Lee (jjlee)

Date: 2003-07-11 00:00

Logged In: YES user_id=261020

Note that part of point 2. of my summary of 2003-05-10 is wrong, since 2.2.3 final contains the method urllib2.Request.get_method. I will upload another patch tomorrow, with that corrected.

msg10591 - (view)

Author: John J Lee (jjlee)

Date: 2003-07-11 15:45

Logged In: YES user_id=261020

I am just about to upload three patches named *.patch4, which should finally close this bug. The important change is that 301 response to a POST is now automatically redirected. The rest is minor docstring and documentation fixes.

They need backporting to 2.2 -- I haven't included separate patches since the changes are trivial.

msg10592 - (view)

Author: Martin v. Löwis (loewis) * (Python committer)

Date: 2003-07-12 07:36

Logged In: YES user_id=21627

Applied rev4 as

liburllib.tex 1.50 liburllib2.tex 1.11 urllib2.py 1.53 liburllib.tex 1.40.8.5 liburllib2.tex 1.6.8.4 urllib2.py 1.24.8.9

History

Date

User

Action

Args

2022-04-10 16:05:16

admin

set

github: 36506

2009-02-12 18:14:47

ajaksu2

link

issue1424148 dependencies

2002-04-26 16:04:10

jjlee

create