msg249361 - (view) |
Author: Thomas Belhalfaoui (thomas.belhalfaoui) * |
Date: 2015-08-30 17:41 |
When using httplib / http.client to connect to an HTTPS website through a proxy (by making a tunnel with a CONNECT request), there is no way to retrieve the HTTP headers that the proxy sends back in response to that CONNECT request. This becomes a problem with rotating proxy providers like ProxyMesh, which send useful information in those headers (for instance, "X-ProxyMesh-IP" contains the IP address of the proxy, which is needed to keep the same address throughout the session). It would be nice to save those headers in an attribute of the HTTPConnection class (e.g. self._tunnel_response_headers), which would be set inside the _tunnel method (as proposed in the attached patch, lines 748 and 827-831). This would make it possible to read the headers back and/or pass them to a higher-level library (such as requests). |
|
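For illustration, the status line and header fields of a CONNECT reply can be parsed with what http.client already ships. This is only a minimal sketch, not the attached patch: the helper name read_connect_response is made up for this example, and the sample reply (including the 203.0.113.7 address) is invented.

```python
import io
from http.client import parse_headers


def read_connect_response(fp):
    """Parse a proxy's reply to CONNECT from a binary file object.

    Hypothetical helper (not part of http.client). Returns
    (status_code, reason, headers), where headers is an HTTPMessage.
    """
    status_line = fp.readline().decode("iso-8859-1").rstrip("\r\n")
    version, _, rest = status_line.partition(" ")
    code, _, reason = rest.partition(" ")
    if not version.startswith("HTTP/") or not code.isdigit():
        raise ValueError(f"malformed status line: {status_line!r}")
    # parse_headers() consumes lines up to and including the blank line.
    headers = parse_headers(fp)
    return int(code), reason, headers


# Invented sample reply, standing in for sock.makefile("rb") on a real proxy socket.
sample = (
    b"HTTP/1.1 200 Connection established\r\n"
    b"X-ProxyMesh-IP: 203.0.113.7\r\n"
    b"\r\n"
)
status, reason, headers = read_connect_response(io.BytesIO(sample))
print(status, reason, headers["X-ProxyMesh-IP"])
```

With a real socket, a buffered file object may read past the header block, so any bytes left in the buffer would have to be handed over together with the socket.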
|
msg249373 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-08-30 23:05 |
Such a change would involve adding a new API, so it should go into a new version of Python. Thomas: a diff rather than a full copy of the changed file would be more convenient. Also, if this gets accepted, test cases and documentation would be needed.

It is also useful to get the header of an unsuccessful CONNECT response. For example, see Issue 7291, where the Proxy-Authenticate header of the proxy’s 407 response needs to be accessible. In that issue, I started working on a patch that may also be useful here. From memory, usage would be a bit like this:

    proxy_conn = HTTPConnection("proxy")
    proxy_conn.request("CONNECT", "website:443")
    proxy_resp = proxy_conn.getresponse()
    if proxy_resp.status == PROXY_AUTHENTICATION_REQUIRED:
        # Handle proxy_resp.msg["Proxy-Authenticate"]
        ...
    # Handle proxy_resp.msg["X-ProxyMesh-IP"]
    ...
    tunnel = proxy_conn.detach()  # Returns socket and any buffered data
    website_conn = HTTPSConnection("website", tunnel=tunnel)
    website_conn.request("GET", "/")
    ...
    website_conn.close()

Thomas, let me know if this would be useful for you, and I can try to dig up my patch. |
|
|
msg249741 - (view) |
Author: Thomas Belhalfaoui (thomas.belhalfaoui) * |
Date: 2015-09-04 08:34 |
Martin: Thanks for your quick answer (and sorry for sending the whole file)! I think it is indeed a good idea to detach the proxy connection and treat it like any other connection, as you did in your patch. It would be great if you could dig it up! |
|
|
msg249803 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2015-09-04 18:24 |
Thomas, please sign a contributor agreement for your patches to be considered. https://www.python.org/psf/contrib/ https://www.python.org/psf/contrib/contrib-form/ |
|
|
msg249897 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-09-05 06:27 |
This is the patch I had in mind. It looks like it only implements the detach() method, so we would still need to add support for passing in the tunnel details to the HTTPSConnection constructor. This patch would allow doing stuff at a lower level than the existing tunnel functionality. The patch includes a test case for getting the proxy’s response header fields, and another test case illustrating how a plain text HTTP 2 upgrade could work. |
|
|
msg249906 - (view) |
Author: Thomas Belhalfaoui (thomas.belhalfaoui) * |
Date: 2015-09-05 11:59 |
Terry: Thanks for the form, I have just filled it in. Martin: Thanks for sending your patch. I will dive into it and try to figure out how to add support for passing the tunnel details to the HTTPSConnection constructor. |
|
|
msg249996 - (view) |
Author: Thomas Belhalfaoui (thomas.belhalfaoui) * |
Date: 2015-09-06 15:31 |
Martin, I went through your patch and ran some simple tests, and I have a couple of questions.

1) When I run the following code, I get a "Bad file descriptor" error:

    conn = httplib.HTTPConnection("uk.proxymesh.com", 31280)
    conn.set_tunnel("www.google.com", 80)
    conn.request("GET", "/")
    resp = conn.getresponse()
    print(resp.read())

So I tweaked the "getresponse" function so that it does not call "self.close()" in that case (i.e. the connection stays open after the CONNECT request), and it seems to work fine.

2) I added "self.sock, _ = tunnel" to the HTTPConnection constructor to try your use case, but I get "http.client.RemoteDisconnected: Remote end closed connection without response".

Do you think it makes sense, or am I missing something? |
|
|
msg250085 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-09-07 07:12 |
1) The real problem is that when _tunnel() internally calls getresponse(), getresponse() notices that the connection cannot be reused for another request and closes the socket object. Perhaps I should rethink my logic; maybe move sock and detach() to HTTPResponse. 2) With some rough experimentation, passing tunnel through the HTTPConnection (plain text HTTP) constructor seems to work for me. However, if you meant HTTPSConnection (over TLS) instead, you will probably need to do the wrap_socket() step manually. Maybe that’s why your connection is being dropped. |
|
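The manual wrap_socket() step can be sketched at the socket level with only the standard library. This is a rough illustration under assumptions, not the patch discussed here: send_connect and wrap_tunnel are hypothetical names, the CONNECT request is written by hand, and the reply is read one byte at a time so that nothing past the header block is consumed before TLS starts.

```python
import io
import socket
import ssl
from http.client import parse_headers


def send_connect(sock, host, port):
    """Send CONNECT on an already-connected socket; return (status, headers).

    Hypothetical helper, not part of http.client.
    """
    target = f"{host}:{port}"
    request = f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n"
    sock.sendall(request.encode("ascii"))
    # Read byte by byte up to the blank line, so that no tunnel data
    # ends up in a buffer that the TLS layer cannot see.
    raw = bytearray()
    while not raw.endswith(b"\r\n\r\n"):
        chunk = sock.recv(1)
        if not chunk:
            raise ConnectionError("proxy closed the connection during CONNECT")
        raw += chunk
    fp = io.BytesIO(bytes(raw))
    status_line = fp.readline().decode("iso-8859-1")
    status = int(status_line.split(None, 2)[1])
    return status, parse_headers(fp)


def wrap_tunnel(sock, host):
    """Start TLS over an established tunnel, verifying the origin host."""
    context = ssl.create_default_context()
    return context.wrap_socket(sock, server_hostname=host)
```

Usage would be roughly: connect to the proxy with socket.create_connection(), call send_connect(), inspect the status and headers (for example headers["X-ProxyMesh-IP"]), and only call wrap_tunnel() if the status is 200.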
|
msg393729 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2021-05-15 20:55 |
Alexey, to repeat what I said to Thomas above: please sign a contributor agreement for your patches to be considered. https://www.python.org/psf/contrib/ https://www.python.org/psf/contrib/contrib-form/ |
|
|
msg393730 - (view) |
Author: Alexey Namyotkin (alexey.namyotkin) * |
Date: 2021-05-15 21:19 |
Thanks, Terry. I signed it. |
|
|