msg213555 - (view) |
Author: bob bob (bob.bob) |
Date: 2014-03-14 14:29 |
We have written a server on Python 2.7.6 that uses openssl 0.9.8y running on Windows box. Time to time our server freezes utilizing 100% CPU on ssl_init (C routine). The following is the process stack: 0021ee80 10036d19 0176d259 00000000 00000013 msvcr90+0x3b35d 0021ee9c 10022a22 0021ef1c 0176d218 1001bf62 _ssl!init_ssl+0x33b59 0021ef74 10073920 0176a158 0177ba18 00952a78 _ssl!init_ssl+0x1f862 0021ef84 10001714 0176a158 0147b350 00000000 _ssl!init_ssl+0x70760 0021ef88 0176a158 0147b350 00000000 00952a78 _ssl+0x1714 0021ef8c 0147b350 00000000 00952a78 ffffffff 0x176a158 0021ef90 00000000 00952a78 ffffffff 100d3038 0x147b350 This problem occurs when client connects to our server using slow modem channel. In this case client sends duplicated SYN TCP packet. When second SYN packet arrives to the server after SYN_ACK server's packet we have a trouble. Like this trace from server side: 140 38.807410000 172.16.35.6 192.168.52.1 TCP 74 44697 > 20444 [SYN] Seq=0 Win=14600 Len=0 MSS=1314 SACK_PERM=1 TSval=495224007 TSecr=0 WS=16 141 38.807428000 192.168.52.1 172.16.35.6 TCP 78 20444 > 44697 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460 WS=1 TSval=0 TSecr=0 SACK_PERM=1 148 39.288557000 172.16.35.6 192.168.52.1 TCP 74 44697 > 20444 [SYN] Seq=0 Win=14600 Len=0 MSS=1314 SACK_PERM=1 TSval=495224257 TSecr=0 WS=16 |
|
|
msg213573 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2014-03-14 17:33 |
Have you tried reporting this to the OpenSSL project? This doesn't sound like a bug in Python. Also, 0.9.8y is the current latest version in the 0.9.8 line, and we are unlikely to switch the 2.7 Windows builds to OpenSSL 1.x, I think (but ultimately that's Martin call). |
|
|
msg213579 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2014-03-14 17:59 |
Antoine: init_ssl would be the entry function of _ssl.c, not code from OpenSSL. bob: It's doubtful though that the stack trace is right, since init_ssl doesn't call itself. Did you install the Python PDB files for 2.7.6 before obtaining this stack trace? There are two things you could try to diagnose this further 1. Capture the complete packets using tcpdump or wireshark, for use by tcpreplay. Then see whether a replay will reliably trigger this problem, and allow for others to diagnose this further - upload a Python script and the wireshark dump here. 2. (Alternatively) Debug this yourself. Build Python in debug mode, see whether the problem still reproduces, then find out the life lock in the Visual Studio debugger. Please also report the Windows version you are using. |
|
|
msg214090 - (view) |
Author: bob bob (bob.bob) |
Date: 2014-03-19 12:51 |
loewis: This is correct stack: 0021ee80 10036d19 0176d259 00000000 00000013 msvcr90!memset+0x3d 0021ee9c 10022a22 0021ef1c 0176d218 1001bf62 _ssl!SHA1_Final+0x49 0021eea8 1001bf62 0021ef0c 0021ef1c 00000146 _ssl!OPENSSL_cleanse+0x92 0021eebc 10010319 0021ef0c 0021ef1c 00000000 _ssl!EVP_DigestFinal_ex+0x32 0021ef30 100035f3 0021ef74 00000004 00000000 _ssl!RAND_SSLeay+0x2a9 0021ef44 10085da0 0021ef74 00000004 00000000 _ssl!RAND_add+0x63 0021ef74 10073920 0176a158 0177ba18 00952a78 _ssl!ssl3_accept+0x40 0021ef84 10001714 0176a158 0147b350 00000000 _ssl!SSL_do_handshake+0x50 0021ef8c 0147b350 00000000 00952a78 ffffffff _ssl!PySSL_SSLdo_handshake+0x74 [c:\users\martin\27\python\modules\_ssl.c @ 478] 0021ef90 00000000 00952a78 ffffffff 100d3038 0x147b350 dump file attached. Packets from 5 to 8 received after client process has been terminated. Thereafter, the CPU freed. |
|
|
msg215067 - (view) |
Author: Roman O. Vlasov (rv) |
Date: 2014-03-28 18:39 |
To reproduce the 100% CPU load problem, we used a simple python TLS client on separate linux PC with Traffic control utility (tc): tc qdisc change dev eth0 root netem delay 5000ms After in-depth analyzing, we realized that _ssl.c behaves differently depening on socket.timeout: 1) sock.timeout set to None, that is equivalent to s->socket_timeout==-1 in _ssl.c produces 100% CPU load if client works on bad long delay channel. The problem is this case is that the do{} loop runs in PySSL_SSLdo_handshake() calling check_socket_and_wait_for_timeout() which immediatly returns SOCKET_IS_BLOCKING because s->sock_timeout==-1. 2) sock.timeout set to 0 (non-blocking) makes _ssl.c immediatly return with error: _ssl.c: The operation did not complete 3) socket.timeout set to any positive value makes _ssl.c wait socket on select (producing no CPU load). By default, accept() returns blocking socket with timeout set to None (1st case) Below are some code details: Our server class is inherited from asyncore.dispatcher. In __init__ it executes the following statements (ripped), creating listening socket, accepting client connection and then doing wrap_socket: ----------------------------------------------------------- asyncore.py: sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) sock.setblocking(0) sock.setsockopt( socket.SOL_SOCKET, socket.SO_REUSEADDR, socket.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) | 1 ) sock.bind(self.server_address) sock.listen(5) asyncore.loop(...) select.select(...) # wait for client connection # Here client connects conn, addr = sock.accept() # conn.gettimeout() returns None which means blocking socket ssl_sock = ssl.wrap_socket( conn , server_side = server_side , certfile = certfile , cert_reqs = cert_reqs , ssl_version=ssl.PROTOCOL_SSLv3 ) File "C:\Python27\Lib\ssl.py", line 399, in wrap_socket File "C:\Python27\Lib\ssl.py", line 152, in __init__ self.do_handshake() File "C:\Python27\Lib\ssl.py", line 315, in do_handshake self._sslobj.do_handshake() _ssl.c: PySSL_SSLdo_handshake(): // wants to read more data from socket if (err == SSL_ERROR_WANT_READ) { sockstate = check_socket_and_wait_for_timeout(self->Socket, 0); ... check_socket_and_wait_for_timeout(): /* Nothing to do unless we're in timeout mode (not non-blocking) */ if (s->sock_timeout < 0.0) return SOCKET_IS_BLOCKING; // <-- this is 1st case producing 100% CPU load else if (s->sock_timeout == 0.0) return SOCKET_IS_NONBLOCKING; // <-- this is 2nd case returning error immediatly We think that anyone who follows standard Python documentation (http://docs.python.org/2/library/ssl.html#server-side-operation) will get the same result if using client with delay>1000ms |
|
|
msg215068 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2014-03-28 18:53 |
How come OpenSSL returns SSL_ERROR_WANT_READ if the socket is blocking? Perhaps Windows returns a non-blocking socket for accept()? Have you tried with Python 3? |
|
|
msg215185 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2014-03-30 15:21 |
As Antoine says: if the socket is *really* in blocking mode, then SSL_do_handshake should block until the handshake if fully complete. It should not return SSL_ERROR_WANT_READ in this case. If you have managed to build Python for yourself to diagnose this further, please add a call to ioctlsocket in this loop to verify that the socket is in blocking mode. |
|
|
msg215766 - (view) |
Author: Roman O. Vlasov (rv) |
Date: 2014-04-08 15:10 |
Antoine, Martin, thank you for your replies. You was right: NT socket was not in blocking mode (in 1st case). I didn't knew how to determine socket mode in NT, so I explicitly set socket mode to blocking in _ssl.c before calling SSL_do_handshake() for test: _ssl.c::PySSL_SSLdo_handshake(): // ------------ Explicitly set blocking mode unsigned long iMode = 0UL; // If iMode = 0, blocking is enabled; int iResult = ioctlsocket( self->Socket->sock_fd, FIONBIO, &iMode ); if (iResult != NO_ERROR) printf("\nioctlsocket failed with error: %ld\n", iResult); // ------------ PySSL_BEGIN_ALLOW_THREADS ret = SSL_do_handshake(self->ssl); ... Test result: SSL_do_handshake() did not return (and no CPU load). But in general, I think that check_socket_and_wait_for_timeout() logic is erroneous: if (s->sock_timeout < 0.0) return SOCKET_IS_BLOCKING; By default (see case #1 in my message above), timeout in Python wrapping object is -1 (self->Socket->sock_timeout==-1). But underlying NT socket is really non-blocking. This leads to 100% CPU load: PySSL_SSLdo_handshake() loop is calling check_socket_and_wait_for_timeout() which immediately returns. NB. I use Python 2.7.6 and openssl-0.9.8y for my build and never tried Python 3. |
|
|
msg215903 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2014-04-10 19:42 |
Thanks. It would be nice if you could investigate why the underlying NT socket is in non-blocking mode (AFAIK, this issue hasn't been reported before). Also, it would be nice if you could check with Python 3 as well. |
|
|
msg217232 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2014-04-27 00:26 |
> Please also report the Windows version you are using. I don't see the answer to this question |
|
|
msg239150 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2015-03-24 16:24 |
Is this still considered valid considering from https://www.python.org/downloads/release/python-279/ "The entirety of Python 3.4's ssl module has been backported for Python 2.7.9. See PEP 466 for justification." ? |
|
|