Issue 33263: Asyncio server enters an invalid state after a request with SO_LINGER (original) (raw)

Long story short, if you have a TCP asyncio-server (implementing asyncio.Protocol) that sends something to socket in connection_made callback, it will leak fds and won't be able to handle some consequent requests, if you will send to it RST right after TCP-handshake.

In my case, these requests are sent by keepalived demon (it makes TCP-checks that way), but they can be easily hand-crafted.

Example server:

import asyncio

class EchoServerClientProtocol(asyncio.Protocol): def connection_made(self, transport): self.transport = transport self.transport.write(b'Say something')

def data_received(self, data):
    self.transport.write(data)
    self.transport.close()

loop = asyncio.get_event_loop() coro = loop.create_server(EchoServerClientProtocol, '127.0.0.1', 8888) server = loop.run_until_complete(coro) loop.run_forever()

Example client:

import socket import struct import time

first request

sock = socket.socket() l_onoff = 1 l_linger = 0 sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', l_onoff, l_linger)) sock.connect(('127.0.0.1', 8022)) sock.close()

time.sleep(1)

second request

sock = socket.socket() sock.connect(('127.0.0.1', 8888)) print('Got', sock.recv(1024)) sock.sendall(b'Hi there') print('Got', sock.recv(1024)) # <-- Will hang here sock.close()

Why is this happening?

  1. First request 1.1. _SelectorSocketTransport.init schedules connection_made and loop.add_reader. 1.2. connection_made tries to send data to the socket, it fails, _SelectorTransport._fatal_error is called, then _SelectorTransport._force_close is called. 1.3. _SelectorTransport._force_close removes the reader from the loop (there is none, it's ok), and schedules closing the socket in _SelectorTransport._call_connection_lost. 1.4. loop.add_reader is called (scheduled on step 1), it calls epoll_ctl, no error is returned because the socket isn't closed yet. 1.5. the socket is closed by _SelectorTransport._call_connection_lost (scheduled on step 3). The corresponding entry in _SelectorMapping remains and isn't cleaned up.

  2. Second request 2.1. FD from the first request is reused by the OS 2.2. BaseSelectorEventLoop._add_reader sees that the FD is already in BaseSelectorEventLoop._selector, so it calls _selector.modify 2.3 _BaseSelectorImpl.modify sees that the events mask is the same, so it uses a shortcut, changing only the callback, but without calling epoll_ctl. The socket won't be polled.

I think the source of the error is in step 1.4, we shouldn't add the reader if we are already closing the transport. I've made a simple PR fixing this, but maybe there are other options here.