Issue 26259: Memleak when repeated calls to asyncio.queue.Queue.get is performed, without push to queue.

Created on 2016-02-01 16:06 by Brunsgaard, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name  Uploaded                            Description
poc.py     Brunsgaard, 2016-02-01 16:06        Example program to reproduce
poc.py     Bartek Biernacki, 2017-03-08 16:21  Custom Queue without described behaviour
Messages (10)
msg259327 - (view) Author: Jonas Brunsgaard (Brunsgaard) Date: 2016-02-01 16:06
When making repeated calls to queue.get, memory builds up and is not freed until queue.put_nowait is called. I wrote this little program to show my findings. The program performs a lot of calls to queue.get, and once every 60 seconds a queue.put_nowait is performed. Every 15 seconds the memory usage of dictionaries is printed to the console. You can find the output below the program.

```
import asyncio
from pympler import muppy
from pympler import summary


q = asyncio.Queue()
loop = asyncio.get_event_loop()
closing = False


async def get_with_timeout():
    while not closing:
        try:
            task = asyncio.ensure_future(q.get())
            await asyncio.wait_for(task, 0.2)
        except asyncio.TimeoutError:
            pass

def mem_profiling():
    if not closing:
        types_ = muppy.filter(muppy.get_objects(), Type=dict)
        summary.print_(summary.summarize(types_))
        loop.call_later(15, mem_profiling)

def put():
    q.put_nowait(None)
    loop.call_later(60, put)

put()
tasks = [asyncio.ensure_future(get_with_timeout()) for _ in range(10000)]
mem_profiling()

try:
    loop.run_forever()
except KeyboardInterrupt:
    closing = True
    loop.run_until_complete(
        asyncio.ensure_future(asyncio.wait(tasks)))
finally:
    loop.close()
```

Output:

                                   types |   # objects |   total size
======================================== | =========== | ============
                            <class 'dict |       11886 |      3.20 MB
         <class 'collections.OrderedDict |          11 |     16.84 KB
  <class 'email._encoded_words._QByteMap |           1 |        288 B
                                   types |   # objects |   total size
======================================== | =========== | ============
                            <class 'dict |      224835 |     62.44 MB
         <class 'collections.OrderedDict |          11 |     16.84 KB
  <class 'email._encoded_words._QByteMap |           1 |        288 B
                                   types |   # objects |   total size
======================================== | =========== | ============
                            <class 'dict |      401487 |    111.00 MB
         <class 'collections.OrderedDict |          11 |     16.84 KB
  <class 'email._encoded_words._QByteMap |           1 |        288 B
                                   types |   # objects |   total size
======================================== | =========== | ============
                            <class 'dict |      576621 |    158.86 MB
         <class 'collections.OrderedDict |          11 |     16.84 KB
  <class 'email._encoded_words._QByteMap |           1 |        288 B
                                   types |   # objects |   total size
======================================== | =========== | ============
                            <class 'dict |      219745 |     58.99 MB
         <class 'collections.OrderedDict |          11 |     16.84 KB
  <class 'email._encoded_words._QByteMap |           1 |        288 B
                                   types |   # objects |   total size
======================================== | =========== | ============
                            <class 'dict |      390061 |    105.72 MB
         <class 'collections.OrderedDict |          11 |     16.84 KB
  <class 'email._encoded_words._QByteMap |           1 |        288 B
                                   types |   # objects |   total size
======================================== | =========== | ============
                            <class 'dict |      560623 |    152.65 MB
         <class 'collections.OrderedDict |          11 |     16.84 KB
  <class 'email._encoded_words._QByteMap |           1 |        288 B
                                   types |   # objects |   total size
======================================== | =========== | ============
                            <class 'dict |      220372 |     59.11 MB
         <class 'collections.OrderedDict |          11 |     16.84 KB
  <class 'email._encoded_words._QByteMap |           1 |        288 B
                                   types |   # objects |   total size
======================================== | =========== | ============

What we see is that memory builds up at ~3 MB/s, and when put_nowait is called the memory usage returns to normal.

Is this the expected behavior or is this a bug? If it is expected, I think we should update the documentation to let people know about this behavior.

-- Jonas Brunsgaard
msg259328 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-02-01 16:35
I think this is expected behavior when you do a lot of get() calls that are cancelled by a timeout. Each call leaves a cancelled Future in the deque self._getters; these are removed only when _wakeup_next() is called (by put or put_nowait). Why is your app doing this? I suppose we could band-aid this by removing cancelled Futures when next calling get(), but that won't help if you start a load of get() calls and then cancel them all. Note that I don't want to complicate this code -- in the past it has been very difficult to prove it was correct under all circumstances, and now I am sure it is correct (barring memory leaks, as you've shown). I want to make sure any fix for this issue doesn't complicate the code, making it harder to understand or prove its correctness. On Mon, Feb 1, 2016 at 8:06 AM, Jonas Brunsgaard <report@bugs.python.org> wrote: > > New submission from Jonas Brunsgaard: > > When making repeated calls to queue.get, memory is building up and is not freed until queue.push is called. > > I wrote this little program to show my findings. The program will perform a lot of calls to queue.get and once every 60 seconds a queue.push is performed. Every 15 seconds the memory usage of dictionaries is printet to the console. You can find the output below the program > > ``` > import asyncio > from pympler import muppy > from pympler import summary > > > q = asyncio.Queue() > loop = asyncio.get_event_loop() > closing = False > > > async def get_with_timeout(): > while not closing: > try: > task = asyncio.ensure_future(q.get()) > await asyncio.wait_for(task, 0.2) > except asyncio.TimeoutError: > pass > > def mem_profiling(): > if not closing: > types_ = muppy.filter(muppy.get_objects(), Type=dict) > summary.print_(summary.summarize(types_)) > loop.call_later(15, mem_profiling) > > def put(): > q.put_nowait(None) > loop.call_later(60, put) > > put() > tasks = [asyncio.ensure_future(get_with_timeout()) for _ in range(10000)] > mem_profiling() > > try: > loop.run_forever() > except KeyboardInterrupt: > closing = True > loop.run_until_complete( > asyncio.ensure_future(asyncio.wait(tasks))) > finally: > loop.close() > ``` > > Output: > > types | # objects total size > ======================================== =========== ============ > <class 'dict 11886 3.20 MB > <class 'collections.OrderedDict 11 16.84 KB > <class 'email._encoded_words._QByteMap 1 288 B > types # objects total size > ======================================== =========== ============ > <class 'dict 224835 62.44 MB > <class 'collections.OrderedDict 11 16.84 KB > <class 'email._encoded_words._QByteMap 1 288 B > types # objects total size > ======================================== =========== ============ > <class 'dict 401487 111.00 MB > <class 'collections.OrderedDict 11 16.84 KB > <class 'email._encoded_words._QByteMap 1 288 B > types # objects total size > ======================================== =========== ============ > <class 'dict 576621 158.86 MB > <class 'collections.OrderedDict 11 16.84 KB > <class 'email._encoded_words._QByteMap 1 288 B > types # objects total size > ======================================== =========== ============ > <class 'dict 219745 58.99 MB > <class 'collections.OrderedDict 11 16.84 KB > <class 'email._encoded_words._QByteMap 1 288 B > types # objects total size > ======================================== =========== ============ > <class 'dict 390061 105.72 MB > <class 'collections.OrderedDict 11 16.84 KB > <class 'email._encoded_words._QByteMap 1 288 B > types # objects total size > 
======================================== =========== ============ > <class 'dict 560623 152.65 MB > <class 'collections.OrderedDict 11 16.84 KB > <class 'email._encoded_words._QByteMap 1 288 B > types # objects total size > ======================================== =========== ============ > <class 'dict 220372 59.11 MB > <class 'collections.OrderedDict 11 16.84 KB > <class 'email._encoded_words._QByteMap 1 288 B > types # objects total size > ======================================== =========== ============ > > What we see is that memory builds up ~3mb/s, and when the push method is called the memory usage returns to normal. > > Is this the expected behavior or is this a bug? If it is expected I think we should update the documentation, to let people know about this behavior. > > -- > Jonas Brunsgaard > > ---------- > components: asyncio > files: poc.py > messages: 259327 > nosy: Jonas Brunsgaard, gvanrossum, haypo, yselivanov > priority: normal > severity: normal > status: open > title: Memleak when repeated calls to asyncio.queue.Queue.get is performed, without push to queue. > type: resource usage > versions: Python 3.5 > Added file: http://bugs.python.org/file41772/poc.py > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue26259> > _______________________________________
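To make the mechanism concrete: below is a minimal sketch of the get()/_wakeup_next() interplay described above (a simplified illustration, not the verbatim stdlib source; the stdlib's error handling and its handling of the wakeup/cancel race are omitted).

```python
import asyncio
import collections


class QueueSketch:
    """Why a timed-out get() leaves a future behind in _getters."""

    def __init__(self):
        self._queue = collections.deque()
        self._getters = collections.deque()  # waiter futures from get()

    async def get(self):
        while not self._queue:
            getter = asyncio.get_running_loop().create_future()
            self._getters.append(getter)
            # If wait_for() cancels us here, CancelledError propagates out
            # of get(), but the cancelled future stays in self._getters.
            await getter
        return self._queue.popleft()

    def put_nowait(self, item):
        self._queue.append(item)
        self._wakeup_next()

    def _wakeup_next(self):
        # Only here are dead waiters discarded: cancelled futures are
        # popped and skipped until a live one is found.
        while self._getters:
            waiter = self._getters.popleft()
            if not waiter.done():
                waiter.set_result(None)
                break
```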
msg259335 - (view) Author: Jonas Brunsgaard (Brunsgaard) Date: 2016-02-01 17:28
In my particular case, I developed an application close to beanstalkd, but with redis as the "engine". I created a callback reader class for users to subclass; the callback reader checks every second, on every tube (a Queue object), whether new data has arrived for processing (this subroutine uses queue.get with wait_for). Maybe asyncio.Condition would have been the better choice. The easy solution was to check whether the queue was empty and skip the read (the get call) if there was nothing in the queue.

Before my fix, the program would accumulate 10 GB of memory over a week in our staging environment if nothing was touched, so I was assigned to investigate the cause. I think the current behavior is undesirable and hard to see through, and if it is not changed there should be some kind of note in the documentation, so other good Python folks have a better chance of understanding the behavior without reading the CPython asyncio queue implementation.
msg259340 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-02-01 18:38
> The easy solution was to check if the queue was empty and skip the read (get call) if there was nothing in the queue.

There's an API for that -- get_nowait(). I'm happy to review a fix if you come up with one. But I currently don't have time to devise a fix myself. (Not even a docfix, if you think that's sufficient, but if you submit one it will swiftly be applied.)
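A consumer built on get_nowait() never parks a waiter future, so nothing can accumulate. A minimal sketch (the one-second poll interval is an assumption, matching the reporter's description):

```python
import asyncio


async def poll_tube(q: asyncio.Queue) -> None:
    """Poll a queue without ever enqueueing a waiter future."""
    while True:
        try:
            item = q.get_nowait()
        except asyncio.QueueEmpty:
            await asyncio.sleep(1)  # assumed poll interval
            continue
        print("processing", item)
```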
msg259374 - (view) Author: Jonas Brunsgaard (Brunsgaard) Date: 2016-02-02 10:25
You are right that get_nowait() is the correct API for my use case; with get_nowait() nothing is pushed to the internal _getters deque. The reason for my use of get() is that job futures are created in one place in the code and then thrown into a processing function that will yield the job future. This design was created to handle all exceptions in processing(), but I agree that get_nowait would have been the superior solution. I do not have time on my hands right now to take on the challenge of writing a patch, but I might take it up later, sounds fun ;) Good day to you sir, and thank you for the feedback.
msg259448 - (view) Author: Jonas Brunsgaard (Brunsgaard) Date: 2016-02-03 00:13
Okay, I thoroughly read the code again. Can you describe the architectural changes to the code you would expect from a patch? Then I will write up a proposal. I want to be sure we are on the same page, so I do not waste my time or yours :)
msg259449 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-02-03 00:21
First, you wrote "The easy solution was to check if the queue was empty and skip the read (get call) if there was nothing in the queue" -- which is exactly what get_nowait() does. As to a patch, I think you should just describe how you would change the code, then we can talk details.
msg289244 - (view) Author: Bartek Biernacki (Bartek Biernacki) Date: 2017-03-08 16:21
I ran into a similar problem with many getters timing out and staying in memory until some put is called. I think this may be solved by using a callback on the Future which would clean it up if it was cancelled. I am presenting the idea in the attached poc. If you think this is a good idea, I can provide a patch.
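The attached poc.py is not reproduced in this thread, but the idea presumably looks something like the following hypothetical sketch. It subclasses asyncio.Queue and reaches into the private _getters deque; the stdlib's handling of the put/cancel race is deliberately omitted:

```python
import asyncio


class CleaningQueue(asyncio.Queue):
    """Sketch: evict a getter future from _getters as soon as it is cancelled."""

    async def get(self):
        while self.empty():
            getter = asyncio.get_event_loop().create_future()
            self._getters.append(getter)
            # The done callback also fires on cancellation, so a timed-out
            # get() cleans up after itself instead of waiting for a put.
            getter.add_done_callback(self._discard_if_cancelled)
            await getter
        return self.get_nowait()

    def _discard_if_cancelled(self, fut):
        if fut.cancelled():
            try:
                self._getters.remove(fut)
            except ValueError:
                pass  # already removed by _wakeup_next()
```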
msg303389 - (view) Author: Suren Nihalani (snihalani) * Date: 2017-09-30 03:48
@cjrh brought up that this issue is the same as https://bugs.python.org/issue31620. I put up a PR there. Can people review that and then we can close this?
msg308235 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2017-12-13 19:18
Duplicate of https://bugs.python.org/issue31620
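For the record, the fix that landed via issue 31620 takes roughly this shape: get() evicts its own waiter when it is cancelled. A paraphrased sketch of that approach (from memory, not a verbatim quote of the committed code):

```python
import asyncio


class FixedQueue(asyncio.Queue):
    """Paraphrase of the post-31620 get(): the waiter is removed on cancellation."""

    async def get(self):
        while self.empty():
            getter = asyncio.get_event_loop().create_future()
            self._getters.append(getter)
            try:
                await getter
            except:
                getter.cancel()  # in case the future is not done yet
                try:
                    # The crucial change: drop the cancelled waiter now
                    # instead of leaving it for a later put to sweep up.
                    self._getters.remove(getter)
                except ValueError:
                    pass  # already removed by a concurrent put_nowait()
                if not self.empty() and not getter.cancelled():
                    # Woken by put_nowait() but cancelled anyway: pass the
                    # wakeup on to the next waiter in line.
                    self._wakeup_next(self._getters)
                raise
        return self.get_nowait()
```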
History
Date                 User              Action  Args
2022-04-11 14:58:27  admin             set     github: 70447
2017-12-13 19:18:26  asvetlov          set     status: open -> closed; superseder: asyncio.Queue leaks memory if the queue is empty and consumers poll it frequently; nosy: + asvetlov; messages: +; resolution: duplicate; stage: resolved
2017-09-30 03:48:43  snihalani         set     nosy: + snihalani; messages: +
2017-03-08 17:04:33  gvanrossum        set     nosy: - gvanrossum
2017-03-08 16:22:53  vstinner          set     nosy: - vstinner
2017-03-08 16:21:47  Bartek Biernacki  set     files: + poc.py; nosy: + Bartek Biernacki; messages: +
2016-02-03 00:21:34  gvanrossum        set     messages: +
2016-02-03 00:13:07  Brunsgaard        set     messages: +
2016-02-02 10:25:50  Brunsgaard        set     messages: +
2016-02-01 18:38:50  gvanrossum        set     messages: +
2016-02-01 17:28:00  Brunsgaard        set     messages: +
2016-02-01 16:35:23  gvanrossum        set     messages: +
2016-02-01 16:06:20  Brunsgaard        create