[Python-Dev] Changing pymalloc behaviour for long running processes
Evan Jones ejones at uwaterloo.ca
Tue Oct 19 06:00:46 CEST 2004
I know that this has been discussed a bit in the past, but I was hoping
that some Python gurus could shed some light on this issue, and maybe
let me know if there are any plans for solving this problem. I know a
hack that might work, but there must be a better way to solve this
problem.
The short version of the problem is that obmalloc.c never returns
memory to the operating system.
This is a great strategy if the application runs for a short time then
quits, or if it has fairly constant memory usage. However, applications
with very dynamic memory needs and that run for a long time do not
perform well because Python hangs on to the peak amount of memory
required, even if that memory is only required for a tiny fraction of
the run time. In my application, I have a Python process which occupies
1 GB of RAM for ~20 hours, even though it only uses that 1 GB for about
5 minutes. This is a problem that needs to be addressed, as it
negatively impacts the performance of Python when manipulating very
large data sets. In fact, I found a mailing list post where the poster
was looking for a workaround for this issue, but I can't find it now.
Some posts to various lists [1] have stated that this is not a real
problem because virtual memory takes care of it. This is fair if you
are talking about a couple megabytes. In my case, I'm talking about
~700 MB of wasted RAM, which is a problem. First, this is wasting space
which could be used for disk cache, which would improve the performance
of my system. Second, when the system decides to swap out the pages
that haven't been used for a while, they are dirty and must be written
to swap. If Python ever wants to use them again, they will be brought
in from swap. This is much worse than informing the system that the
pages can be discarded, and allocating them again later. In fact, the
other native object types (ints, lists) seem to realize that holding on
to a huge amount of memory indefinitely is a bad strategy, because they
explicitly limit the size of their free lists. So why is this not a
good idea for other types?
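
To make the free list comparison concrete, here is a minimal sketch of
a capped free list. It is a toy model only: the class name, the cap and
the 64-byte block size are hypothetical, and it is not how CPython
implements its free lists.

```python
class BoundedFreeList:
    """Toy free list that caches at most `max_cached` released blocks."""

    def __init__(self, max_cached):
        self.max_cached = max_cached
        self._cached = []            # blocks kept around for quick reuse
        self.returned_to_system = 0  # blocks dropped instead of cached

    def allocate(self):
        # Reuse a cached block when one exists, otherwise make a new one
        # (standing in for a call to the underlying allocator).
        if self._cached:
            return self._cached.pop()
        return bytearray(64)

    def release(self, block):
        # Cache the block only while under the cap; past the cap the
        # block is dropped, which models handing it back to the system.
        if len(self._cached) < self.max_cached:
            self._cached.append(block)
        else:
            self.returned_to_system += 1

    @property
    def cached(self):
        return len(self._cached)
```

With a cap of 3, releasing 10 blocks leaves 3 cached and drops the
other 7, so the memory held after a spike is bounded by the cap rather
than by the peak.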
Does anyone else see this as a problem? Does anyone think this is not a
problem?
Proposal:
- Python's memory allocator should occasionally free memory if the
memory usage has been relatively constant, and has been well below the
amount of memory allocated. This will incur additional overhead to free
the memory, and additional overhead to reallocate it if the memory is
needed again quickly. However, it will make Python co-operate nicely
with other processes, and a clever implementation should be able to
reduce the overhead.
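
One possible reading of this proposal, as a sketch rather than a
worked-out design: sample the memory in use at intervals, and release
memory only when the recent samples are close together and the highest
of them is well below the amount allocated. The class name and all the
thresholds below (window, stable_ratio, slack_ratio, headroom_ratio)
are assumptions chosen for illustration.

```python
from collections import deque


class ReleasePolicy:
    """Toy heuristic: release memory once usage is stable and low."""

    def __init__(self, window=5, stable_ratio=0.10, slack_ratio=0.50,
                 headroom_ratio=0.25):
        self.samples = deque(maxlen=window)   # most recent usage samples
        self.stable_ratio = stable_ratio      # allowed spread of samples
        self.slack_ratio = slack_ratio        # "well below" threshold
        self.headroom_ratio = headroom_ratio  # extra kept to avoid thrash

    def record(self, used):
        self.samples.append(used)

    def amount_to_release(self, allocated):
        # Wait until a full window of samples has been seen.
        if len(self.samples) < self.samples.maxlen:
            return 0
        high, low = max(self.samples), min(self.samples)
        # Usage is still changing: keep everything.
        if high - low > self.stable_ratio * high:
            return 0
        # Usage is not well below the allocation: keep everything.
        if high > self.slack_ratio * allocated:
            return 0
        # Keep the recent maximum plus some headroom, release the rest.
        keep = int(high * (1 + self.headroom_ratio))
        return max(0, allocated - keep)
```

For example, with window=3, samples of 300, 310 and 305 MB against
1000 MB allocated give a release of 613 MB. A later sample of 900 MB
makes usage look unstable again, so nothing is released.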
Problem:
- I do not completely understand Python's memory allocator, but from
what I see, it will not easily support this.
Gross Hack:
I've been playing with the fact that the "collect" function in the gc
module already gets called occasionally. Whenever it gets called for a
level 2 collection, I've hacked it to call a cleanup function in
obmalloc.c. This function goes through the free pool list, reorganizes
it to decrease memory fragmentation and decides based on metrics
collected from the last run if it should free some memory. It currently
works fine, except that it will permit the arena vector to grow
indefinitely, which is also bad for a long running process. It is also
bad because these cleanups are relatively slow as they touch every free
page that is currently allocated, so I'm trying to figure out a way to
integrate them more cleanly into the allocator itself.
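
The cleanup pass can be modelled roughly as follows. This is a toy
model in Python, not the C code in question: pools are represented by
integer indices, the function name is hypothetical, and the
pools-per-arena value is an illustrative assumption. It finds arenas
whose pools are all free, and orders the remaining free pools so that
the most heavily used arenas are refilled first, which gives lightly
used arenas a chance to drain.

```python
from collections import defaultdict

POOLS_PER_ARENA = 64  # illustrative assumption, not taken from obmalloc.c


def plan_cleanup(free_pools, pools_per_arena=POOLS_PER_ARENA):
    """Return (releasable_arenas, reordered_free_pools).

    `free_pools` is a list of unique pool indices; pool i is assumed to
    live in arena i // pools_per_arena.
    """
    by_arena = defaultdict(list)
    for pool in free_pools:
        by_arena[pool // pools_per_arena].append(pool)

    # An arena with every pool free can be handed back as a whole.
    releasable = {arena for arena, pools in by_arena.items()
                  if len(pools) == pools_per_arena}

    # For the rest, put arenas with the fewest free pools first so that
    # new allocations fill the fullest arenas before the emptiest ones.
    partial = [arena for arena in by_arena if arena not in releasable]
    partial.sort(key=lambda arena: (len(by_arena[arena]), arena))
    reordered = [pool for arena in partial
                 for pool in sorted(by_arena[arena])]
    return sorted(releasable), reordered
```

Note that this sketch has the same cost problem described above: it
visits every free pool on each pass.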
This also requires that nothing call the allocation functions while
this is happening. I believe that this is reasonable, considering that
it is getting called from the cyclical garbage collector, but I don't
know enough about Python internals to figure that out.
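
For experimenting with the hook itself, later versions of CPython
(3.3 and newer) expose gc.callbacks, which allows a Python-level
prototype without patching the collector. The sketch below is only
that: hypothetical_cleanup is a stand-in for the real cleanup function,
and because the callback is ordinary Python code it does call the
allocator, so it cannot satisfy the "no allocation during cleanup"
requirement described above.

```python
import gc

cleanup_runs = []  # records each time the stand-in cleanup is invoked


def hypothetical_cleanup():
    # Stand-in for the allocator cleanup; a real one would live in C.
    cleanup_runs.append("cleanup")


def on_gc(phase, info):
    # The collector calls this with phase "start" or "stop" and a dict
    # that includes the generation being collected.
    if phase == "stop" and info["generation"] == 2:
        hypothetical_cleanup()


gc.callbacks.append(on_gc)
try:
    gc.collect(2)  # force a full (generation 2) collection
finally:
    gc.callbacks.remove(on_gc)
```

Running the cleanup on the "stop" phase means the collector has already
finished freeing whatever it found before the pools are examined.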
Eventually, I hope to do some benchmarks and figure out if this is
actually a reasonable strategy. However, I was hoping to get some
feedback before I waste too much time on this.
Evan Jones
[1] http://groups.google.com/groups?selm=mailman.1053801468.4243.python-list%40python.org
-- Evan Jones: http://evanjones.ca/ "Computers are useless. They can only give answers" - Pablo Picasso