[Python-Dev] [regex] memory leak

MRAB python at mrabarnett.plus.com
Sun Aug 2 17:54:22 CEST 2009


John Machin wrote:

Hi Matthew,

Your post in c.l.py about your re rewrite didn't mention where to report bugs etc so I dug this address out of Google Groups ...

Environment: Python 2.6.2, Windows XP SP3, your latest (29 July) regex from the Python bugtracker.

Problem is repeated calls of e.g. compiledpattern.search(sometext) -- Task Manager performance panel shows increasing memory usage with regex but not with re. It appears to be cumulative i.e. changing to another pattern or text doesn't release memory.

Example:

8<-- regextimer.py
import sys
import time
if sys.platform == 'win32':
    timer = time.clock
else:
    timer = time.time
module = __import__(sys.argv[1])
count = int(sys.argv[2])
pattern = sys.argv[3]
expected = sys.argv[4]
text = 80 * '' + 'qwerty'
rx = module.compile(pattern)
t0 = timer()
for i in xrange(count):
    assert rx.search(text).group(0) == expected
t1 = timer()
print "%d iterations in %.6f seconds" % (count, t1 - t0)
8<---

Here are the results of running this (plus observed difference between peak memory usage and base memory usage):

dos-prompt>\python26\python regextimer.py regex 1000000 "" "~"
1000000 iterations in 3.811500 seconds [60 Mb]
dos-prompt>\python26\python regextimer.py regex 2000000 "" ""
2000000 iterations in 7.581335 seconds [128 Mb]
dos-prompt>\python26\python regextimer.py re 2000000 "" ""
2000000 iterations in 2.549738 seconds [3 Mb]

This happens on a variety of patterns: "w", "wert", "[a-z]+", "[a-z]+t", ...

Thanks for that, John. I should've kept an eye on the Task Manager! :-) Now fixed.

It's surprising how much time and effort is needed just to manage the memory!
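
The Task Manager check described above can also be scripted. What follows is only a minimal sketch, assuming Python 3.4 or later (which postdates this thread) and the standard tracemalloc module. The function name measure_growth and the sample text are illustrative and are not part of regextimer.py. Note that tracemalloc only counts memory requested through Python's own allocators, so a leak made with a raw C malloc inside an extension module would not show up here and would still need an OS-level measurement.

```python
import re
import tracemalloc


def measure_growth(rx, text, count):
    """Return the net change in traced memory (bytes) over `count` searches."""
    rx.search(text)  # warm-up call so one-time setup is not counted as growth
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for _i in range(count):
            rx.search(text)
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# "[a-z]+t" is one of the patterns listed in the report; the text is a stand-in.
rx = re.compile("[a-z]+t")
text = 80 * "-" + "qwerty"

small = measure_growth(rx, text, 1000)
large = measure_growth(rx, text, 100000)
print("growth after 1000 searches:   %d bytes" % small)
print("growth after 100000 searches: %d bytes" % large)
```

If the growth figure scales with the iteration count, as the 60 Mb and 128 Mb readings did, memory is being retained per call. A figure that stays roughly flat between the two runs suggests it is not.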


