Issue 7593: Computed-goto patch for RE engine (original) (raw)

Created on 2009-12-29 02:11 by akuchling, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
re-computed-goto-v1.txt akuchling,2009-12-29 02:12 re-computed-goto-v1.txt
Messages (6)
msg96984 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2009-12-29 02:11
Part of Unladen Swallow's roadmap is to use a threaded-interpreter technique for the regular expression engine. That sounded like an interesting idea, so I went ahead and tried to implement it. The current patch is attached. To try it: run configure --with-computed-gotos; apply the patch; and compile Python. Still to do: * Benchmarking, to determine if it's actually an improvement. * Possibly integrate construction of the Modules/re_opcodes.h file into the build process. The current file is supplied; to rebuild it, run "python Modules/makereopcodes.py > Modules/re_opcodes.h". (But it only needs rebuilding if you add new regex opcodes, which is rarely done -- maybe it's sufficient to include a reminder to update it plus the script).
msg97038 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-12-30 16:36
On the principle nothing looks wrong. There are some tabs-vs-spaces issues in _sre.c. Can some out-of-bounds crash be triggered if an opcode is greater than 33? It needs some benchmarks to know whether it's efficient. Also, I think it would be nice to include regeneration of the opcode table in the build process. Since it's very light-weight it can be done unconditionally (I suppose it will be done from setup.py, right?).
msg97097 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2009-12-31 14:54
_sre is listed in Modules/Setup, so it will be a built-in module by default.
msg99471 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2010-02-17 14:36
I finally got around to benchmarking this change, and unfortunately the results are not good. I used the regex tests in the Unladen Swallow test suite, regex_effbot and regex_v8. The tests are written for Python 2.x, but the fixes for 3.x are straightforward (use print() in one function; replace xrange with range in the bm_regex_effbot.py and bm_regex_v8.py files; remove a few uses of u''; I'll provide a patch later.) Hardware: MacBook, Intel Core 2 Duo, 1.83GHz, 2MB L2 cache, 667 MHz bus. Tests invoked with ./perf.py -b regex_effbot -r -v ../py3k/python.exe ../threaded-3000/python.exe regex_effbot is 1.1002 times slower with the computed-goto patch, and regex_v8 is 1.0081x slower. perf.py indicates that both results are significant. I'd like to see a few people replicate these results -- maybe the effect is very platform dependent or I ran the tests incorrectly -- but on current evidence, this patch is not worth pursuing.
msg99473 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2010-02-17 15:13
Actually, I really want someone to verify that measurement. As a control, I tried running the call_method benchmark (after a few more xrange fixes). The Python 3.x trunk version with my patch is measured as 1.0227x slower, even though the patch only touches the re module and call_method doesn't use the module at all. I recompiled both binaries; both builds are using the same compiler arguments; both have the same version from trunk. I'm mystified about why the patched version is slower.
msg99478 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-02-17 18:50
You should disassemble the output (or produce assembler from gcc) and check that the various indirect jumps at the end of each case block don't get merged into a single shared indirect jump. Or perhaps it's simply that regular expression matching isn't really sensitive to bytecode dispatch overhead.
History
Date User Action Args
2022-04-11 14:56:55 admin set github: 51842
2013-11-06 20:23:25 akuchling set status: open -> closedresolution: wont fix
2012-07-25 22:46:32 ezio.melotti set versions: + Python 3.4, - Python 3.2
2010-05-28 23:30:55 cdleary set nosy: + cdleary
2010-02-17 18:50:13 pitrou set messages: +
2010-02-17 15:13:57 akuchling set messages: +
2010-02-17 14:36:51 akuchling set messages: +
2009-12-31 14:54:16 georg.brandl set nosy: + georg.brandlmessages: +
2009-12-30 16:36:48 pitrou set messages: +
2009-12-30 16:19:50 pitrou set nosy: + pitrou
2009-12-30 15:48:22 georg.brandl set title: Computed-goto patch -> Computed-goto patch for RE engine
2009-12-29 07:07:28 loewis set nosy: + loewis
2009-12-29 02:15:57 ezio.melotti set nosy: + ezio.melottipriority: normalkeywords: + patch, needs reviewtype: performancestage: patch review
2009-12-29 02:12:06 akuchling set files: + re-computed-goto-v1.txt
2009-12-29 02:11:30 akuchling create