[Python-Dev] SRE 0.9.8 benchmarks (original) (raw)
Fredrik Lundh Fredrik Lundh" <effbot@telia.com
Wed, 2 Aug 2000 23:37:52 +0200
- Previous message: [Python-Dev] Re: [Doc-SIG] Python HOWTO project created
- Next message: [Python-Dev] SRE 0.9.8 benchmarks
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Guido asked me to update my old SRE benchmarks, and post them to python-dev.
Summary:
-- SRE is usually faster than the old RE module (PRE).
-- SRE is faster than REGEX on anything but very trivial patterns and short target strings. And in some cases, it's even faster than a corresponding string.find...
-- on real-life benchmarks like XML parsing and Python tokenizing, SRE is 2-3 times faster than PRE.
-- using Unicode strings instead of 8-bit strings doesn't hurt performance (for some tests, the Unicode version is 30-40% faster on my machine. Go figure...)
-- PRE is still faster for some patterns, especially when using long target strings. I know why, and I plan to fix that before 2.0 final.
enjoy /F
These tests were made on a P3/233 MHz running Windows 95, using a local build of the 0.9.8 release (this will go into 1.6b1, I suppose).
parsing xml:
running xmllib.py on hamlet.xml (280k):
sre8 7.14 seconds sre16 7.82 seconds pre 17.17 seconds
(for the sre16 test, the xml file was converted to unicode before it was fed to the unmodified parser).
for comparision, here's the results for a couple of fast pure-Python parsers:
rex/pre 2.44 seconds rex/sre 0.59 seconds srex/sre 0.16 seconds
(rex is a shallow XML parser, based on code by Robert Cameron. srex is an even simpler shallow parser, using sre's template mode).
parsing python:
running tokenize.py on Tkinter.py (156k):
sre8 3.23 seconds pre 7.57 seconds
searching for literal text:
searching for "spam" in a string padded with "spaz" (1000 bytes on each side of the target):
string.find 0.112 ms sre8.search 0.059 pre.search 0.122
unicode.find 0.130 sre16.search 0.065
(yes, regular expressions can run faster than optimized C code -- as long as we don't take compilation time into account ;-)
same test, without any false matches:
string.find 0.035 ms sre8.search 0.050 pre.search 0.116
unicode.find 0.031 sre16.search 0.055
compiling regular expressions
compiling the 480 tests in the standard test suite:
sre 1.22 seconds pre 0.05 seconds
or in other words, pre (using a compiler written in C) can compile just under 10,000 patterns per second. sre can only compile about 400 pattern per second. do we care? ;-)
(footnote: sre's pattern cache stores 100 patterns. pre's cache hold 20 patterns, iirc).
benchmark suite
to round off this report, here's a couple of "micro benchmarks". all times are in milliseconds.
n=3D 0 5 50 250 1000 5000
pattern 'Python|Perl', string '-'*n+'Perl'+'-'*n sre8 0.014 0.013 0.013 0.016 0.027 0.079 sre16 0.014 0.014 0.015 0.018 0.025 0.076 pre 0.107 0.109 0.114 0.116 0.135 0.259 regex 0.011 0.011 0.012 0.016 0.033 0.122
pattern 'Python|Perl', string 'P'*n+'Perl'+'P'*n sre8 0.013 0.016 0.030 0.100 0.358 1.716 sre16 0.014 0.015 0.030 0.094 0.347 1.649 pre 0.115 0.112 0.158 0.351 1.085 5.002 regex 0.010 0.016 0.060 0.271 1.022 5.162
(false matches causes problems for pre and regex)
pattern '(Python|Perl)', string '-'*n+'Perl'+'-'*n sre8 0.014 0.016 0.030 0.099 0.362 1.684 sre16 0.015 0.016 0.030 0.094 0.340 1.623 pre 0.110 0.111 0.112 0.119 0.143 0.267 regex 0.012 0.012 0.013 0.017 0.034 0.124
(in 0.9.8, sre's optimizer doesn't grok named groups, and it doesn't realize that this pattern has to start with a "P")
pattern '(?:Python|Perl)', string '-'*n+'Perl'+'-'*n sre8 0.013 0.013 0.014 0.016 0.027 0.079 sre16 0.015 0.014 0.016 0.018 0.026 0.075 pre 0.108 0.135 0.113 0.137 0.140 0.275 regex skip
(anonymous groups work better)
pattern 'Python', string '-'*n+'Python'+'-'*n sre8 0.013 0.013 0.014 0.019 0.039 0.148 sre16 0.013 0.013 0.014 0.020 0.043 0.187 pre 0.129 0.105 0.109 0.117 0.191 0.277 regex 0.011 0.025 0.018 0.016 0.037 0.127
pattern 'Python', string 'P'*n+'Python'+'P'*n sre8 0.040 0.012 0.021 0.026 0.080 0.248 sre16 0.012 0.013 0.015 0.025 0.061 0.283 pre 0.110 0.148 0.153 0.338 0.925 4.355 regex 0.013 0.013 0.041 0.155 0.535 2.628
(as we saw in the string.find test, sre is very fast when there are lots of false matches)
pattern '.*Python', string '-'*n+'Python'+'-'*n sre8 0.016 0.017 0.026 0.067 0.217 1.039 sre16 0.016 0.017 0.026 0.067 0.218 1.076 pre 0.111 0.112 0.124 0.180 0.386 1.494 regex 0.015 0.022 0.073 0.408 1.669 8.489
pattern '.Python.', string '-'*n+'Python'+'-'*n sre8 0.016 0.017 0.030 0.089 0.315 1.499 sre16 0.016 0.018 0.032 0.090 0.314 1.537 pre 0.112 0.113 0.129 0.186 0.413 1.605 regex 0.016 0.023 0.076 0.387 1.674 8.519
pattern '.*(Python)', string '-'*n+'Python'+'-'*n sre8 0.020 0.021 0.044 0.147 0.542 2.630 sre16 0.019 0.021 0.044 0.154 0.541 2.681 pre 0.115 0.117 0.141 0.245 0.636 2.690 regex 0.019 0.026 0.097 0.467 2.007 10.264
pattern '.*(?:Python)', string '-'*n+'Python'+'-'*n sre8 0.016 0.017 0.027 0.065 0.220 1.037 sre16 0.016 0.017 0.026 0.070 0.221 1.066 pre 0.112 0.119 0.136 0.223 0.566 2.377 regex skip
pattern 'Python|Perl|Tcl', string '-'*n+'Perl'+'-'*n sre8 0.013 0.015 0.034 0.114 0.407 1.985 sre16 0.014 0.016 0.034 0.109 0.392 1.915 pre 0.107 0.108 0.117 0.124 0.167 0.393 regex 0.012 0.012 0.013 0.017 0.033 0.123
(here's another sre compiler problem: it fails to realize that this pattern starts with characters from a given set [PT]. pre and regex both use bitmaps...)
pattern 'Python|Perl|Tcl', string 'P'*n+'Perl'+'P'*n sre8 0.013 0.018 0.055 0.228 0.847 4.165 sre16 0.015 0.027 0.055 0.218 0.821 4.061 pre 0.111 0.116 0.172 0.415 1.354 6.302 regex 0.011 0.019 0.085 0.374 1.467 7.261
(but when there are lots of false matches, sre is faster anyway. interesting...)
pattern '(Python|Perl|Tcl)', string '-'*n+'Perl'+'-'*n sre8 0.014 0.018 0.042 0.152 0.575 2.798 sre16 0.015 0.019 0.042 0.148 0.556 2.715 pre 0.112 0.111 0.116 0.129 0.172 0.408 regex 0.012 0.013 0.014 0.018 0.035 0.124
pattern '(?:Python|Perl|Tcl)', string '-'*n+'Perl'+'-'*n sre8 0.014 0.016 0.034 0.113 0.405 1.987 sre16 0.016 0.016 0.033 0.112 0.393 1.918 pre 0.109 0.109 0.112 0.128 0.177 0.397 regex skip
pattern '(Python)\1', string '-'*n+'PythonPython'+'-'*n sre8 0.014 0.018 0.030 0.096 0.342 1.673 sre16 0.015 0.016 0.031 0.094 0.330 1.625 pre 0.112 0.111 0.112 0.119 0.141 0.268 regex 0.011 0.012 0.013 0.017 0.033 0.123
pattern '(Python)\1', string 'P'*n+'PythonPython'+'P'*n sre8 0.013 0.016 0.035 0.111 0.411 1.976 sre16 0.015 0.016 0.034 0.112 0.416 1.992 pre 0.110 0.116 0.160 0.355 1.051 4.797 regex 0.011 0.017 0.047 0.200 0.737 3.680
pattern '([0a-z][a-z0-9]*,)+', string '-'*n+'a5,b7,c9,'+'-'*n sre8 0.084 0.091 0.143 0.371 1.160 6.165 sre16 0.086 0.090 0.142 0.470 1.258 7.827 pre 0.155 0.140 0.185 0.200 0.280 0.523 regex 0.018 0.018 0.020 0.024 0.137 0.240
(again, sre's lack of "fastmap" is rather costly)
pattern '(?:[0a-z][a-z0-9]*,)+', string '-'*n+'a5,b7,c9,'+'-'*n sre8 0.028 0.033 0.077 0.303 1.433 7.140 sre16 0.021 0.027 0.073 0.277 1.031 5.053 pre 0.131 0.131 0.174 0.183 0.227 0.461 regex skip
pattern '([a-z][a-z0-9]*,)+', string '-'*n+'a5,b7,c9,'+'-'*n sre8 0.032 0.038 0.083 0.288 1.109 5.404 sre16 0.033 0.038 0.083 0.292 1.035 5.802 pre 0.195 0.135 0.176 0.187 0.233 0.468 regex 0.018 0.018 0.019 0.023 0.041 0.131
pattern '(?:[a-z][a-z0-9]*,)+', string '-'*n+'a5,b7,c9,'+'-'*n sre8 0.022 0.025 0.067 0.302 1.011 8.245 sre16 0.021 0.026 0.066 0.302 1.103 5.372 pre 0.262 0.397 0.178 0.193 0.250 0.817 regex skip
pattern '.*P.*y.*t.*h.*o.n.', string '-'*n+'Python'+'-'*n sre8 0.021 0.084 0.118 0.251 0.965 5.414 sre16 0.021 0.025 0.063 0.366 1.192 4.639 pre 0.123 0.147 0.225 0.568 1.899 9.336 regex 0.028 0.060 0.258 1.269 5.497 28.334
- Previous message: [Python-Dev] Re: [Doc-SIG] Python HOWTO project created
- Next message: [Python-Dev] SRE 0.9.8 benchmarks
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]