[Python-Dev] SRE 0.9.8 benchmarks (original) (raw)
M.-A. Lemburg mal@lemburg.com
Thu, 03 Aug 2000 15:31:34 +0200
- Previous message: [Python-Dev] SRE 0.9.8 benchmarks
- Next message: [Python-Dev] Fork on Win32 - was (test_fork1 failing...)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Fredrik Lundh wrote:
mal wrote: > Just for compares: would you mind running the search > routines in mxTextTools on the same machine ? > > searching for "spam" in a string padded with "spaz" (1000 bytes on > > each side of the target): > > > > string.find 0.112 ms texttools.find 0.080 ms > > sre8.search 0.059 > > pre.search 0.122 > > > > unicode.find 0.130 > > sre16.search 0.065 > > > > same test, without any false matches (padded with "-"): > > > > string.find 0.035 ms texttools.find 0.083 ms > > sre8.search 0.050 > > pre.search 0.116 > > > > unicode.find 0.031 > > sre16.search 0.055 > > Those results are probably due to the fact that string.find > does a brute force search. If it would do a last match char > first search or even Boyer-Moore (this only pays off for long > search targets) then it should be a lot faster than [s|p]re. does the TextTools algorithm work with arbitrary character set sizes, btw?
The find function creates a Boyer-Moore search object for the search string (on every call). It compares 1-1 or using a translation table which is applied to the searched text prior to comparing it to the search string (this enables things like case insensitive search and character sets, but is about 45% slower). Real-life usage would be to create the search objects once per process and then reuse them. The Boyer-Moore table calcuation takes some time...
But to answer your question: mxTextTools is 8-bit throughout. A Unicode aware version will follow by the end of this year.
Thanks for checking,
Marc-Andre Lemburg
Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
- Previous message: [Python-Dev] SRE 0.9.8 benchmarks
- Next message: [Python-Dev] Fork on Win32 - was (test_fork1 failing...)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]