[Python-ideas] Regular expression algorithms (original) (raw)
Josiah Carlson jcarlson at uci.edu
Thu Apr 12 18:04:57 CEST 2007
- Previous message: [Python-ideas] Regular expression algorithms
- Next message: [Python-ideas] Regular expression algorithms
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Adam Atlas <adam at atlas.st> wrote:
Has anyone seen this article? http://swtch.com/~rsc/regexp/regexp1.html
Yes, it has been posted in the sourceforge tracker as a feature request, in python-dev, and now here.
Are its criticisms of Python's regex algorithm accurate?
In the worst-case, yes, Python's regular expression runs in O(2^n) time (where n is the length of the string you are searching). However, as stated in the sourceforge entry, and has been stated in other places, one has to write a fairly useless regular expression to get it into the O(2^n) running time. For many cases, Python's regular expression engine is quite competitive with the Thompson NFA.
If so, might it be possible to revise Python's
re
module to use this sort of algorithm? I noticed it says that this approach doesn't work if the pattern contains backreferences, but maybe the module could at least sort of self-optimize by switching to this method when no backrefs are used.
It is possible, but only if someone takes the time to offer a patch. One thing to remember is that as stated in the documentation for Python's re module, certain operators are "greedy", that is, a* will pick up as many a's as it possibly can. Where as a base Thompson NFA will move on to the next state as early as possible, making a* with Thompson analagous to a*? in the Python (and others') regular expression engine. Yes, the Thompson NFA can be modified to also be greedy, but that is a particular characteristic of Python's engine that a Thompson NFA based engine will have to emulate (along with groups, named references, etc., which are a PITA for non-recursive engines).
- Josiah
- Previous message: [Python-ideas] Regular expression algorithms
- Next message: [Python-ideas] Regular expression algorithms
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]