[Python-Dev] Make re.compile faster
INADA Naoki songofacandy at gmail.com
Mon Oct 2 23:29:40 EDT 2017
Before deferring re.compile, can we make it faster?

I profiled "import string" and found that a small optimization can make it 2x faster!
(But it's not backward compatible.)
Before the optimization:

import time: self [us] | cumulative | imported package
import time:      2339 |       9623 | string

The string module took about 2.3 ms (self time) to import.
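The figures above are in the format printed by Python's -X importtime option (added in Python 3.7). A small sketch for repeating the measurement in a fresh interpreter and extracting one module's row; parse_importtime and measure_import are hypothetical helpers, and passing -S (so site does not import extra modules first) is an assumption about how to get a clean measurement.

```python
import subprocess
import sys


def parse_importtime(stderr_text):
    """Parse `python -X importtime` output into {module: (self_us, cumulative_us)}."""
    results = {}
    for line in stderr_text.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        try:
            self_us, cumulative_us = int(parts[0]), int(parts[1])
        except ValueError:
            continue  # header row: "self [us] | cumulative | imported package"
        # Nested imports are indented in the last column, so strip the name.
        results[parts[2].strip()] = (self_us, cumulative_us)
    return results


def measure_import(module):
    """Import `module` in a fresh interpreter; return (self_us, cumulative_us) or None."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-S", "-c", f"import {module}"],
        capture_output=True, text=True, check=True,
    )
    # -X importtime writes its report to stderr; None means the module was
    # already imported during interpreter startup.
    return parse_importtime(proc.stderr).get(module)


if __name__ == "__main__":
    print(measure_import("string"))
```

Numbers vary between runs and machines, so comparing before/after a patch is more meaningful than any single value.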
I found that:
- RegexFlag.__and__ and __new__ are called very often.
- _optimize_charset is slow when the pattern is compiled with re.UNICODE | re.IGNORECASE.
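The charset cost can be observed from outside by timing re.compile on string.Template's identifier pattern, r'[_a-z][_a-z0-9]*', with and without re.ASCII. A rough sketch, assuming Python 3.7+; compile_time is a hypothetical helper, and re.purge() is called before every compile so the re module's pattern cache does not hide the work. Absolute numbers depend on the machine and Python version.

```python
import re
import timeit

# string.Template's identifier pattern, as it appears in Lib/string.py.
IDPATTERN = r'[_a-z][_a-z0-9]*'


def compile_time(pattern, flags, number=200):
    """Return the average seconds per re.compile() call with the cache cleared each time."""
    def fresh_compile():
        re.purge()  # empty re's pattern cache so the pattern is really recompiled
        re.compile(pattern, flags)
    return timeit.timeit(fresh_compile, number=number) / number


if __name__ == "__main__":
    for label, flags in [("IGNORECASE", re.IGNORECASE),
                         ("IGNORECASE | ASCII", re.IGNORECASE | re.ASCII)]:
        print(f"{label:<20} {compile_time(IDPATTERN, flags) * 1e6:8.1f} us per compile")
```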
diff --git a/Lib/sre_compile.py b/Lib/sre_compile.py
index 144620c6d1..7c662247d4 100644
--- a/Lib/sre_compile.py
+++ b/Lib/sre_compile.py
@@ -582,7 +582,7 @@ def isstring(obj):

 def _code(p, flags):

-    flags = p.pattern.flags | flags
+    flags = int(p.pattern.flags) | int(flags)
     code = []

     # compile info block
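The int() calls in the sre_compile.py change matter because, as noted above, RegexFlag.__and__ runs very often during compilation. A micro-benchmark sketch, assuming Python 3.6+ where re.IGNORECASE and friends are re.RegexFlag members; FLAG_IGNORECASE, check_enum and check_int are illustrative names, not stdlib API. It shows only the per-operation difference, not the effect on "import string".

```python
import re
import timeit

# A plain-int constant, standing in for sre_constants.SRE_FLAG_IGNORECASE.
FLAG_IGNORECASE = int(re.IGNORECASE)

enum_flags = re.IGNORECASE | re.UNICODE           # a RegexFlag, as before the patch
int_flags = int(re.IGNORECASE) | int(re.UNICODE)  # a plain int, as after the patch


def check_enum():
    # RegexFlag.__and__ goes through the enum machinery and returns a new RegexFlag.
    return enum_flags & FLAG_IGNORECASE


def check_int():
    # int & int is a single built-in operation that returns a plain int.
    return int_flags & FLAG_IGNORECASE


if __name__ == "__main__":
    n = 200_000
    print("RegexFlag &:", timeit.timeit(check_enum, number=n))
    print("int &:      ", timeit.timeit(check_int, number=n))
```

Both versions compute the same bit value; only the type of the result (and the cost of producing it) differs.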
diff --git a/Lib/string.py b/Lib/string.py
index b46e60c38f..fedd92246d 100644
--- a/Lib/string.py
+++ b/Lib/string.py
@@ -81,7 +81,7 @@ class Template(metaclass=_TemplateMetaclass):
     delimiter = '$'
     idpattern = r'[_a-z][_a-z0-9]*'
     braceidpattern = None
-    flags = _re.IGNORECASE
+    flags = _re.IGNORECASE | _re.ASCII

     def __init__(self, template):
         self.template = template
After the patch, the self time drops from 2339 us to 1191 us:

import time:      1191 |       8479 | string
Of course, this patch is not backward compatible: [a-z] no longer matches 'ı' or 'ſ', which sre_compile treats as case-insensitive equivalents of 'i' and 's'. But who cares?
(in sre_compile.py)
    # LATIN SMALL LETTER I, LATIN SMALL LETTER DOTLESS I
    (0x69, 0x131), # iı
    # LATIN SMALL LETTER S, LATIN SMALL LETTER LONG S
    (0x73, 0x17f), # sſ
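A quick way to see the compatibility break is to check which single characters the identifier pattern accepts with and without re.ASCII. A minimal sketch; is_identifier is a hypothetical helper. Per the re documentation, a Unicode [a-z] under IGNORECASE also matches 'ı' and 'ſ', so both should be accepted with IGNORECASE alone and rejected once re.ASCII is added, while 'x' is accepted either way.

```python
import re

# string.Template's identifier pattern, unchanged by the patch (only flags change).
IDPATTERN = r'[_a-z][_a-z0-9]*'


def is_identifier(text, flags):
    """Return True if the whole of `text` matches IDPATTERN under `flags`."""
    return re.fullmatch(IDPATTERN, text, flags) is not None


if __name__ == "__main__":
    for ch in ["x", "ı", "ſ"]:
        print(f"{ch!r} U+{ord(ch):04X}: "
              f"IGNORECASE={is_identifier(ch, re.IGNORECASE)}, "
              f"IGNORECASE|ASCII={is_identifier(ch, re.IGNORECASE | re.ASCII)}")
```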
There are some other uses of re.I(GNORECASE) in the stdlib; I'll check them.
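One way to find the other re.I(GNORECASE) uses is a crude text scan of the stdlib sources. A sketch under stated assumptions: find_ignorecase and the IGNORECASE_USE regex are hypothetical, the scan only catches spelled-out re.I / re.IGNORECASE (including the _re alias that string.py uses), and it misses inline (?i) flags and imports such as "from re import I".

```python
import pathlib
import re
import sysconfig

# Heuristic: "re.I" or "re.IGNORECASE", optionally written as "_re." (as in string.py).
IGNORECASE_USE = re.compile(r'\b_?re\.(?:I|IGNORECASE)\b')


def find_ignorecase(root):
    """Yield (path, line_number, line) for .py lines under `root` mentioning re.I/IGNORECASE."""
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip unreadable or non-UTF-8 files
        for lineno, line in enumerate(text.splitlines(), 1):
            if IGNORECASE_USE.search(line):
                yield path, lineno, line.strip()


if __name__ == "__main__":
    for path, lineno, line in find_ignorecase(sysconfig.get_paths()["stdlib"]):
        print(f"{path}:{lineno}: {line}")
```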
Further optimization could be done by implementing sre_parse and sre_compile in C, but I have no time for that this year.
Regards,
Inada Naoki <songofacandy at gmail.com>