[Python-Dev] Investigating time for import requests

Paul Moore p.f.moore at gmail.com
Mon Oct 2 04:57:01 EDT 2017


On 2 October 2017 at 06:13, Raymond Hettinger <raymond.hettinger at gmail.com> wrote:

On Oct 1, 2017, at 7:34 PM, Nathaniel Smith <njs at pobox.com> wrote:

In principle re.compile() itself could be made lazy -- return a regular expression object that just holds the string, and then compiles and caches it the first time it's used. Might be tricky to do in a backwards compatible way if it moves detection of invalid regexes from compile time to use time, but it could be an opt-in flag.

ISTM that someone writing re.compile(pattern) is explicitly saying they want the regex to be pre-compiled. For cache on first use, we already have a way to do that with re.search(pattern, some_string), which compiles and then caches.
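A minimal sketch of the existing behaviour described above, assuming CPython's re module. The pattern and strings are made up for illustration, and the identity check relies on a CPython implementation detail (re.compile() going through the same internal cache as re.search()), not on a documented guarantee.

```python
import re

# Illustrative pattern kept as a plain string global.
PATTERN = r"\d{4}-\d{2}-\d{2}"

# Module-level functions compile the pattern on first use and reuse the
# cached compiled object on later calls with the same pattern and flags.
first = re.search(PATTERN, "released 2017-10-02")
second = re.search(PATTERN, "no date here")

# CPython implementation detail: re.compile() consults the same cache,
# so compiling the same string twice yields the same pattern object.
same_object = re.compile(PATTERN) is re.compile(PATTERN)

# With eager compilation, an invalid regex is reported at compile time.
try:
    re.compile(r"(\d+")  # unbalanced parenthesis
    compile_error = None
except re.error as exc:
    compile_error = exc
```

The last part is the behaviour a lazy re.compile() would change: the error would surface only when the pattern is first used.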

In practice, I don't think the fact that re.search() et al cache the compiled expressions is that well known (it's mentioned in the re.compile docs, but not in the re.search docs) and so people often compile up front because they think it helps, rather than actually measuring to check. Also, many regexes are long and complex, so factoring them out as global variables is a reasonable practice. And it's easy to imagine people deciding that putting the re.compile step into the global, rather than having the global be a string that gets passed to re.search, is a sensible thing to do (I know I'd do that, without even thinking about it).

So I think that cache on first use is likely to be a useful optimisation in practical terms. I don't have any feel for how many uses of re.compile up front would be harmed if we defer compilation to first use (other than "probably not many") but we could make it opt-in if necessary - we'd hit the same problem of people not thinking to opt in, though.
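To make the trade-off concrete, here is a rough sketch of what deferring compilation to first use could look like. LazyPattern is a hypothetical wrapper written for this illustration; it is not part of the re module and not a proposed implementation.

```python
import re


class LazyPattern:
    """Hypothetical wrapper: holds the pattern string, compiles on first use."""

    def __init__(self, pattern, flags=0):
        self.pattern = pattern
        self.flags = flags
        self._compiled = None  # nothing is compiled at construction time

    def _get(self):
        # Compile once, then reuse the compiled pattern object.
        if self._compiled is None:
            self._compiled = re.compile(self.pattern, self.flags)
        return self._compiled

    def __getattr__(self, name):
        # Only reached for attributes not set in __init__ (search, match,
        # sub, ...); these are forwarded to the real compiled pattern.
        return getattr(self._get(), name)


# A module-level global costs almost nothing at import time...
NUMBER = LazyPattern(r"\d+")
# ...and is compiled only when it is first used.
found = NUMBER.search("abc 42").group()

# The compatibility concern: an invalid regex is no longer caught here...
bad = LazyPattern(r"(\d+")
try:
    bad.search("42")  # ...but only at first use.
    use_error = None
except re.error as exc:
    use_error = exc
```

Code that relies on re.compile() raising at import time would see the error move to the first search or match call, which is why an opt-in flag was suggested.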

Paul


