[Python-Dev] Broken strptime in Python 2.3a1 & CV

Brett Cannon bac@OCF.Berkeley.EDU
Tue, 14 Jan 2003 17:25:53 -0800 (PST)


[Tim Peters]

[Brett Cannon]
> ...
> And to comment on the speed drawback: there is already a partial solution
> to this. strptime has the ability to return the regex it creates to
> parse the data string and then subsequently have the user pass that in
> instead of a format string::

You're carrying restructured text too far ::

=) Need the practice; giving a lightning tutorial on it at PyCon. But I will cut back on the literal markup.

I expect it would be better for strptime to maintain its own internal cache mapping format strings to compiled regexps (as a dict, indexed by format strings). Dict lookup is cheap. In most programs, this dict will remain empty. In most of the rest, it will have one entry. Some joker will feed it an unbounded number of distinct format strings, though, so blow the cache away if it gets "too big":

    regexp = cache.get(fmtstring)
    if regexp is None:
        regexp = compiletheregexp(fmtstring)
        if len(cache) > 30:  # whatever
            cache.clear()
        cache[fmtstring] = regexp

Then you're robust against all comers (it's also thread-safe).

Hmm. Could do that. Could also cache the locale information that I discover (only one copy should be enough; don't think people swap between locales that often). Caching the object that stores locale info, called TimeRE (see, no markup; fast learner I am =), would speed up value calculations (have to compare against it to figure out what month it is, etc.) along with creating multiple regexes (since the locale info won't have to be recalculated). And then the cache that you are suggesting, Tim, would completely replace the need to be able to return regex objects. Spiffy. =)
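A rough sketch of the "only one copy" locale cache described above might look like the following. `LocaleInfo` is a hypothetical stand-in for the TimeRE object, and rebuilding it when the LC_TIME setting changes is an assumption added here, not something the post specifies.

```python
import calendar
import locale


class LocaleInfo:
    """Hypothetical holder for locale-derived data (stand-in for TimeRE)."""

    def __init__(self):
        # Remember which locale this data was computed under.
        self.lang = locale.getlocale(locale.LC_TIME)
        # Month names as reported for the current locale; index 0 is empty.
        self.months = [name.lower() for name in calendar.month_name]


_locale_info = None  # the single cached copy


def get_locale_info():
    """Return the cached LocaleInfo, rebuilding it if the locale changed."""
    global _locale_info
    current = locale.getlocale(locale.LC_TIME)
    if _locale_info is None or _locale_info.lang != current:
        _locale_info = LocaleInfo()
    return _locale_info
```

Repeated calls under the same locale reuse one object, so the month lookups and any regexes built from them do not have to recompute the locale data each time.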

OK, so with the above-mentioned improvements I can rip out the functionality for returning regex objects. I am going to assume no one has any issue with this design idea, so I will do another patch for it. That makes three: one already on SF dealing with a MacOS 9 issue; one to come that does default values and makes the %y directive work the way most people expect, along with doc changes specifying that you can expect reliable behavior; and now a speed-up patch, which will also remove my one use of the string module. Fun. =)

Now all I need is Alex to step in here and fiddle with Tim's code and then Christian and Raymond to come in and speed up the underlying C code for Tim's code that Alex touched and we will be in business. =)

sometimes-I-think-I-read-too-much-python-dev-mail-ly y'rs, Brett