[Python-3000] Dropping find/rfind?

Jean-Paul Calderone exarkun at divmod.com
Fri Aug 25 20:49:02 CEST 2006


On Fri, 25 Aug 2006 10:53:15 -0700, Guido van Rossum <guido at python.org> wrote:

> On 8/25/06, Jean-Paul Calderone <exarkun at divmod.com> wrote:
>
> > > For the record, I think this is a major case of YAGNI. You appear way too obsessed with performance of some microscopic aspect of the language. Please stop firing random proposals until you actually have working code and proof that it matters. Speeding up microbenchmarks is irrelevant.
> >
> > Twisted's core loop uses string views to avoid unnecessary copying. This has proven to be a real-world speedup. This isn't a synthetic benchmark or a micro-optimization.
>
> OK, that's the kind of data I was hoping for; if this was mentioned before I apologize. Did they implement this in C or in Python? Can you point us to the docs for their API?

One instance of this is an implementation detail which doesn't impact any application-level APIs:

http://twistedmatrix.com/trac/browser/trunk/twisted/internet/abstract.py?r=17451#L88
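A minimal Python 3 sketch of the general technique (not Twisted's code, which is at the link above): keep an offset into the pending data and hand the socket a memoryview slice, so a partial write never copies the unsent tail. `FakeSocket` and `drain` are hypothetical names; the fake socket stands in for a non-blocking `socket.send()`.

```python
class FakeSocket:
    """Hypothetical stand-in for a non-blocking socket that accepts partial writes."""

    def __init__(self, max_per_call):
        self.max_per_call = max_per_call
        self.received = bytearray()
        self.calls = 0

    def send(self, data):
        # Accept at most max_per_call bytes, as a full kernel send buffer might.
        n = min(len(data), self.max_per_call)
        self.received += data[:n]
        self.calls += 1
        return n


def drain(sock, payload):
    """Send all of `payload` and return the number of bytes sent."""
    view = memoryview(payload)
    offset = 0
    while offset < len(view):
        # view[offset:] is a new view over the same memory; payload[offset:]
        # on a bytes object would copy the whole unsent remainder each time.
        offset += sock.send(view[offset:])
    return offset


sock = FakeSocket(max_per_call=4)
print(drain(sock, b"hello, twisted!"))  # 15
print(sock.calls)                       # 4
```

A real event loop would also append newly queued writes and occasionally compact its buffer, which is where a copy-or-not heuristic comes in.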

Another instance of this is implemented in C++:

http://twistedmatrix.com/trac/browser/sandbox/itamar/cppreactor/fusion

but doesn't interact a lot with Python code. The C++ API uses char* with a length (a natural way to implement string views in C/C++). The Python API just uses strings, because Twisted has always used str here, and passing in a buffer would break everything expecting something with str methods.
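The str-methods problem carries over to Python 3, where `memoryview` replaced `buffer`: a view exposes raw memory but none of the `bytes` methods, so code written against `bytes` rejects it, and converting back with `bytes()` is exactly the copy the view was meant to avoid. `parse_request_line` is a hypothetical consumer used only for illustration.

```python
def parse_request_line(line):
    """Hypothetical consumer written against the bytes API."""
    if line.startswith(b"GET "):
        return line.split(b" ")[1]
    return None


data = b"GET /index.html HTTP/1.0"
view = memoryview(data)

print(parse_request_line(data))  # b'/index.html'

try:
    parse_request_line(view)
except AttributeError as exc:
    # memoryview has no startswith, split or find.
    print("rejected:", exc)

# Works again, but bytes(view) copies the data.
print(parse_request_line(bytes(view)))  # b'/index.html'
```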

> > I don't understand the resistance. Is it really so earth-shatteringly surprising that not copying memory unnecessarily is faster than copying memory unnecessarily?
>
> It depends on how much bookkeeping is needed to properly free the underlying buffer when it is no longer referenced, and whether the application repeatedly takes short, long-lived slices of long, otherwise short-lived buffers. Unless you have a heuristic for deciding to copy at some point, you may waste a lot of space.

Certainly. The first link above includes an example of such a heuristic.
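Guido's worst case is easy to reproduce in Python 3: a memoryview slice, however short, holds a reference to its whole underlying object (its `.obj`), so the large buffer stays alive as long as the slice does. A minimal sketch of one copy heuristic, assuming a hypothetical 1-in-4 size threshold; it is not necessarily what the Twisted code at the first link does, and `take_slice` and `COPY_RATIO` are illustrative names.

```python
COPY_RATIO = 4  # hypothetical: copy if the slice is under 1/4 of its base


def take_slice(base, start, stop, copy_ratio=COPY_RATIO):
    """Return a zero-copy view of base[start:stop], or a private copy if it is small."""
    view = memoryview(base)[start:stop]
    if len(view) * copy_ratio < len(base):
        # Short slice of a long buffer: copy it, so `base` can be freed
        # once the caller drops its own reference.
        return bytes(view)
    # Large slice: share memory; view.obj keeps `base` alive.
    return view


big = bytes(1_000_000)
small = take_slice(big, 0, 16)       # bytes: does not pin `big`
large = take_slice(big, 0, 600_000)  # memoryview sharing big's memory
print(type(small).__name__, type(large).__name__)  # bytes memoryview
print(large.obj is big)                            # True
```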

> > If the goal is to avoid speeding up Python programs because views are too complex or unpythonic or whatever, fine. But there isn't really any question as to whether or not this is a real optimization.
>
> There are many ways to implement views. It has often been proposed to make views an automatic feature of the basic string object. There the optimization in one case has to be weighed against the pessimization in another case (like the bookkeeping overhead everywhere and the worst-case scenario I mentioned above).

I'm happy to see things progress one step at a time. Having them at all (buffer) was a good place to start. A view which has string methods is a nice incremental improvement. Maybe somewhere down the line there can be a single type which magically knows how to behave optimally for all programs, but I'm not asking for that yet. ;)
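A "view which has string methods" can be sketched without copying, because `str.find` and `str.startswith` already accept `start`/`end` bounds. `StrView` below is a hypothetical class for illustration, not an existing or proposed API; it supports only non-negative indices and a couple of methods.

```python
class StrView:
    """Hypothetical read-only view of base[start:stop] that shares base instead of copying."""

    __slots__ = ("_base", "_start", "_stop")

    def __init__(self, base, start=0, stop=None):
        n = len(base)
        stop = n if stop is None else stop
        # Clamp like slicing does (non-negative indices only in this sketch).
        self._base = base
        self._start = max(0, min(start, n))
        self._stop = max(self._start, min(stop, n))

    def __len__(self):
        return self._stop - self._start

    def __str__(self):
        # The only place this sketch copies characters.
        return self._base[self._start:self._stop]

    def find(self, sub):
        # Search the shared base in place; report the index relative to the view.
        i = self._base.find(sub, self._start, self._stop)
        return -1 if i == -1 else i - self._start

    def startswith(self, prefix):
        return self._base.startswith(prefix, self._start, self._stop)

    def view(self, start, stop=None):
        """Sub-view relative to this view, still sharing the base string."""
        n = len(self)
        stop = n if stop is None else min(stop, n)
        start = min(start, stop)
        return StrView(self._base, self._start + start, self._start + stop)


line = "HTTP/1.0 200 OK\r\nContent-Length: 42\r\n"
rest = StrView(line, line.find("\r\n") + 2)  # everything after the status line
print(rest.startswith("Content-Length"))      # True
colon = rest.find(":")                        # 14, relative to the view
print(str(rest.view(0, colon)))               # Content-Length
```

Every method delegates to the shared base string. The bookkeeping Guido mentions shows up as the extra `_base`/`_start`/`_stop` fields, and as the base string staying alive as long as any view of it does.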

> If views have to be explicitly requested that may not be a problem because the app author will (hopefully) understand the issues. But even if it was just a standard library module, I would worry that many inexperienced programmers would complicate their code by using the string views module without real benefits. Sort of the way some folks have knee-jerk habits to write
>
>     def foo(x, None=None):
>
> if they use None anywhere in the body of the function. This should be done only as a last resort when real-life measurements have shown that foo() is a performance show-stopper.
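In Python 3 `None` is a keyword, so `def foo(x, None=None):` is a SyntaxError; the same habit survives as binding a builtin to a default argument. A sketch using the standard `dis` module to show what the trick actually changes (a global/builtin lookup becomes a local one); whether that matters for a given program is, as Guido says, something to measure first. The function names are hypothetical.

```python
import dis


def total_length(items):
    n = 0
    for item in items:
        n += len(item)  # `len` is loaded with LOAD_GLOBAL each time through the loop
    return n


def total_length_bound(items, _len=len):
    n = 0
    for item in items:
        n += _len(item)  # `_len` is a local, bound once when the def runs
    return n


def loads_global(func, name):
    """True if func's bytecode loads `name` as a global/builtin."""
    return any(ins.opname == "LOAD_GLOBAL" and ins.argval == name
               for ins in dis.get_instructions(func))


print(loads_global(total_length, "len"))        # True
print(loads_global(total_length_bound, "len"))  # False
print(total_length(["ab", "c"]) == total_length_bound(["ab", "c"]))  # True
```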

I don't think we see people overusing buffer() in ways which damage readability now, and buffer is even a builtin. Tossing something off into a module somewhere shouldn't really be a problem. For most people who don't actually know what they're doing, the idea of optimizing code by reducing memory copying usually just doesn't come up.

Jean-Paul



More information about the Python-3000 mailing list