[Python-Dev] sum(...) limitation (original) (raw)
Stephen J. Turnbull stephen at xemacs.org
Tue Aug 12 05:50:21 CEST 2014
- Previous message: [Python-Dev] sum(...) limitation
- Next message: [Python-Dev] sum(...) limitation
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Chris Barker - NOAA Federal writes:
Is there anything in the language spec that says string concatenation is O(n^2)? Or for that matter any of the performs characteristics of build in types? Those striker as implementation details that SHOULD be particular to the implementation.
Container concatenation isn't quadratic in Python at all. The naive implementation of sum() as a loop repeatedly calling add is quadratic for them. Strings (and immutable containers in general) are particularly horrible, as they don't have iadd.
You could argue that sum() being a function of an iterable isn't just a calling convention for a loop encapsulated in a function, but rather a completely different kind of function that doesn't imply anything about the implementation, and therefore that it should dispatch on type(it). But explicitly dispatching on type(x) is yucky (what if somebody wants to sum a different type not currently recognized by the sum() builtin?) so, obviously, we should define a standard sum dunder! IMO we'd also want a homogeneous_iterable ABC, and a concrete homogeneous_iterable_of_TYPE for each sum()-able TYPE to help users catch bugs injecting the wrong type into an iterable_of_TYPE.
But this still sucks. Why? Because obviously we'd want the attractive nuisance of "if you have add, there's a default definition of sum" (AIUI, this is what bothers Alexander most about the current situation, at least of the things he's mentioned, I can really sympathize with his dislike). And new Pythonistas and lazy programmers who only intend to use sum() on "small enough" iterables will use the default, and their programs will appear to hang on somewhat larger iterable, or a realtime requirement will go unsatisfied when least expected, or .... If we don't have that property for sum(), ugh! Yuck! Same old same old! (IMHO, YMMV of course)
It's possible that Python could provide some kind of feature that would allow an optimized sum function for every type that has add, but I think this will take a lot of thinking. Somebody will do it (I don't think anybody is +1 on restricting sum() to a subset of types with add). I just think we should wait until that somebody appears.
Should we cripple the performance of some operation in Cpython so that it won't work better that Jython?
Nobody is crippling operations. We're prohibiting use of a name for an operation that is associated (strongly so, in my mind) with an inefficient algorithm in favor of the same operation by a different name (which has no existing implementation, and therefore Python implementers are responsible for implementing it efficiently). Note: the "inefficient" algorithm isn't inefficient for integers, and it isn't inefficient for numbers in general (although it's inaccurate for some classes of numbers).
Seems the same argument [that Python language doesn't prohibit optimizations in particular implementations just because they aren't made in others] could be made for sum(list_of_strings).
It could. But then we have to consider special-casing every builtin type that provides add, and we impose an unobvious burden on user types that provide add.
It seems pretty pedantic to say: we could make this work well, but we'd rather chide you for not knowing the "proper" way to do it.
Nobody disagrees. But backward compatibility gets in the way.
But sum() is not inherently quadratic -- that's a limitation of the implementation.
But the faulty implementation is the canonical implementation, the only one that can be defined directly in terms of add, and it is efficient for non-container types.[1]
"".join could be naively written with the same poor performance -- why should users need to understand why one was optimized and one was not?
Good question. They shouldn't -- thus the prohibition on sum()ing strings.
That is a very import a lesson to learn, sure, but python is not only a teaching language. People will need to learn those lessons at some point, this one feature makes little difference.
No, it makes a big difference. If you can do something, then it's OK to do it, is something Python tries to implement. If sum() works for everything with an add, given current Python language features some people are going to end up with very inefficient code and it will bite some of them (and not necessarily the authors!) at some time.
If it doesn't work for every type with add, why not? You'll end up playing whack-a-mole with type prohibitions. Ugh.
Sure, but I think all that does is teach people about a cpython specific implementation -- and I doubt naive users get any closer to understanding algorithmic complexity -- all they learn is you should use string.join().
Oh well, not really that big a deal.
Not to Python. Maybe not to you. But I've learned a lot about Pythonic ways of doing things trying to channel the folks who implemented this restriction. (I don't claim to have gotten it right! Just that it's been fun and educational. :-)
Steve
Footnotes: [1] This isn't quite true. One can imagine a "token" or "symbol" type that is str without len, but does have add. But that seems silly enough to not be a problem in practice.
- Previous message: [Python-Dev] sum(...) limitation
- Next message: [Python-Dev] sum(...) limitation
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]