msg70365 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-07-28 18:51 |
When the separator is empty and the sequence only contains one non-empty item, it is possible to optimize b"".join([...]) by simply returning the non-empty item. Since b"".join() is the recommended idiom to join large strings together, I think it can be useful. I encountered this when trying to optimize io.BufferedReader. ./python -m timeit -s "l=[b'', b'a' * 10000]" "b''.join(l)" before the patch: 100000 loops, best of 3: 2.35 usec per loop after the patch: 1000000 loops, best of 3: 0.335 usec per loop |
|
|
msg70389 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-07-29 13:00 |
Raymond, Martin, any opinion on this? |
|
|
msg70605 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2008-08-01 23:20 |
How much does the optimization speed up (or slow down?) a more normal case when it is applicable? bl = [b'', b'a'] How much does the optimization slow down the normal case where it is not applied? bl = [b'a']*2 bl = [b'a']*10000 Could not the .join user simply not add empty list items? Looking at the code, there appear to be 4 extra operations for every item in the normal case: assign item_size, test item_size, assign nonempty, increment nb_nonempty. I believe this alternative might be generally faster. Before the normal scan, if seplen == 0: for item in seq: : break else: Then normal cases will bail out on the second item and continue without further impact. |
|
|
msg70607 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-08-01 23:44 |
bl = [b'', b'a'] gives a 20% speedup: before patch: 1000000 loops, best of 3: 0.443 usec per loop after patch: 1000000 loops, best of 3: 0.349 usec per loop (20% speedup) bl = [b'a']*2 gives a 2% slowdown: before patch: 1000000 loops, best of 3: 0.439 usec per loop after patch: 1000000 loops, best of 3: 0.447 usec per loop bl = [b'a']*10000 gives a 1% slowdown: before patch: 1000 loops, best of 3: 510 usec per loop after patch: 1000 loops, best of 3: 515 usec per loop So, even in the worst case of joining the shortest possible non-empty strings, the overhead is still minimal. > Could not the .join user simply not add empty list items? The .join user could also detect whether there is a single element in the list and the separator is empty, and then avoid calling join :) The point of integrating the optimization in bytes.join is that: 1) it makes it implicit, so that user code stays clean of "performance hacks" 2) the optimization itself has much less overhead, since it is written in C rather than in Python |
|
|
msg70635 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-08-02 17:22 |
I dislike this change, not because of the potential slowdown, but because of the added code complexity. How plausible is it that you want to join a list of bytes objects, and the list has more than one item, yet all but one item are empty? If you have such an application, and performance matters, you shouldn't add the empty bytes to the list in the first place. Tentatively rejecting the patch. |
|
|
msg70636 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-08-02 18:57 |
Well as I said it occurred to me when doing some measurements of BufferedReader performance, but I agree that the gain may not justify the complexity. If nobody objects, feel free to close the issue. |
|
|
msg70642 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2008-08-02 20:36 |
>bl = [b'a']*2 gives a 2% slowdown: >bl = [b'', b'a'] gives a 20% speedup: If the second case is less than 9% of cases, which I expect is true, the change would cause an average slowdown ;-) >I encountered this when trying to optimize io.BufferedReader. Is there something peculiar about that code (and potentially fixable) that it tries to join lots of null strings? >I dislike this change... because of the added code complexity. That was part of my reason for suggesting the new code be separated from the old, but it would not much change the calculation above. So I agree on closing this. |
|
|