Issue 1569291: Speed-up in array_repeat() (original) (raw)

Issue1569291

Created on 2006-10-02 13:30 by lskovlund, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
arraymodule.diff lskovlund,2006-10-02 13:30 Patch against current SVN
issue1569291.diff belopolsky,2008-03-24 19:33 Patch against revision 61850
Messages (15)
msg51182 - (view) Author: Lars Skovlund (lskovlund) Date: 2006-10-02 13:30
Use iterative doubling when extending the old array. This results in O(log n) calls to memcpy() instead of O(n).
msg51183 - (view) Author: Josiah Carlson (josiahcarlson) * (Python triager) Date: 2006-10-07 16:39
Logged In: YES user_id=341410 Have you benchmarked this for repeats found "in the wild" to establish *observable* speedup for code that already exists?
msg51184 - (view) Author: Lars Skovlund (lskovlund) Date: 2006-10-07 22:12
Logged In: YES user_id=263372 I wrote this code for a university project I'm doing myself, which involves initializing a *very* large array (it's a remote memory system). It does help there, certainly; for medium-sized arrays, the improvement would be negligable, and for small ones it might even be worse. If you mean, have I benchmarked it with other people's code, no. I just thought I'd offer it to the community, since it has certainly helped me.
msg51185 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-10-08 00:59
Logged In: YES user_id=80475 This patch is basically fine. Will neaten it up to match our source coding conventions and apply it when I get a chance. Py2.6 won't be out for a good while, so there is no hurry.
msg51186 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2006-10-13 08:17
Logged In: YES user_id=364875 I'd assumed Python *didn't* double the size when resizing an array because of how much memory that wastes. May I suggest trying it with a multiplier of 1.5x, and comparing both CPU time and memory consumption? I find for these things that 1.5x is nearly as fast and wastes far less memory.
msg51187 - (view) Author: Josiah Carlson (josiahcarlson) * (Python triager) Date: 2006-10-13 15:26
Logged In: YES user_id=341410 This patch has nothing to do with array resizing. It has to do with array initialization.
msg51188 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2007-04-03 02:50
This proposal is basically fine. The patch should be re-worked to model as closely as possible the code for string_repeat in Objects/stringobject.c (revisions 30616 and 30368 provide the details). That code is known to work and to have provided a meaningful speed-up.
msg64405 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-03-24 09:02
This one is easy if someone wants to take a crack at it.
msg64429 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-03-24 18:42
Looking at stringobject.c:1034: i = 0; if (i < size) { Py_MEMCPY(op->ob_sval, a->ob_sval, Py_SIZE(a)); i = Py_SIZE(a); } while (i < size) { j = (i <= size-i) ? i : size-i; Py_MEMCPY(op->ob_sval+i, op->ob_sval, j); i += j; } return (PyObject *) op; Do I miss something or the first condition "if (i < size)" is a fancy way to check for size > 0? Wouldn't it be clearer to write if (size == 0) return (PyObject *) op; Py_MEMCPY(op->ob_sval, a->ob_sval, Py_SIZE(a)); i = Py_SIZE(a); ..
msg64433 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-03-24 19:33
Attached patch (.diff) reimplements the optimization by following Objects/stringobject.c logic as Raymond suggested. I see two remaining issues which in my view should be addressed separately: 1. There is no check for overflow after the multiplication computing the size of the resulting array. Note that while newarrayobject checks that size*itemsize does not overflow internally, there is no such check for Py_SIZE(a) * n. A decision needs to be made whether Overflow or NoMemory error should be raised because currently string_repeat and newarrayobject treat overflow differently. 2. See above. I think both string and (patched) array codes can be simplified.
msg64435 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-03-24 19:35
I forgot to mention that I added a unit test to cover the special case of repeating a single-byte array.
msg68096 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-06-12 21:42
Georg, do you want to take it from here.
msg110500 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-16 21:20
There are lots of positive comments on this issue, please can core developers take a bit of time to have a look and say yes or no.
msg110501 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-07-16 21:21
Uh, this slipped under my radar. Let's see if I get the chance to look at it at the core sprint next week.
msg123334 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-12-04 11:02
I changed the patch to look more like unicode_repeat (which addresses Alex' point #2) and committed in r87022.
History
Date User Action Args
2022-04-11 14:56:20 admin set github: 44066
2010-12-04 11:02:31 georg.brandl set status: open -> closedresolution: acceptedmessages: +
2010-09-01 21:39:58 stutzbach set nosy: + stutzbach
2010-07-16 21:21:52 georg.brandl set messages: +
2010-07-16 21:20:21 BreamoreBoy set nosy: + BreamoreBoymessages: + versions: + Python 2.7, Python 3.2, - Python 2.6
2009-05-12 12:46:34 ajaksu2 set stage: patch reviewversions: + Python 3.1
2008-06-12 21:42:08 rhettinger set assignee: rhettinger -> georg.brandlmessages: + nosy: + georg.brandl
2008-03-24 19:35:06 belopolsky set messages: +
2008-03-24 19:33:39 belopolsky set files: + issue1569291.difftype: performancemessages: + keywords: + patch
2008-03-24 18:42:51 belopolsky set nosy: + belopolskymessages: +
2008-03-24 09:02:39 rhettinger set keywords: + easy, - patchmessages: +
2006-10-02 13:30:28 lskovlund create