Issue 1569291: Speed-up in array_repeat() (original) (raw)

Issue1569291

Created on 2006-10-02 13:30 by lskovlund, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
arraymodule.diff	lskovlund,2006-10-02 13:30	Patch against current SVN
issue1569291.diff	belopolsky,2008-03-24 19:33	Patch against revision 61850

Messages (15)
msg51182 - (view)	Author: Lars Skovlund (lskovlund)	Date: 2006-10-02 13:30
Use iterative doubling when extending the old array. This results in O(log n) calls to memcpy() instead of O(n).
msg51183 - (view)	Author: Josiah Carlson (josiahcarlson) *	Date: 2006-10-07 16:39
Logged In: YES user_id=341410 Have you benchmarked this for repeats found "in the wild" to establish observable speedup for code that already exists?
msg51184 - (view)	Author: Lars Skovlund (lskovlund)	Date: 2006-10-07 22:12
Logged In: YES user_id=263372 I wrote this code for a university project I'm doing myself, which involves initializing a very large array (it's a remote memory system). It does help there, certainly; for medium-sized arrays, the improvement would be negligable, and for small ones it might even be worse. If you mean, have I benchmarked it with other people's code, no. I just thought I'd offer it to the community, since it has certainly helped me.
msg51185 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2006-10-08 00:59
Logged In: YES user_id=80475 This patch is basically fine. Will neaten it up to match our source coding conventions and apply it when I get a chance. Py2.6 won't be out for a good while, so there is no hurry.
msg51186 - (view)	Author: Larry Hastings (larry) *	Date: 2006-10-13 08:17
Logged In: YES user_id=364875 I'd assumed Python didn't double the size when resizing an array because of how much memory that wastes. May I suggest trying it with a multiplier of 1.5x, and comparing both CPU time and memory consumption? I find for these things that 1.5x is nearly as fast and wastes far less memory.
msg51187 - (view)	Author: Josiah Carlson (josiahcarlson) *	Date: 2006-10-13 15:26
Logged In: YES user_id=341410 This patch has nothing to do with array resizing. It has to do with array initialization.
msg51188 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2007-04-03 02:50
This proposal is basically fine. The patch should be re-worked to model as closely as possible the code for string_repeat in Objects/stringobject.c (revisions 30616 and 30368 provide the details). That code is known to work and to have provided a meaningful speed-up.
msg64405 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-03-24 09:02
This one is easy if someone wants to take a crack at it.
msg64429 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2008-03-24 18:42
Looking at stringobject.c:1034: i = 0; if (i < size) { Py_MEMCPY(op->ob_sval, a->ob_sval, Py_SIZE(a)); i = Py_SIZE(a); } while (i < size) { j = (i <= size-i) ? i : size-i; Py_MEMCPY(op->ob_sval+i, op->ob_sval, j); i += j; } return (PyObject ) op; Do I miss something or the first condition "if (i < size)" is a fancy way to check for size > 0? Wouldn't it be clearer to write if (size == 0) return (PyObject ) op; Py_MEMCPY(op->ob_sval, a->ob_sval, Py_SIZE(a)); i = Py_SIZE(a); ..
msg64433 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2008-03-24 19:33
Attached patch (.diff) reimplements the optimization by following Objects/stringobject.c logic as Raymond suggested. I see two remaining issues which in my view should be addressed separately: 1. There is no check for overflow after the multiplication computing the size of the resulting array. Note that while newarrayobject checks that sizeitemsize does not overflow internally, there is no such check for Py_SIZE(a) n. A decision needs to be made whether Overflow or NoMemory error should be raised because currently string_repeat and newarrayobject treat overflow differently. 2. See above. I think both string and (patched) array codes can be simplified.
msg64435 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2008-03-24 19:35
I forgot to mention that I added a unit test to cover the special case of repeating a single-byte array.
msg68096 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-06-12 21:42
Georg, do you want to take it from here.
msg110500 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2010-07-16 21:20
There are lots of positive comments on this issue, please can core developers take a bit of time to have a look and say yes or no.
msg110501 - (view)	Author: Georg Brandl (georg.brandl) *	Date: 2010-07-16 21:21
Uh, this slipped under my radar. Let's see if I get the chance to look at it at the core sprint next week.
msg123334 - (view)	Author: Georg Brandl (georg.brandl) *	Date: 2010-12-04 11:02
I changed the patch to look more like unicode_repeat (which addresses Alex' point #2) and committed in r87022.

History
Date	User	Action	Args
2022-04-11 14:56:20	admin	set	github: 44066
2010-12-04 11:02:31	georg.brandl	set	status: open -> closedresolution: acceptedmessages: +
2010-09-01 21:39:58	stutzbach	set	nosy: + stutzbach
2010-07-16 21:21:52	georg.brandl	set	messages: +
2010-07-16 21:20:21	BreamoreBoy	set	nosy: + BreamoreBoymessages: + versions: + Python 2.7, Python 3.2, - Python 2.6
2009-05-12 12:46:34	ajaksu2	set	stage: patch reviewversions: + Python 3.1
2008-06-12 21:42:08	rhettinger	set	assignee: rhettinger -> georg.brandlmessages: + nosy: + georg.brandl
2008-03-24 19:35:06	belopolsky	set	messages: +
2008-03-24 19:33:39	belopolsky	set	files: + issue1569291.difftype: performancemessages: + keywords: + patch
2008-03-24 18:42:51	belopolsky	set	nosy: + belopolskymessages: +
2008-03-24 09:02:39	rhettinger	set	keywords: + easy, - patchmessages: +
2006-10-02 13:30:28	lskovlund	create