Issue 3460: PyUnicode_Join could perhaps be simpler

Created on 2008-07-28 19:04 by pitrou, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
strjoin3k.patch	pitrou,2008-07-29 12:50

Messages (5)
msg70367 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-07-28 19:04
In py3k, PyUnicode_Join inherits some complexity from the 2.x days. However, it seems some of the precautions taken there may not be needed anymore. Witness the following comment:

    /* Grrrr.  A codec may be invoked to convert str objects to
     * Unicode, and so it's possible to call back into Python code
     * during PyUnicode_FromObject(), and so it's possible for a sick
     * codec to change the size of fseq (if seq is a list).  Therefore
     * we have to keep refetching the size -- can't assume seqlen
     * is invariant. */

Perhaps dropping these precautions would also make it possible to preallocate the target buffer all at once (like bytes.join does) rather than resizing it incrementally. Marc-Andre, what do you think?
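The preallocation idea can be sketched at the Python level. This is only an illustration of the two-pass shape (measure first, then copy into a buffer sized once), not the C code under discussion; `join_preallocated` is a hypothetical name, and the list of characters merely stands in for the C output buffer.

```python
def join_preallocated(sep, items):
    """Two-pass join: compute the exact size, then fill a buffer sized once."""
    items = list(items)  # snapshot, so the length cannot change under us

    # Pass 1: validate item types and compute the total output length.
    total = 0
    for i, item in enumerate(items):
        if not isinstance(item, str):
            raise TypeError(
                f"sequence item {i}: expected str, got {type(item).__name__}"
            )
        total += len(item)
    if items:
        total += len(sep) * (len(items) - 1)

    # Pass 2: copy into the preallocated buffer; it is never resized.
    buf = [""] * total
    pos = 0
    for i, item in enumerate(items):
        if i:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(item)] = item
        pos += len(item)

    assert pos == total  # the size computed in pass 1 must match exactly
    # Materialize the buffer as a str (stand-in for returning the C buffer).
    return "".join(buf)


print(join_preallocated(", ", ["a", "bc", "d"]))
```

The incremental alternative would grow the buffer whenever the next item does not fit, which costs extra reallocations and copies.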
msg70381 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-07-29 09:06
The comment gives a wrong impression: The problem is not (only) that a codec might be evil, it's the fact that a codec may well execute Python code and thus allow the list to be changed by other threads during the operation. Now, since in Python 3.x codecs are no longer being invoked, it is probably safe to assume that Python code is not being executed while PyUnicode_Join() is running, but please double-check. It's also wise to apply a sanity check at the end of the loop to verify that the sequence length has indeed not changed (perhaps as an assert).
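The hazard and the suggested sanity check can be illustrated with a small Python sketch. Everything here is hypothetical: `join_checked` is not a real API, and `convert` stands in for the 2.x codec call that could run arbitrary Python code while the join was in progress.

```python
def join_checked(sep, seq, convert):
    """Join with a per-item conversion hook and a final size check.

    `convert` is a hypothetical stand-in for a codec invocation: it may
    execute arbitrary Python code, including code that mutates `seq`.
    """
    n = len(seq)  # length read once and assumed invariant
    parts = [convert(seq[i]) for i in range(n)]
    # Sanity check at the end of the loop: did the sequence change size?
    if len(seq) != n:
        raise RuntimeError("sequence changed size during join")
    return sep.join(parts)


items = ["a", "b"]


def sick_convert(obj):
    # Misbehaving hook: grows the list while it is being joined.
    items.append("surprise")
    return str(obj)


print(join_checked("-", ["a", "b"], str))  # well-behaved hook
try:
    join_checked("-", items, sick_convert)
except RuntimeError as exc:
    print("detected:", exc)
```

A hook that shrinks the list would instead trigger an IndexError inside the loop, which is the kind of failure the repeated size refetching in the 2.x code guarded against.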
msg70385 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-07-29 10:41
Well, the potentially dangerous function would have been PyUnicode_FromObject, but in py3k it only accepts unicode instances (either exact or subclasses). Since we are only interested in the underlying buffer, we can replace those calls with PyUnicode_Check. I'll work on a patch and keep you updated.
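The behaviour described here is visible from Python 3 code. The sketch below is an illustration only (the `Loud` class and `join_or_error` helper are made up for the example): str.join accepts str subclasses and reads their underlying text without calling `__str__`, and it rejects anything that is not a str instead of converting it.

```python
class Loud(str):
    """A str subclass whose __str__ is deliberately misleading."""

    def __str__(self):
        return "overridden"


def join_or_error(sep, items):
    """Return the joined string, or the TypeError raised by str.join."""
    try:
        return sep.join(items)
    except TypeError as exc:
        return exc


# Subclass instances are accepted; join uses the stored text, not __str__.
print(join_or_error(",", [Loud("a"), "b"]))  # a,b

# Non-str items are not converted; both calls yield a TypeError.
print(repr(join_or_error(",", ["a", 1])))
print(repr(join_or_error(",", ["a", b"b"])))
```

Because no conversion hook runs, no user-defined Python code is executed for the items while the join is in progress.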
msg70388 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-07-29 12:50
Here is a patch. In my measurements it makes str.join() 30% to 50% faster on non-trivial input.
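A measurement of this kind could be taken with the standard timeit module. The sketch below is an assumption about how one might benchmark str.join, not the script used for the figures above; `time_join` and the input sizes are hypothetical, and results will vary by machine and build.

```python
import timeit


def time_join(n_items, item_len, repeat=5, number=200):
    """Return the best per-call time in seconds for joining n_items strings."""
    items = ["x" * item_len] * n_items
    sep = ", "
    timer = timeit.Timer(lambda: sep.join(items))
    # Take the minimum over several runs to reduce scheduling noise.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for n_items, item_len in [(10, 5), (1000, 20), (10000, 3)]:
        seconds = time_join(n_items, item_len)
        print(f"{n_items:>6} items of length {item_len:>2}: {seconds * 1e6:.2f} us")
```

Running the same script against builds with and without the patch would give the relative speedup.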
msg70863 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-08-07 21:51
I've committed the patch in r65583.

History
Date	User	Action	Args
2022-04-11 14:56:37	admin	set	github: 47710
2008-08-07 21:51:20	pitrou	set	status: open -> closed; resolution: fixed; messages: +
2008-07-29 12:50:12	pitrou	set	files: + strjoin3k.patch; keywords: + patch; messages: +
2008-07-29 10:41:12	pitrou	set	assignee: pitrou; messages: +
2008-07-29 09:06:28	lemburg	set	messages: +
2008-07-28 19:04:38	pitrou	create