Issue 3460: PyUnicode_Join could perhaps be simpler

Created on 2008-07-28 19:04 by pitrou, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
strjoin3k.patch pitrou, 2008-07-29 12:50
Messages (5)
msg70367 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-07-28 19:04
In py3k, PyUnicode_Join inherits some complexity from the 2.x days. However, it seems some of the precautions taken there may not be needed anymore. Witness the following comment:

/* Grrrr.  A codec may be invoked to convert str objects to
 * Unicode, and so it's possible to call back into Python code
 * during PyUnicode_FromObject(), and so it's possible for a sick
 * codec to change the size of fseq (if seq is a list).  Therefore
 * we have to keep refetching the size -- can't assume seqlen
 * is invariant.
 */

Dropping those precautions would perhaps also allow preallocating the target buffer all at once (like bytes.join does) rather than resizing it incrementally. Marc-Andre, what do you think?
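For readers unfamiliar with the idea, the "preallocate all at once" strategy can be modelled in pure Python as a two-pass join. This is only a sketch of the general technique, not the code from strjoin3k.patch: the function name `join_preallocated` and the list-of-characters buffer are assumptions made for illustration, and the real implementation works on C buffers.

```python
def join_preallocated(sep, items):
    """Two-pass join: measure everything first, then fill one buffer.

    Illustrative model only (hypothetical helper, not CPython's code).
    """
    items = list(items)  # snapshot, so the length cannot change mid-join
    if not items:
        return ""

    # Pass 1: compute the exact size of the result.
    total = sum(len(s) for s in items) + len(sep) * (len(items) - 1)

    # Allocate the whole target buffer once (one slot per character).
    buf = [""] * total

    # Pass 2: copy each piece into its final position; the buffer is
    # never resized because every slice assignment keeps its length.
    pos = 0
    for index, s in enumerate(items):
        if index:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(s)] = s
        pos += len(s)

    # Sanity check: we filled exactly the space we reserved.
    assert pos == total
    return "".join(buf)
```

The incremental alternative grows the buffer as each item arrives, which may reallocate and copy several times; the two-pass version trades one extra walk over the items for a single allocation.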
msg70381 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-07-29 09:06
The comment gives a wrong impression: The problem is not (only) that a codec might be evil, it's the fact that a codec may well execute Python code and thus allow the list to be changed by other threads during the operation. Now, since in Python 3.x codecs are no longer being invoked, it is probably safe to assume that Python code is not being executed while PyUnicode_Join() is running, but please double-check. It's also wise to apply a sanity check at the end of the loop to check whether the sequence length has indeed not changed (as an assert, maybe).
msg70385 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-07-29 10:41
Well, the potentially dangerous function would have been PyUnicode_FromObject, but in py3k it only accepts unicode instances (either exact or subclasses), and since we are only interested in the underlying buffer we can replace those calls with PyUnicode_Check. I'll work on a patch and keep you updated.
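The effect of a pure type check can be seen from the Python side. The sketch below is an illustration under stated assumptions: `TaggedStr` and `join_checked` are hypothetical names, and `isinstance(item, str)` is used as a rough Python-level analogue of PyUnicode_Check. It shows that str subclasses are accepted as-is while bytes objects are rejected rather than decoded.

```python
class TaggedStr(str):
    """Hypothetical str subclass, used only for this illustration."""


def join_checked(sep, items):
    """Join items after a plain type check (no conversion, no decoding)."""
    items = list(items)
    for index, item in enumerate(items):
        # isinstance() accepts exact str objects and subclasses alike.
        if not isinstance(item, str):
            raise TypeError(
                f"sequence item {index}: expected str instance, "
                f"{type(item).__name__} found"
            )
    return sep.join(items)
```

For example, `join_checked("-", ["a", TaggedStr("b")])` succeeds, whereas passing `["a", b"b"]` raises TypeError, which matches what the built-in `str.join` does in Python 3.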
msg70388 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-07-29 12:50
Here is a patch. In my measurements it makes str.join() 30% to 50% faster on non-trivial input.
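The thread does not say how the timings were taken. One plausible way to compare two interpreter builds is a small `timeit` harness like the following; the helper name `time_join` and the input sizes are assumptions chosen for illustration, not the benchmark actually used.

```python
import timeit


def time_join(n_items, item_len, sep=", ", repeat=5, number=200):
    """Return the best observed time, in seconds, for one sep.join(items) call.

    Hypothetical helper: the workload shape is an assumption.
    """
    items = ["x" * item_len for _ in range(n_items)]
    timer = timeit.Timer(lambda: sep.join(items))
    # Take the minimum over several runs to reduce scheduling noise.
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for n_items, item_len in [(10, 5), (1000, 20), (100000, 3)]:
        seconds = time_join(n_items, item_len)
        print(f"{n_items:>6} items of length {item_len:>2}: {seconds * 1e6:.2f} us")
```

Running the same script under a build with and without the patch would give a before/after comparison for each workload.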
msg70863 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-08-07 21:51
I've committed the patch in r65583.
History
Date User Action Args
2022-04-11 14:56:37 admin set github: 47710
2008-08-07 21:51:20 pitrou set status: open -> closed, resolution: fixed, messages: +
2008-07-29 12:50:12 pitrou set files: + strjoin3k.patch, keywords: + patch, messages: +
2008-07-29 10:41:12 pitrou set assignee: pitrou, messages: +
2008-07-29 09:06:28 lemburg set messages: +
2008-07-28 19:04:38 pitrou create