[Python-Dev] RFC: Add a new builtin strarray type to Python?
Victor Stinner victor.stinner at haypocalc.com
Sat Oct 1 19:17:56 CEST 2011
- Previous message: [Python-Dev] What it takes to change a single keyword.
- Next message: [Python-Dev] RFC: Add a new builtin strarray type to Python?
Hi,
Since the integration of PEP 393, str += str is no longer super-fast (just fast). For example, appending a single character to a string has to copy all of its characters to a new string. I suppose that the performance of many applications that manipulate text may be affected by this issue, especially text templating libraries.
io.StringIO has also been changed to store characters as Py_UCS4 (4 bytes) instead of Py_UNICODE (2 or 4 bytes). This class doesn't benefit from the compact string storage of PEP 393.
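To make the accumulation patterns under discussion concrete, here is a minimal sketch of three ways to build a string piece by piece using only the standard library. The function names are illustrative. The sketch only shows that the three approaches produce the same result; how fast each one is depends on the interpreter version and should be measured (for example with timeit) rather than assumed.

```python
import io


def build_with_concat(parts):
    # Repeated str += str: each step may copy the accumulated text.
    result = ""
    for part in parts:
        result += part
    return result


def build_with_join(parts):
    # Collect the pieces first, then build the final string once.
    return "".join(parts)


def build_with_stringio(parts):
    # Use io.StringIO as a growable text buffer.
    buf = io.StringIO()
    for part in parts:
        buf.write(part)
    return buf.getvalue()


parts = ["x"] * 1000
```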
I propose adding a new builtin type to Python to address both issues (CPU and memory): strarray. This type would have the same API as str, except that:
- it has append() and extend() methods
- its methods return strarray instead of str
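As a rough illustration of the two differences listed above, here is a hypothetical pure-Python sketch. The class name StrArraySketch, its list-of-characters storage, and the single upper() method are assumptions made for this example; it is not the prototype mentioned in this email, and a real implementation would cover the whole str API.

```python
class StrArraySketch:
    """Hypothetical illustration of the proposed API, not a real type."""

    def __init__(self, initial=""):
        # Assumption: store one list item per character.
        self._chars = list(initial)

    def append(self, char):
        # Add exactly one character in place.
        if len(char) != 1:
            raise ValueError("append() expects a single character")
        self._chars.append(char)

    def extend(self, text):
        # Add every character of text in place.
        self._chars.extend(str(text))

    def __getitem__(self, index):
        if isinstance(index, slice):
            return StrArraySketch(self._chars[index])
        return self._chars[index]  # a 1-character str, not an int

    def __len__(self):
        return len(self._chars)

    def __str__(self):
        return "".join(self._chars)

    def upper(self):
        # Same name as str.upper(), but the result is a StrArraySketch.
        return StrArraySketch(str(self).upper())


buf = StrArraySketch("abc")
buf.append("d")
buf.extend("ef")
print(str(buf))                    # abcdef
print(type(buf.upper()).__name__)  # StrArraySketch
```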
I'm writing this email to ask you if this type solves a real issue, or if we can just prove the super-fast str.join(list of str).
--
strarray is similar to bytearray, but different: strarray('abc')[0] is 'a', not 97, and strarray can store any Unicode character (not only integers in range 0-255).
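The bytearray half of that comparison can be checked with existing builtins. The short sketch below shows that indexing a bytearray yields an integer, that a bytearray rejects values outside range(256), and that a str item is a one-character string that can hold any code point.

```python
data = bytearray(b"abc")
text = "abc"

print(data[0])  # 97
print(text[0])  # a

# bytearray items must be integers in range(256).
try:
    data.append(0x20AC)
except ValueError as exc:
    print("ValueError:", exc)

# A str can hold the same code point as a single character.
euro = "\u20ac"
print(len(euro), ord(euro))  # 1 8364
```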
I wrote a quick and dirty implementation in Python just to be able to play with the API, and to get an idea of the amount of work required to implement it:
https://bitbucket.org/haypo/misc/src/tip/python/strarray.py
(Some methods are untested: see the included TODO list.)
--
Implementing strarray in C is not trivial, and it would be easier to do it in 3 steps:
(a) Use a Py_UCS4 array
(b) Make the array type depend on the content, for the best memory footprint, as in PEP 393
(c) Use strarray to implement a new io.StringIO
Or we can just stop after step (a).
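To illustrate what step (b) means, here is a small sketch of the width selection rule that PEP 393 applies to str: 1, 2 or 4 bytes per character, depending on the highest code point in the content. The helper name char_width is hypothetical, and this only models the choice of width, not the C data structures.

```python
def char_width(text):
    """Return the bytes per character needed to store text (1, 2 or 4)."""
    if not text:
        return 1
    highest = max(map(ord, text))
    if highest < 0x100:      # fits in one byte per character
        return 1
    if highest < 0x10000:    # fits in two bytes per character
        return 2
    return 4                 # needs four bytes per character


for sample in ("abc", "caf\u00e9", "\u20ac", "\U0001F600"):
    width = char_width(sample)
    # Step (a) alone would always use 4 bytes per character.
    print(ascii(sample), width, "vs", 4)
```

With step (a) only, every character costs 4 bytes; with step (b), pure Latin-1 content costs 1 byte per character, at the price of converting the array when a wider character is added.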
--
The strarray API has to be discussed.
Most bytearray methods return a new object. I don't understand why; it's not efficient. I don't know whether strarray methods that share a name with bytearray methods (which are not in-place) can operate in place.
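The bytearray behaviour in question can be observed directly. In this short sketch, upper() returns a new object and leaves the original untouched, while += and extend() modify the existing object.

```python
buf = bytearray(b"abc")

# Methods shared with bytes return a new bytearray.
upper = buf.upper()
print(upper is buf)  # False
print(buf)           # bytearray(b'abc')

# += and extend() work in place on the same object.
same = buf
buf += b"def"
buf.extend(b"g")
print(same is buf)   # True
print(buf)           # bytearray(b'abcdefg')
```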
str has some methods that bytes and bytearray don't have, like format. We could make these methods operate in place.
Victor