[Python-Dev] Help with Unicode arrays in NumPy

Travis E. Oliphant oliphant.travis at ieee.org
Tue Feb 7 20:52:21 CET 2006


This is a design question, which is why I'm posting here. Recently the NumPy developers have become more aware of the difference between UCS2 and UCS4 builds of Python. NumPy arrays can be of Unicode type; in other words, a NumPy array can be made up of fixed-length Unicode strings.

Currently that means that they are "unicode" strings whose basic character size is UCS2 (2 bytes) or UCS4 (4 bytes), depending on how Python was built on the platform. It is this duality that has some people concerned. For all other data-types, NumPy allows the user to explicitly request a bit-width for the data-type.
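As an illustration of the platform dependence, here is a minimal sketch of how the build width can be checked at run time, assuming `sys.maxunicode` reflects the interpreter's internal character width. The helper name `unicode_code_unit_bytes` is hypothetical and not part of NumPy.

```python
import sys


def unicode_code_unit_bytes() -> int:
    """Return the bytes per character implied by sys.maxunicode.

    Hypothetical helper for illustration only.
    A UCS2 (narrow) build reports sys.maxunicode == 0xFFFF;
    a UCS4 (wide) build reports 0x10FFFF.
    Note: interpreters from Python 3.3 onward always report 0x10FFFF.
    """
    return 2 if sys.maxunicode == 0xFFFF else 4


if __name__ == "__main__":
    print("bytes per character on this build:", unicode_code_unit_bytes())
```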

So, we are thinking of introducing another data-type to NumPy to differentiate between UCS2 and UCS4 Unicode strings. (This also means a Unicode scalar object, i.e. a string type, for each of these, exactly one of which will inherit from the Python type.)

Before embarking on this journey, however, we are seeking advice from individuals on this list who are wiser in the ways of Unicode.

Perhaps all we need to do is be more careful on input and output of Unicode data-types so that transfer of unicode can be handled correctly on each platform.
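To make the "careful on input and output" option concrete, here is a rough sketch of converting one fixed-length field written on a UCS2 platform into the layout a UCS4 platform would expect. It uses only the standard codecs and assumes little-endian data and NUL padding; both are assumptions for illustration, as is the function name `ucs2_field_to_ucs4`. It is not how NumPy does, or would have to do, the conversion.

```python
def ucs2_field_to_ucs4(raw: bytes, nchars: int) -> bytes:
    """Re-encode one fixed-length string field from 2-byte to 4-byte units.

    Hypothetical helper: `raw` is assumed to hold `nchars` little-endian
    2-byte code units, padded with NULs. The result holds `nchars`
    little-endian 4-byte code units, padded the same way.
    """
    if len(raw) != 2 * nchars:
        raise ValueError("field length does not match nchars")
    # Decode the 2-byte units and drop the trailing NUL padding.
    # (Trailing NULs that were part of the data are lost as well.)
    text = raw.decode("utf-16-le").rstrip("\x00")
    # Pad back to the fixed character count and write 4-byte units.
    return text.ljust(nchars, "\x00").encode("utf-32-le")


if __name__ == "__main__":
    field = "abc".ljust(5, "\x00").encode("utf-16-le")  # 10 bytes
    converted = ucs2_field_to_ucs4(field, 5)
    print(len(field), len(converted))  # 10 20
```

The opposite direction needs more care, because a character outside the 16-bit range has no single 2-byte representation.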

Any thoughts?

-Travis Oliphant


