[Python-Dev] Help with Unicode arrays in NumPy

"Martin v. Löwis" martin at v.loewis.de
Tue Feb 7 21:06:28 CET 2006


Travis E. Oliphant wrote:

Currently that means that they are "unicode" strings of basic size UCS2 or UCS4 depending on the platform. It is this duality that has some people concerned. For all other data-types, NumPy allows the user to explicitly request a bit-width for the data-type.
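To make the contrast concrete, here is a minimal sketch (assuming present-day NumPy spellings such as np.int32 and the 'U' dtype; the exact behaviour of the 2006 code base may have differed):

    import numpy as np

    # Numeric dtypes let the user pick an explicit bit-width.
    a = np.zeros(4, dtype=np.int32)    # always 4 bytes per element
    b = np.zeros(4, dtype=np.float64)  # always 8 bytes per element

    # The unicode dtype offers no such choice: the per-character size is
    # fixed by the build (2 bytes on a UCS2 Python, 4 on UCS4 in the NumPy
    # of that era; modern NumPy always stores 4 bytes per character).
    u = np.array([u'abc'], dtype='U3')
    print(a.itemsize, b.itemsize, u.itemsize)   # e.g. 4 8 12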

Why is that a desirable property? Also: why does NumPy have support for Unicode arrays in the first place?

Before embarking on this journey, however, we are seeking advice from individuals on this list wiser in the ways of Unicode.

My initial reaction is: use whatever Python uses for "NumPy Unicode". Upon closer inspection, it is not all that clear what operations are supported on a Unicode array, or how these operations relate to the Python Unicode type.
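For instance, one relevant question is what you get back when you index such an array. A minimal sketch under current NumPy (where scalar elements come back as numpy.str_, a subclass of the Python string type; the 2006 behaviour may have differed):

    import numpy as np

    arr = np.array([u'abc', u'de'], dtype='U3')

    # Indexing yields a NumPy unicode scalar that behaves like a Python
    # string, so the usual string operations apply.
    x = arr[0]
    print(type(x), x.upper(), len(arr[1]))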

In any case, I think NumPy should have only a single "Unicode array" type (please do explain why having zero of them is insufficient).

If the purpose of the type is to interoperate with a Python unicode object, it should use the same width (as this will allow for memcpy).
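Whether that straight copy is possible comes down to comparing the two character widths; a rough sketch of such a check (using sys.maxunicode, which distinguished UCS2 from UCS4 builds before Python 3.3):

    import sys
    import numpy as np

    # sys.maxunicode is 0xFFFF on a narrow (UCS2) build and 0x10FFFF on a
    # wide (UCS4) build; since Python 3.3 it is always 0x10FFFF.
    py_char_bytes = 2 if sys.maxunicode == 0xFFFF else 4

    arr = np.array([u'abc'], dtype='U3')
    np_char_bytes = arr.itemsize // 3   # bytes per character in the array

    # A raw memory copy between the array buffer and a Python unicode
    # object is only safe when the two widths agree.
    print(py_char_bytes, np_char_bytes, py_char_bytes == np_char_bytes)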

If the purpose is to support arbitrary Unicode characters, it should use 4 bytes (as two bytes are insufficient to represent arbitrary Unicode characters).
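A short worked example of why two bytes do not suffice: a character outside the Basic Multilingual Plane needs a surrogate pair in a 16-bit encoding, but fits in a single 32-bit unit (sketch, runnable on any modern Python):

    # U+1D11E (MUSICAL SYMBOL G CLEF) lies above U+FFFF, so it cannot be
    # stored in one 16-bit code unit.
    ch = u'\U0001D11E'
    print(len(ch.encode('utf-16-le')) // 2)   # 2 code units (a surrogate pair)
    print(len(ch.encode('utf-32-le')) // 4)   # 1 code unit in UCS4/UTF-32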

If the purpose is something else, please explain what the purpose is.

Regards, Martin
