[Numpy-discussion] tiny patch + Playing with strings and my own array descr (PyArray_STRING, PyArray_OBJECT).

Travis Oliphant oliphant.travis at ieee.org
Tue Jun 20 05:24:34 EDT 2006


Matthieu Perrot wrote:

> hi,
>
> I need to handle strings shaped by a numpy array whose data belong to a C structure. There are several possible answers to this problem:
>
> 1) Use a numpy array of strings (PyArray_STRING), and so a (char *) object in C. It works as is, but you need to define a maximum size for your strings because the set of strings is contiguous in memory.
>
> 2) Use a numpy array of objects (PyArray_OBJECT), and wrap each "C string" with a Python object, using PyStringObject for example. Then the problem is that there are as many wrappers as data elements, and I believe the data can't be shared when you create a PyStringObject from a (char *) (using PyString_AsStringAndSize, for example).
>
> Now I will describe a third way, which lets you use strings with no size limit (unlike solution 1) and doesn't create wrappers until you really need them (on demand, at access time). First, for convenience, we will use the (char **) type in C to build an array of string pointers (as suggested in solution 2). Now the game is to make it work with the numpy API and use it in Python through a numpy array. Basically, I want behaviour very similar to arrays of PyObject, where the data are not contiguous, only their addresses are. So the idea is to create a new array descr based on PyArray_OBJECT and change its getitem/setitem functions to deal with my own data. I expected numpy to work with this convenient array descr, but it fails because PyArray_Scalar (arrayobject.c) doesn't call the descriptor's getitem function in the PyArray_OBJECT case; instead it runs two lines that were copied and pasted from the OBJECT_getitem function. Here is my small patch; replace (arrayobject.c:983-984):
>
>     Py_INCREF(*((PyObject **)data));
>     return *((PyObject **)data);
>
> by:
>
>     return descr->f->getitem(data, base);
>
> I played a lot with my new numpy array after this change and noticed that a lot of uses work:

This is an interesting solution. I was not considering it, though, and so I'm not surprised you have problems. You can register new types, but basing them on PyArray_OBJECT can be problematic because of the special-casing that is done in several places to manage reference counting.

You are supposed to register your own data types and get your own type number. Then you can define all the functions for the entries as you wish.

Riding on the back of PyArray_OBJECT may work if you are clever, but it may fail mysteriously as well because of a reference count snafu.

Thanks for the tests and bug-reports. I have no problem changing the code as you suggest.

-Travis


