[Numpy-discussion] tiny patch + Playing with strings and my own array descr (PyArray_STRING, PyArray_OBJECT).

Matthieu Perrot perrot at shfj.cea.fr
Wed Jun 21 12:15:20 EDT 2006


On Tuesday 20 June 2006 11:24, Travis Oliphant wrote:

Matthieu Perrot wrote:
> Hi,
>
> I need to handle strings held in a numpy array whose data belong to a C
> structure. There are several possible answers to this problem:
>   1) Use a numpy array of strings (PyArray_STRING), and so a (char *)
>      object in C. It works as is, but you need to define a maximum size
>      for your strings because the set of strings is contiguous in memory.
>   2) Use a numpy array of objects (PyArray_OBJECT), and wrap each «C
>      string» with a Python object, using PyStringObject for example. The
>      problem then is that there are as many wrappers as data elements, and
>      I believe the data can't be shared when you create a PyStringObject
>      from a (char *) with PyString_AsStringAndSize, for example.
>
> Now I will present a third way, which lets you use strings with no size
> limit (unlike solution 1) and does not create wrappers until you really
> need them (on demand/access).
>
> First, for convenience, we will use the (char **) type in C to build an
> array of string pointers (as suggested in solution 2). Now, the game is
> to make it work with the numpy API, and to use it in Python through a
> Python array. Basically, I want behaviour very similar to arrays of
> PyObject, where the data are not contiguous, only their addresses are.
> So the idea is to create a new array descr based on PyArray_OBJECT and
> change its getitem/setitem functions to deal with my own data.
>
> I expected numpy to work with this convenient array descr, but it fails
> because PyArray_Scalar (arrayobject.c) doesn't call the descriptor's
> getitem function (in the PyArray_OBJECT case) but instead runs 2 lines
> that were copied and pasted from the OBJECT_getitem function. Here is my
> small patch: replace (arrayobject.c:983-984):
>     Py_INCREF(*((PyObject **)data));
>     return *((PyObject **)data);
> by:
>     return descr->f->getitem(data, base);
>
> I played a lot with my new numpy array after this change and noticed
> that many uses work:

This is an interesting solution. I was not considering it, though, and so I'm not surprised you have problems.

You can register new types, but basing them off of PyArray_OBJECT can be problematic because of the special-casing that is done in several places to manage reference counting. You are supposed to register your own data-types and get your own typenumber. Then you can define all the functions for the entries as you wish.

Riding on the back of PyArray_OBJECT may work if you are clever, but it may fail mysteriously as well because of a reference count snafu.

Thanks for the tests and bug-reports. I have no problem changing the code as you suggest.

-Travis
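To make the point of the patch concrete, here is a minimal sketch in plain C of an accessor that always dispatches through the descriptor's getitem pointer instead of hard-coding one type's behaviour. It does not use the NumPy API: ToyDescr, ptr_getitem, toy_get and toy_strptr_descr are hypothetical names invented for this illustration, and the element slots hold (char *) pointers as in the third approach described above.

```c
#include <stddef.h>

/* Toy stand-in for an array descriptor. This is NOT NumPy's PyArray_Descr;
   every name in this sketch is hypothetical. */
typedef struct ToyDescr {
    size_t elsize;                             /* bytes per element slot */
    const char *(*getitem)(const void *slot);  /* reads one element */
} ToyDescr;

/* Each slot stores a (char *), so only the addresses are contiguous and
   the strings themselves can have any length. */
static const char *ptr_getitem(const void *slot)
{
    return *(const char *const *)slot;
}

/* Generic accessor: locate slot i, then let the descriptor decide how to
   read it. Nothing here assumes a particular element type. */
static const char *toy_get(const ToyDescr *d, const void *data, size_t i)
{
    const char *slot = (const char *)data + i * d->elsize;
    return d->getitem(slot);
}

/* Descriptor for an array of string pointers. */
static const ToyDescr toy_strptr_descr = { sizeof(char *), ptr_getitem };
```

With a `const char *words[]` array as the data, `toy_get(&toy_strptr_descr, words, 1)` returns the second pointer without copying the string. An accessor that inlined a fixed read for one type would silently ignore a replaced getitem, which is the kind of failure the patch addresses.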

Thanks for applying my suggestions.

I think you are suggesting this kind of declaration:

    PyArray_Descr *descr = PyArray_DescrNewFromType(PyArray_VOID);
    descr->f->getitem = (PyArray_GetItemFunc *) my_getitem;
    descr->f->setitem = (PyArray_SetItemFunc *) my_setitem;
    descr->elsize = sizeof(char *);
    PyArray_RegisterDataType(descr);

You are right: without the last line it works, and it follows the C-API way. But if I register this array descr, the resulting typenumber is larger than anything PyTypeNum_ISFLEXIBLE considers to be a flexible type, so the returned scalar object is badly formed. I then get a segmentation fault later, because the created void scalar has a null descr pointer.
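The mismatch can be sketched in plain C under one assumption: that the flexibility test is a range check over the built-in type numbers, while registration hands out numbers starting above that range. The constants and the toy_is_flexible and toy_register helpers below are hypothetical and chosen only for illustration; they are not taken from the NumPy headers.

```c
#include <stdbool.h>

/* Illustrative type numbers. The values are assumptions for this sketch. */
enum {
    TOY_STRING  = 18,
    TOY_UNICODE = 19,
    TOY_VOID    = 20,
    TOY_USERDEF = 256  /* first number given to a registered type */
};

/* Range test in the spirit of PyTypeNum_ISFLEXIBLE (assumed behaviour). */
static bool toy_is_flexible(int type_num)
{
    return type_num >= TOY_STRING && type_num <= TOY_VOID;
}

/* Toy registration: returns the next free user type number and bumps
   the counter of registered types. */
static int toy_register(int *n_registered)
{
    return TOY_USERDEF + (*n_registered)++;
}
```

A descriptor copied from the void type keeps its void-like layout, but once registered its new number falls outside the range, so any code that relies on the range test alone no longer treats it as flexible.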

Matthieu Perrot


