[C++-sig] boost::python::str and Python's str and unicode types

Haoyu Bai divinekid at gmail.com
Tue Aug 4 17:23:55 CEST 2009


On Tue, Aug 4, 2009 at 4:37 PM, Robert Smallshire<Robert.Smallshire at roxar.com> wrote:

On Tue, Jul 28, 2009 at 10:11 PM, Robert Smallshire<Robert.Smallshire at roxar.com> wrote:

I have modified my local build of boost.python to include a boost::python::unicode class, together with appropriate conversions from wchar_t, const wchar_t* and std::wstring...

During testing we have encountered issues with the difference in size of wchar_t and Py_UNICODE.

Windows : sizeof(wchar_t) == sizeof(Py_UNICODE) == 2
Linux : sizeof(wchar_t) == 4 != sizeof(Py_UNICODE) == 2

This assumes a UCS-2 build of Python, which is the default. If Python is built with UCS-4 support then I believe Py_UNICODE and wchar_t will become compatible on Linux, but I'm not sure what the implications are for compatibility of Unicode string pickles, for example, between UCS-2 and UCS-4 builds of Python.

Unfortunately, extract<const wchar_t*> seems to be problematic to implement in a portable manner because of these size differences. I have identified the following options:

1) Don't support extract<const wchar_t*> at all. There are no portability problems, but we have reduced functionality and break the symmetry between boost::python::str and boost::python::unicode behaviour.

2) Only support extract<const wchar_t*> on platforms where sizeof(wchar_t) == sizeof(Py_UNICODE), where the PyUnicode_AsUnicode function can be used to return a pointer to Python's internal buffer. This has the API usability advantage of being symmetrical with how extract<const char*> works in boost.python today on platforms that support it. However, this makes writing portable code for clients awkward. This is what my current implementation does, and it's broken on Linux.

3) Implement extract<const wchar_t*> such that it always copies the data from the Py_UNICODE buffer into a new wchar_t buffer using PyUnicode_AsWideChar under the hood. The caller is then responsible for managing the lifetime of the buffer using delete [] or boost::shared_array. This is how extract<std::wstring> is implemented, which works without difficulty. However, this breaks the symmetry with extract<const char*> in a non-obvious way that would need to be prominently documented. I suggest this approach would be likely to lead to quite leaky usage of the API by unwary clients, especially when porting code to Unicode strings.

4) #ifdef between (2) and (3) above depending on whether sizeof(wchar_t) == sizeof(Py_UNICODE). Combines all the bad characteristics of the above.

There may, of course, be other options.

If the data needs to be copied into a new buffer of wchar_t, the lifetime of which needs to be managed by the client, that pretty much describes the raison d'être of std::wstring, so my current preference is for option (1). If we did this, we'd still be able to construct boost::python::unicode instances from const wchar_t*, but would only be able to extract them as std::wstring.

I'm open to persuasion about the right way forward...

Thanks in advance for any comments or suggestions, and also to the people who have expressed interest in these patches off list.

Regards,
Rob Smallshire
Roxar Software Solutions
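To make the size mismatch concrete, here is a minimal sketch of why a copy is needed when the two code-unit types differ in width. It does not use the Python C API: `py_unicode_standin`, `can_share_buffer` and `to_wstring_copy` are hypothetical names introduced for illustration, and `char16_t` merely stands in for a 2-byte Py_UNICODE on a UCS-2 build.

```cpp
#include <cstddef>
#include <string>

// Assumption for illustration: a 2-byte code unit standing in for
// Py_UNICODE on a UCS-2 build of Python. This is not the real typedef.
typedef char16_t py_unicode_standin;

// A pointer into the source buffer can only be handed out as
// const wchar_t* when both code-unit types have the same size
// (the situation described for Windows above).
inline bool can_share_buffer() {
    return sizeof(wchar_t) == sizeof(py_unicode_standin);
}

// Portable fallback: widen (or keep) each code unit into a std::wstring,
// which then owns the copied data. Surrogate pairs are not combined here;
// each 16-bit unit is copied as-is.
inline std::wstring to_wstring_copy(const py_unicode_standin* data,
                                    std::size_t length) {
    std::wstring result;
    result.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        result.push_back(static_cast<wchar_t>(data[i]));
    }
    return result;
}
```

Returning a std::wstring by value, as in option (1), leaves no raw buffer for the caller to free, which is the leak risk raised under option (3).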


There could be an option similar to your 3) that still keeps the memory managed on the Python side. The trick is that the PyUnicodeObject struct has a "PyObject *defenc" field, which can be seen as an internal object attached to and managed by the PyUnicode object. Some Python APIs use this field to cache a UTF-8 encoded PyString of the PyUnicode object, for example PyUnicode_AsString (_PyUnicode_AsString in Python 3, where it became an internal API). This cached object is therefore managed by the PyUnicode object and is destroyed when the PyUnicode object is destroyed.

So we could hack this field to inject an object that stores a wchar_t* and is still managed by Python. This could be implemented by subclassing PyString with an additional field. But, eh, this sounds a bit crazy. :p
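The ownership idea can be sketched without touching CPython internals. In the hypothetical class below, `UnicodeStandin` and `as_wide` are invented names, and the `wide_cache_` member only plays the role that defenc would play in the hack: a lazily built buffer that lives and dies with its owner, so the caller never frees the returned pointer.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a Python unicode object holding 2-byte code
// units. It is not PyUnicodeObject and uses no Python C API.
class UnicodeStandin {
public:
    explicit UnicodeStandin(std::u16string units) : units_(std::move(units)) {}

    // Returns a pointer owned by this object. It stays valid until the
    // object is destroyed, so the caller must not delete it.
    const wchar_t* as_wide() {
        if (!wide_cache_) {
            // Build the wide copy once and attach it to the owner.
            std::unique_ptr<std::wstring> wide(new std::wstring);
            wide->reserve(units_.size());
            for (char16_t u : units_) {
                wide->push_back(static_cast<wchar_t>(u));
            }
            wide_cache_ = std::move(wide);
        }
        return wide_cache_->c_str();
    }

private:
    std::u16string units_;
    std::unique_ptr<std::wstring> wide_cache_;  // plays the role of defenc
};
```

Repeated calls return the same pointer, which mirrors how a cached encoded string would be reused rather than rebuilt on every extraction.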

Anyway, for pointers like const wchar_t *, boost::python requires an "lvalue converter", but if the converter creates a new object, it is no longer really an lvalue converter. That would be a bit strange.

So I think that for now a unicode implementation without a const wchar_t * converter is OK, as in your option 1). We may implement it in the future.

Just some of my thoughts.

Regards,
Haoyu Bai
School of Computing, National University of Singapore.
