[Python-Dev] Encoding of PyFrameObject members
M.-A. Lemburg mal at egenix.com
Fri Feb 6 11:44:46 CET 2015
On 06.02.2015 00:27, Francis Giraldeau wrote:
I need to access frame members from within a signal handler for tracing purposes. My first attempt to access co_filename was like this (omitting error checking):
PyFrameObject *frame = PyEval_GetFrame();
PyObject *ob = PyUnicode_AsUTF8String(frame->f_code->co_filename);
char *str = PyBytes_AsString(ob);
However, the function PyUnicode_AsUTF8String() calls PyObject_Malloc(), which is not reentrant. If the signal handler nests over PyObject_Malloc(), it causes a segfault, and it could also deadlock.
Instead, I access members directly:
char *str = PyUnicode_DATA(frame->f_code->co_filename);
size_t len = PyUnicode_GET_DATA_SIZE(frame->f_code->co_filename);
Is it safe to assume that unicode objects co_filename and co_name are always UTF-8 data for loaded code? I looked at PyTokenizer_FromString() and it seems to convert everything to UTF-8 upfront, and I would like to make sure this assumption is valid.
The macros won't work in all cases, as they don't pay attention to the different kinds used in the Unicode implementation.
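To illustrate why the raw data cannot be treated as UTF-8: in the 1-byte kind a non-ASCII character such as U+00E9 is stored as the single byte 0xE9, whereas UTF-8 needs two bytes for it. The following is only a small standalone sketch (no Python.h; the array and the helper raw_equals_utf8 are made up for illustration):

```c
#include <stdint.h>
#include <string.h>

/* "café" as a 1-byte-kind string would hold it: one byte per code point. */
static const uint8_t cafe_1byte[] = { 0x63, 0x61, 0x66, 0xE9 };

/* The same text encoded as UTF-8: U+00E9 becomes 0xC3 0xA9. */
static const char cafe_utf8[] = "caf\xC3\xA9";

/* Hypothetical helper: returns 1 when the raw buffer is byte-for-byte
   identical to the given UTF-8 text, 0 otherwise. */
static int raw_equals_utf8(const void *raw, size_t rawbytes, const char *utf8)
{
    return rawbytes == strlen(utf8) && memcmp(raw, utf8, rawbytes) == 0;
}
```

Only pure ASCII strings have raw data that happens to coincide with their UTF-8 form; anything stored with 2-byte or 4-byte units differs even more.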
I don't think there's any API you can use to extract the underlying data without going through PyObject_Malloc() at some point (you may be lucky if there already is a UTF-8 version available, but it's not guaranteed).
I guess your best bet is to write your own UTF-8 codec which then copies the data to a buffer that you can control. Have a look at Objects/stringlib/codecs.h: utf8_encode.
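A rough sketch of what such a hand-written encoder could look like follows. It is not the code from codecs.h; the names read_cp and encode_utf8_fixed are invented here, and the kind argument is simply the element width in bytes (1, 2 or 4), standing in for what PyUnicode_KIND() and PyUnicode_READ() provide. It touches only the caller's buffer and the stack, so it does not allocate:

```c
#include <stddef.h>
#include <stdint.h>

/* Read code point i from a buffer of 1-, 2- or 4-byte units
   (hypothetical stand-in for PyUnicode_READ(kind, data, i)). */
static uint32_t read_cp(int kind, const void *data, size_t i)
{
    switch (kind) {
    case 1:  return ((const uint8_t *)data)[i];
    case 2:  return ((const uint16_t *)data)[i];
    default: return ((const uint32_t *)data)[i];
    }
}

/* Encode len code points as UTF-8 into out (capacity outsize bytes).
   The output is always 0-terminated when outsize > 0.
   Returns the number of bytes written, not counting the terminator,
   or (size_t)-1 if the buffer is too small or a code point is out of range.
   Lone surrogates are written as three bytes, like "surrogatepass". */
static size_t encode_utf8_fixed(int kind, const void *data, size_t len,
                                char *out, size_t outsize)
{
    unsigned char *o = (unsigned char *)out;
    size_t n = 0;

    if (outsize == 0)
        return (size_t)-1;
    for (size_t i = 0; i < len; i++) {
        uint32_t c = read_cp(kind, data, i);
        size_t need = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;

        /* Keep one byte free for the terminator. */
        if (c > 0x10FFFF || n + need >= outsize) {
            o[n] = 0;
            return (size_t)-1;
        }
        switch (need) {
        case 1:
            o[n++] = (unsigned char)c;
            break;
        case 2:
            o[n++] = (unsigned char)(0xC0 | (c >> 6));
            o[n++] = (unsigned char)(0x80 | (c & 0x3F));
            break;
        case 3:
            o[n++] = (unsigned char)(0xE0 | (c >> 12));
            o[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            o[n++] = (unsigned char)(0x80 | (c & 0x3F));
            break;
        default:
            o[n++] = (unsigned char)(0xF0 | (c >> 18));
            o[n++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            o[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            o[n++] = (unsigned char)(0x80 | (c & 0x3F));
            break;
        }
    }
    o[n] = 0;
    return n;
}
```

In a tracer, out would typically be a preallocated or stack buffer sized for the longest path you expect; a result of (size_t)-1 then means the name was truncated or invalid. Whether it is safe to read the unicode object itself from the signal handler is a separate question that this sketch does not address.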
Alternatively, you can copy the data to a Py_UCS4 buffer which you allocate using code such as this (untested, adapted from the UTF-8 encoder):
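Py_UCS4 *buf, *p;
enum PyUnicode_Kind repkind;
void *repdata;
Py_ssize_t repsize, k;
/* rep is the unicode object to copy, e.g. frame->f_code->co_filename */
if (PyUnicode_READY(rep) < 0)
    goto error;
repkind = PyUnicode_KIND(rep);
repdata = PyUnicode_DATA(rep);
repsize = PyUnicode_GET_LENGTH(rep);
buf = malloc((repsize + 1) * sizeof(Py_UCS4));
if (buf == NULL)
    goto error;
/* widen every code point to Py_UCS4, whatever the storage kind */
p = buf;
for (k = 0; k < repsize; k++) {
    *p++ = PyUnicode_READ(repkind, repdata, k);
}
/* 0-terminate */
*p = 0;
...
/* free the start of the buffer, not the advanced write pointer */
free(buf);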
-- Marc-Andre Lemburg eGenix.com