[Python-Dev] Identifier API
"Martin v. Löwis" martin at v.loewis.de
Sat Oct 8 16:54:06 CEST 2011
- Previous message: [Python-Dev] Disabling cyclic GC in timeit module
- Next message: [Python-Dev] Identifier API
In benchmarking PEP 393, I noticed that many UTF-8 decode calls originate from C code with static strings, in particular PyObject_CallMethod. Many such calls have already been optimized to cache a string object; however, PyObject_CallMethod remains unoptimized since it requires a char*.
I find the ad-hoc approach of declaring and initializing variables inadequate, in particular since it is difficult to clean up all those string objects at interpreter shutdown.
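A minimal sketch of the two situations described above, using hypothetical stand-ins (mock_decode, call_method_by_name, call_update_cached) in place of real CPython calls, with a plain char buffer standing in for a Python string object. Passing the method name as a char* decodes it on every call. The ad-hoc fix decodes it once into a function-local static, but no shutdown code knows that object exists.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of times a C string has been "decoded" into a new object. */
static int decode_count = 0;

/* Hypothetical stand-in for creating a string object from a char*:
   copies the bytes into a fresh buffer and counts the call. */
static char *mock_decode(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
        decode_count++;
    }
    return copy;
}

/* Mimics a call-by-name API: the name arrives as a char*, so it is
   decoded again on every call and released afterwards. */
void call_method_by_name(const char *name) {
    assert(name != NULL);
    char *method = mock_decode(name);
    /* ... look up and call the method here ... */
    free(method);
}

/* Mimics the ad-hoc caching approach: a function-local static keeps the
   decoded name alive. Nothing outside this function references it, so
   interpreter shutdown has no way to release it. */
void call_update_cached(void) {
    static char *str_update = NULL;
    if (str_update == NULL)
        str_update = mock_decode("update");
    /* ... look up and call the method using str_update here ... */
}
```

Calling call_method_by_name 1000 times performs 1000 decodes, while 1000 calls to call_update_cached perform a single one.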
I propose to add an explicit API to deal with such identifiers. With this API,
tmp = PyObject_CallMethod(result, "update", "O", other);
would be replaced with
PyObject *tmp;
Py_identifier(update);
...
tmp = PyObject_CallMethodId(result, &PyId_update, "O", other);
Py_identifier(update) expands to the definition of a variable PyId_update of the struct type
typedef struct Py_Identifier {
    struct Py_Identifier *next;
    const char *string;
    PyObject *object;
} Py_Identifier;
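To make the expansion concrete, here is a minimal sketch of how the macro might be defined. The macro body is an assumption, not the proposed implementation, and PyObject is declared as an opaque stand-in so the snippet compiles without Python.h. It uses stringification (#name) so the compiler fills in the string member, and token pasting (PyId_##name) to build the variable name; next and object start out NULL. The helper keys_name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Opaque stand-in for CPython's PyObject (hypothetical, for compiling only). */
typedef struct PyObject PyObject;

typedef struct Py_Identifier {
    struct Py_Identifier *next;  /* linked in on first use */
    const char *string;          /* initialized by the compiler */
    PyObject *object;            /* created on first use */
} Py_Identifier;

/* Assumed macro body: a static variable named PyId_<name> whose string
   member is the name itself, with next and object left NULL. */
#define Py_identifier(name) \
    static Py_Identifier PyId_##name = { NULL, #name, NULL }

/* Expansion at file scope. */
Py_identifier(update);

/* Expansion inside a function, as in the PyObject_CallMethodId example. */
static const char *keys_name(void) {
    Py_identifier(keys);
    return PyId_keys.string;
}
```

Because the variable is static, it persists across calls, which is what allows next and object to be filled in once and reused afterwards.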
string will be initialized by the compiler, next and object on first use. The new API for that will be
PyObject* PyUnicode_FromId(Py_Identifier*);
PyObject* PyObject_CallMethodId(PyObject*, Py_Identifier*, char*, ...);
PyObject* PyObject_GetAttrId(PyObject*, Py_Identifier*);
int PyObject_SetAttrId(PyObject*, Py_Identifier*, PyObject*);
int PyObject_HasAttrId(PyObject*, Py_Identifier*);
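A minimal sketch of the "initialized on first use" behaviour and of shutdown cleanup, under stated assumptions: MockString stands in for a Python string object, Identifier mirrors the Py_Identifier layout, and mock_from_id and mock_clear_identifiers are hypothetical names, not part of the proposed API. On first use the identifier decodes its string, caches the object, and pushes itself onto a global list. Cleanup then walks that list, which is exactly what the ad-hoc static variables cannot offer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Python string object. */
typedef struct MockString {
    char *text;
} MockString;

/* Same layout as Py_Identifier, but holding a MockString. */
typedef struct Identifier {
    struct Identifier *next;
    const char *string;
    MockString *object;
} Identifier;

/* Head of the list of identifiers that have been used at least once. */
static Identifier *identifiers_in_use = NULL;

/* Counts how often a C string was decoded into an object. */
static int decode_count = 0;

static MockString *mock_decode(const char *s) {
    MockString *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    obj->text = malloc(strlen(s) + 1);
    if (obj->text == NULL) {
        free(obj);
        return NULL;
    }
    strcpy(obj->text, s);
    decode_count++;
    return obj;
}

/* Decode on first use, then return the cached object on later calls. */
MockString *mock_from_id(Identifier *id) {
    if (id->object == NULL) {
        id->object = mock_decode(id->string);
        if (id->object == NULL)
            return NULL;
        /* Link the identifier so shutdown code can find it later. */
        id->next = identifiers_in_use;
        identifiers_in_use = id;
    }
    return id->object;
}

/* At shutdown: release every cached object and reset the identifiers. */
void mock_clear_identifiers(void) {
    Identifier *id = identifiers_in_use;
    while (id != NULL) {
        Identifier *next = id->next;
        assert(id->object != NULL); /* only initialized identifiers are linked */
        free(id->object->text);
        free(id->object);
        id->object = NULL; /* a later use would decode again */
        id->next = NULL;
        id = next;
    }
    identifiers_in_use = NULL;
}
```

Linking lazily means only identifiers that were actually used end up on the list. The identifier structs themselves stay static, so cleanup frees only the cached objects.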
I have micro-benchmarked this; for
import time
d={}
i=d.items()
t=time.time()
for _ in range(10**6):
    i | d
print(time.time()-t)
I get a speed-up of 30% (notice that "i | d" invokes the above PyObject_CallMethod call).
Regards, Martin