Issue 1441884: A (threaded) Python crash only on dual core machines
There is a strange freeze/crash only on dual core machines:
I have a Python app (Python 2.3.5 / Pythonwin build 203 / Windows) that runs with no stability problems on normal machines (or it crashes so rarely that nobody observes it, though the vast majority of users use single-core machines). Threads, networking and Pythonwin/win32ui are all in use.
Yet I have reports from 3 users, all using dual-processor systems (Xeon, AMD X2 3800+), that the application freezes hard and/or crashes with a seemingly random stack dump from the operating system. I cannot experiment with those machines.
I found no hints other than:
http://groups.google.de/group/comp.lang.python/browse_frm/thread/64ca033e1a7f6c61/719b147e870bd5e6
http://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=480325
Both discussions remained inconclusive.
Are there (known) problems with Python/Pythonwin specifically on dual-core machines (py2.3.5 / pywin203)?
What could I do to find the problem?
Robert
PS: Very little C extension code (SWIG) is involved, and I have looked over it so often that I think it's safe:
#include "stdafx.h"
#include "commctrl.h"
#include "ext.h"

BOOL APIENTRY DllMain(HANDLE hModule, DWORD ul_reason_for_call, LPVOID lpReserved) { return TRUE; }

// RAII helper: releases the GIL on construction and re-acquires it on destruction,
// so the enclosed Win32 call (which may block or re-enter) runs without the GIL.
class CAllowThreads {
public:
    PyThreadState *_save;
    CAllowThreads()  { _save = PyEval_SaveThread(); }
    ~CAllowThreads() { PyEval_RestoreThread(_save); }
};

PyObject* PyListView_GetSubItemRect(HWND hwndLV, int iItem, int iSubItem, int code
                                    // LPRECT lpRect
                                    )
{
    RECT r;
    {
        CAllowThreads _t;
        ListView_GetSubItemRect(hwndLV, iItem, iSubItem, code, &r);
    }
    return Py_BuildValue("iiii", r.left, r.top, r.right, r.bottom);
}

// Returns the address of the string's buffer as an int (meaningful on 32-bit builds only).
int GetStringAddr(const char* s) { return (int)s; }

int PlaySoundResource(int resid, HMODULE hmod) { CAllowThreads _t; return PlaySound(MAKEINTRESOURCE(resid), hmod, SND_RESOURCE); }
int PlaySoundFile(const char* fname, int flag) { CAllowThreads _t; return PlaySound(fname, NULL, flag); }

PyObject* py_ToolTipRelayMsg(PyObject* self, PyObject* args)
{
    MSG msg;
    HWND hwTT;
    // "i" stores a C int, which fills HWND/WPARAM/LPARAM only on 32-bit Windows;
    // ((int*)&msg.pt)+1 addresses msg.pt.y (POINT is two consecutive 32-bit LONGs).
    if (!PyArg_ParseTuple(args, "i(iiiii(ii)):ToolTipRelayMsg", &hwTT,
                          &msg.hwnd, &msg.message, &msg.wParam, &msg.lParam, &msg.time,
                          &msg.pt, ((int*)&msg.pt) + 1))
        return NULL;
    {
        CAllowThreads _t;
        SendMessage(hwTT, TTM_RELAYEVENT, 0, (LPARAM)&msg);
    }
    Py_INCREF(Py_None);
    return Py_None;
}
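The ((int*)&msg.pt)+1 argument in py_ToolTipRelayMsg only lands on msg.pt.y if POINT is two consecutive 32-bit integers with no padding. A minimal sketch that checks this layout assumption with Python's ctypes module (ctypes needs Python 2.5 or later, so it is not available in the 2.3.5 used here); the POINT class is a hand-written model of the Win32 struct, assuming LONG is 32 bits, not something imported from the Windows headers:

```python
import ctypes

# Hand-written model of the Win32 POINT struct, assuming LONG is 32 bits.
class POINT(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32)]

def write_y_via_int_pointer(pt, value):
    """Mimic ((int*)&msg.pt)+1: view &pt as an int array and write element 1."""
    as_ints = ctypes.cast(ctypes.pointer(pt), ctypes.POINTER(ctypes.c_int32))
    as_ints[1] = value

pt = POINT(10, 0)
write_y_via_int_pointer(pt, 20)
print(pt.x, pt.y)       # 10 20
print(POINT.y.offset)   # 4
```

If y ever sat at a different offset, the parser would silently write the wrong field; parsing both coordinates into two local ints and then assigning msg.pt.x and msg.pt.y avoids relying on the layout at all.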
"GetStringAddress" is used only once like this (leades to correct NUL termination I think):
self.sb.SendMessage(commctrl.SB_SETTEXT,iPane,extension.GetStringAddr(text))
--- SWIG wrapper:
static PyObject *_wrap_GetStringAddr(PyObject *self, PyObject *args)
{
    PyObject *resultobj;
    char *arg0;
    int result;
    if (!PyArg_ParseTuple(args, (char *)"s:GetStringAddr", &arg0))
        return NULL;
    result = (int)GetStringAddr((char const *)arg0);
    resultobj = PyInt_FromLong((long)result);
    return resultobj;
}
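With the "s" format, PyArg_ParseTuple stores a pointer to the Python string's own internal buffer, which CPython keeps NUL-terminated, so GetStringAddr returns an address that is valid only while that string object is alive. In the SB_SETTEXT call above, text is still referenced during the synchronous SendMessage, so that usage looks fine. If an address ever has to outlive a single call, one option (Python 2.5+ only, via ctypes) is an explicitly owned buffer; make_c_string is a hypothetical helper, sketched here:

```python
import ctypes

def make_c_string(data):
    """Hypothetical helper: return (buffer, address) for a NUL-terminated copy of data.

    The caller must keep `buffer` referenced for as long as `address` is used;
    once the buffer is garbage-collected, the address dangles.
    """
    buf = ctypes.create_string_buffer(data)  # copies data and appends one NUL byte
    return buf, ctypes.addressof(buf)

buf, addr = make_c_string(b"Ready")
print(ctypes.string_at(addr))  # reads up to the NUL: b'Ready' on Python 3
print(buf.raw)                 # b'Ready\x00' on Python 3
```

In the application, addr would take the place of extension.GetStringAddr(text) in the SendMessage call, with buf kept referenced by the caller until the call returns.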
Comment by user_id=972995:
I found no indication that it is a GIL problem. Whenever I had a GIL problem in the past, it quickly showed up on every machine, not only on dual cores. (The thread-state handling in the short extension code above follows the same style PythonWin uses for critical functions that may re-enter the message handler with new messages, so it should be OK. Otherwise there is only plain Python/Pythonwin code.)
The Python part of the software is too big, and I haven't managed to isolate the problem further, as I only have a handful of user reports like these: """ Yes here is one popup window.
app.exe - Application Error
The instruction at "0x73dd11c7" referenced memory at "0x00000004". The memory could not be "read". Click on OK to terminate the program
OK
"""
""" AppName: app.exe AppVer: 2.7.0.0 ModName: python23.dll ModVer: 0.0.0.0 Offset: 00084ade
Followed by an application error window.
app.exe - Application Error
The instruction at "0x73dd11c7" referenced memory at "0x00000004". The memory could not be "read". Click on OK to terminate the program
OK
"""
""" Here is another error just now after latest upgrade
Microsoft Visual C++ Runtime Library
Runtime Error!
Program: C:\bin\app.exe
This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.
OK
"""
(The memory errors above are within mfc42.dll)
I use a possibly questionable practice (to keep the GUI responsive): I start normal Python threads which themselves call win32 functions. I assume this is OK, since Windows GUI functions are considered thread-safe.
A simplified example:
thread.start_new_thread(win32ui.MessageBox, ('test',))
I assume this is OK?
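One common way to avoid calling win32ui/MFC functions from worker threads (MFC objects are generally not safe to use from a thread other than the one that created them) is to let workers queue GUI requests and have the GUI thread execute them. A minimal sketch; ui_requests, call_on_gui_thread and drain_ui_requests are hypothetical names, hooking drain_ui_requests into a Pythonwin timer or idle handler is not shown, and this is a suggestion rather than a confirmed fix for the dual-core crash:

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

# Hypothetical: requests that must run on the GUI thread.
ui_requests = queue.Queue()

def call_on_gui_thread(func, *args):
    """Safe to call from any thread: enqueue func(*args) instead of running it here."""
    ui_requests.put((func, args))

def drain_ui_requests(max_items=50):
    """Call only from the GUI thread, e.g. from a timer or idle handler (not shown).

    Runs at most max_items queued requests and returns how many were run.
    """
    handled = 0
    while handled < max_items:
        try:
            func, args = ui_requests.get_nowait()
        except queue.Empty:
            break
        func(*args)
        handled += 1
    return handled

# Demo without win32ui: a list append stands in for win32ui.MessageBox.
shown = []
worker = threading.Thread(target=call_on_gui_thread, args=(shown.append, "test"))
worker.start()
worker.join()
drain_ui_requests()  # in the real application this runs on the GUI thread
print(shown)         # ['test']
```

In the application, the worker would call call_on_gui_thread(win32ui.MessageBox, 'test') instead of starting a thread that calls win32ui.MessageBox directly.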
Comment by user_id=21627:
Well, this bug here is clearly something different. An attempt to access 0x00000004 is very obviously a null-pointer access to a field at structure offset 4. If the structure is a Python object (which it might not be), offset 4 (on a 32-bit build) is the ob_type field, so somebody might be trying to find out the Python type of a null pointer (which ought to crash).
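The offset arithmetic can be made concrete with a small ctypes model of the object header. This is a sketch assuming a 32-bit, non-debug Python 2.3 build, where PyObject_HEAD is an int ob_refcnt followed by a 32-bit ob_type pointer (debug builds with Py_TRACE_REFS add two list pointers in front, shifting both fields); PyObjectHead32 is a hypothetical name:

```python
import ctypes

# Assumed 32-bit, non-debug layout of PyObject_HEAD in Python 2.3:
#   int ob_refcnt;                 offset 0
#   struct _typeobject *ob_type;   offset 4
class PyObjectHead32(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_int32),
        ("ob_type", ctypes.c_uint32),  # stands in for a 32-bit pointer
    ]

NULL = 0
# Reading ob->ob_type through a NULL ob touches address NULL + offset:
faulting_address = NULL + PyObjectHead32.ob_type.offset
print(hex(faulting_address))  # 0x4
```

A faulting address of 0x00000004 is therefore what a NULL object pointer produces when its type is read; it does not prove the pointer was a PyObject*, since any structure with a 4-byte field at offset 0 gives the same address.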
This is different from 1442426, which apparently is not a null-pointer access.
Without any kind of backtrace, it is very hard to guess what causes this null-pointer access: it could be anywhere (including Python, PythonWin, and, last but not least, your code).
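Regarding the missing backtrace: Python 2.3 has nothing built in for this, but from Python 3.3 on the standard faulthandler module can write the Python-level traceback of every thread when the process hits a fatal error such as an access violation, and faulthandler.dump_traceback_later() can dump tracebacks after a timeout, which helps with hard freezes. It shows which Python code was running, not the C stack inside mfc42.dll or python23.dll. A minimal sketch; the log file name and location are hypothetical:

```python
import faulthandler
import os
import tempfile

# Requires Python 3.3+; faulthandler does not exist in Python 2.3.
# Hypothetical log location; the file object must stay open for the process lifetime.
crash_log_path = os.path.join(tempfile.gettempdir(), "app_crash_traceback.log")
crash_log = open(crash_log_path, "w")

# On a fatal error (e.g. an access violation), write the Python traceback of
# every thread to crash_log before the process dies.
faulthandler.enable(file=crash_log, all_threads=True)
```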
So I'm closing this as "unreproducible" (for which SF only has "works for me").