[Python-Dev] Signals, threads, blocking C functions
Nick Maclaren nmm1 at cus.cam.ac.uk
Tue Sep 5 11:07:12 CEST 2006
- Previous message: [Python-Dev] Cross-platform math functions?
- Next message: [Python-Dev] Signals, threads, blocking C functions
"Adam Olsen" <rhamph at gmail.com> wrote:
> On 9/4/06, Gustavo Carneiro <gjcarneiro at gmail.com> wrote:
> > Now, we've had this API for a long time already (at least 2.5
> > years). I'm pretty sure it works well enough on most *nix systems.
> > Even if it works 99% of the times, it's way better than failing
> > 100% of the times, which is what happens now with Python.
>
> Failing 99% of the time is as bad as failing 100% of the time, if your
> goal is to eliminate the short timeout on poll(). 1% is quite a lot,
> and it would probably have an annoying tendency to trigger repeatedly
> when the user does certain things (not reproducible by you of course).
That can make it a lot WORSE than repeated failure. At least with hard failures, you have some hope of tracking them down in a reasonable time. The problem with exception handling code that goes off very rarely, under non-reproducible circumstances, is that it is almost untestable and that bugs in it are positive nightmares. I have been inflicted with quite a large number in my time, and have a fairly good success rate, but the number of people who know the tricks is decreasing.
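To illustrate why such handlers are so hard to exercise: the usual workaround is to inject the failure on purpose, since waiting for a signal to land at exactly the wrong moment is not a test strategy. The following is only a minimal Python 3 sketch under stated assumptions. `read_with_retry` and `FlakyReader` are hypothetical names invented for this example, not anything from the API under discussion, and the technique only covers failure modes someone thought to inject.

```python
import errno


def read_with_retry(read_fn, max_retries=3):
    """Call read_fn(), retrying if it is interrupted (hypothetical helper)."""
    for attempt in range(max_retries + 1):
        try:
            return read_fn()
        except InterruptedError:
            # Rare path: in production this only runs when a signal
            # happens to arrive while the call is blocked.
            if attempt == max_retries:
                raise


class FlakyReader:
    """Hypothetical stub that fails a fixed number of times, then succeeds."""

    def __init__(self, failures, payload=b"data"):
        self.failures = failures
        self.payload = payload
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            # Injected failure standing in for a real EINTR.
            raise InterruptedError(errno.EINTR, "interrupted (injected)")
        return self.payload
```

With the stub, the retry path runs on every test run rather than once in a blue moon, for example `read_with_retry(FlakyReader(2))` goes through the handler twice before returning the payload. It says nothing about failures that nobody anticipated, which is the harder half of the problem.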
Consider the (real) case where an unpredictable process on a large server (64 CPUs) was failing about twice a week (detectably), with no indication of how many failures were giving wrong answers. We replaced dozens of DIMMs, took days of down time and got nowhere; it then went hard (i.e. one failure a day). After a week's total down time, with me spending 100% of my time on it and the vendor allocating an expert at high priority, we cracked it. We were very lucky to find it so fast.
I could give you other examples that were/are there years and decades later, because the pain threshold never got high enough to dedicate the time (and the VERY few people with experience). I know of at least one such problem in generic TCP/IP (i.e. on Linux, IRIX, AIX and possibly Solaris) that has been there for decades and causes occasional failure in most networked applications/protocols.
Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: nmm1 at cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679