Issue 4679: Fork + shelve causes shelve corruption and backtrace (original) (raw)

Hi,

I wrote a simple script (attached) to do some preprocessing of MediaWiki XML dumps. When it has an 8 MB chunk ready to dump to disk, it forks; the child writes the chunk out and (eventually will) compress it, then exits. The parent continues as before. Note that the child process never touches (or executes code that has in scope) the shelve handle.

The attached script, as written, works fine on dumps (I tested it on enwikisource-20081112-pages-articles.xml, available from http://download.wikimedia.org/enwikisource/20081112/). If you uncomment the fork on line 40 (and, of course, the exit() on line 46) and run it, it dies after writing out about 450 megabytes with the backtrace below.

This appears to happen deterministically: it failed at the same place in all 3 of the 3 runs I tried. Apologies for the size and complexity of the test case; I don't have time to reduce it further at the moment, and it looks like it may be fairly involved. I can try to work out a reduced case and resubmit later if no one wants to touch this as is ;)

I ran the script with:

bzcat enwikisource-20081112-pages-articles.xml.bz2 | ./convert.py wikisource 8388608

(after making a dir called wikisource)

Let me know if I can be of any assistance, and apologies if this is documented somewhere and I missed it.

Using Python 2.6.1 as released from python.org.

Alex

alexr@autumn:~/projects/wikipedia$ cat enwikisource-20081112-pages-articles.xml | ./convert.py wikisource 8388608
Alexandria version 1, Copyright (C) 2008 Alex Roper
Alexandria comes with ABSOLUTELY NO WARRANTY. This is free software, and
you are welcome to copy, modify, and redistribute it under certain
conditions; see the file COPYING for details.
..........................................................
Traceback (most recent call last):
  File "./convert.py", line 100, in <module>
    sax.parse(sys.stdin, Parser(sys.argv[1], MIN_CHUNK_SIZE))
  File "/usr/lib/python2.6/xml/sax/__init__.py", line 33, in parse
    parser.parse(source)
  File "/usr/lib/python2.6/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/usr/lib/python2.6/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/usr/lib/python2.6/xml/sax/expatreader.py", line 207, in feed
    self._parser.Parse(data, isFinal)
  File "/usr/lib/python2.6/xml/sax/expatreader.py", line 304, in end_element
    self._cont_handler.endElement(name)
  File "./convert.py", line 61, in endElement
    s.pagehandler(s.title, s.text)
  File "./convert.py", line 68, in pagehandler
    s.index[title.encode("UTF8")] = (s.chunks, len(s.pages))
  File "/usr/lib/python2.6/shelve.py", line 133, in __setitem__
    self.dict[key] = f.getvalue()
  File "/usr/lib/python2.6/bsddb/__init__.py", line 276, in __setitem__
    _DeadlockWrap(wrapF)  # self.db[key] = value
  File "/usr/lib/python2.6/bsddb/dbutils.py", line 68, in DeadlockWrap
    return function(*_args, **_kwargs)
  File "/usr/lib/python2.6/bsddb/__init__.py", line 275, in wrapF
    self.db[key] = value
bsddb.db.DBRunRecoveryError: (-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: Invalid argument')
Exception bsddb.db.DBRunRecoveryError: DBRunRecoveryError(-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery') in <bound method Parser.__del__ of <__main__.Parser instance at 0x7f3492966d40>> ignored
Exception bsddb.db.DBRunRecoveryError: DBRunRecoveryError(-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery') in ignored
Exception bsddb.db.DBRunRecoveryError: DBRunRecoveryError(-30975, 'DB_RUNRECOVERY: Fatal error, run database recovery -- PANIC: fatal region error detected; run recovery') in ignored
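For what it's worth, the secondary "Exception ... in Parser.__del__ ... ignored" lines suggest the child's interpreter shutdown is closing the shelve/BDB handle it inherited across fork(). A minimal sketch of one possible workaround (hypothetical names, not the attached script; assumes a POSIX fork()): the child does its I/O and then leaves via os._exit(), which skips destructors and atexit handlers, so the inherited shelve handle is never touched:

```python
import os
import shelve
import tempfile

workdir = tempfile.mkdtemp()
# Shelve opened in the parent; the child must never use or close it.
index = shelve.open(os.path.join(workdir, "index"))

def dump_chunk(path, data):
    """Write a chunk in a forked child without touching the parent's shelve."""
    pid = os.fork()
    if pid == 0:
        # Child: write the chunk, then exit via os._exit() so no
        # destructors or atexit handlers run against the inherited
        # shelve handle (sys.exit() or falling off main would run them).
        with open(path, "wb") as f:
            f.write(data)
        os._exit(0)
    return pid

pid = dump_chunk(os.path.join(workdir, "chunk0"), b"some pages")
os.waitpid(pid, 0)

index["Some Title"] = (0, 0)   # parent keeps using the shelve safely
index.close()
```

Whether this avoids the BDB panic on 2.6 I can't promise, but it at least guarantees only one process ever runs cleanup code against the database handle.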

I've just been using the sq_dict module, a drop-in replacement for shelve written on top of sqlite3. BDB is a pretty squirrelly piece of software in my experience. It may or may not be stable on its own, but its APIs are poorly documented and programmers tend to misuse them without knowing it.

Every job I've done with it has involved major hacks such as API interception and replacement with sqlite3, cron jobs to rebuild the database every hour, etc. It's also nice to have databases that are platform independent, and in all the applications I use, the slight slowdown from sqlite is acceptable (I am, after all, using Python).

YMMV, of course. Also, I know at one point Python 3 was going to use sqlite. The sq_dict I mention is on Bugzilla somewhere, or email me if you need a copy.
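The idea behind an sqlite3-backed shelve is straightforward. This is not the actual sq_dict code, just a minimal sketch of the approach (hypothetical class name; keys as strings, values pickled into a BLOB column):

```python
import pickle
import sqlite3

class SQLiteShelf:
    """Minimal dict-like store backed by sqlite3 (a sketch, not sq_dict)."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS shelf (key TEXT PRIMARY KEY, value BLOB)"
        )

    def __setitem__(self, key, value):
        # REPLACE handles both insert and update on the primary key.
        self.conn.execute(
            "REPLACE INTO shelf (key, value) VALUES (?, ?)",
            (key, pickle.dumps(value)),
        )

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM shelf WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __contains__(self, key):
        return self.conn.execute(
            "SELECT 1 FROM shelf WHERE key = ?", (key,)
        ).fetchone() is not None

    def close(self):
        self.conn.commit()
        self.conn.close()

db = SQLiteShelf(":memory:")
db["Some Title"] = (3, 17)
```

A real drop-in replacement would also need iteration, deletion, and the rest of the mapping protocol, but the database file you get this way is a single portable sqlite file rather than a BDB environment.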

Alex

Alexander Belopolsky wrote:

Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:

The wikisource file in the report is no longer available, but with the latest wikisource dump and Python 2.7,

$ curl http://download.wikimedia.org/enwikisource/latest/enwikisource-latest-pages-articles.xml.bz2| bzip2 -cd | ./python.exe convert.py /tmp 8388608

went through the first 50 MiB without an error. I'm not sure I'll have the patience to run this to completion, but it looks like this issue is out of date.


nosy: +belopolsky
resolution:  -> out of date
stage:  -> unit test needed
status: open -> pending


Python tracker <report@bugs.python.org> <http://bugs.python.org/issue4679>