[Python-bugs-list] [ python-Bugs-445862 ] bsddb fails for larger amount of data

noreply@sourceforge.net noreply@sourceforge.net
Thu, 18 Oct 2001 15:43:17 -0700


Bugs item #445862, was opened at 2001-07-30 00:21
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=445862&group_id=5470

Category: Extension Modules
Group: None
Status: Closed
Resolution: Out of Date
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Barry Warsaw (bwarsaw)
Summary: bsddb fails for larger amount of data

Initial Comment: The attached script fails after approx. 72500 insert operations. If you vary the size of the keys and/or the values, the bug occurs earlier or later, but even with a value size of 1 the bug will occur. This probably also explains bug #408271 ("crash in shelve module").

Platform: W2K


Comment By: Barry Warsaw (bwarsaw) Date: 2001-10-18 15:43

Message: Logged In: YES user_id=12800

I'm finally closing this bug report. With my Mandrake 7.2-ish system I get bsddb 3 by default and there seems to be no problem inserting several hundred thousand items (I killed it after 205k+). So it's very likely that the reported problems were due to the extremely old db 1.85.

Note though, that it should be okay license-wise if we wanted to distribute the latest Sleepycat library (currently 3.3.11) with Python. I've had personal conversations with the Sleepycat guys, and they've said that their read on their own license would allow this. We'd want to get it in writing first I think, and we'd have to ask about the binary-only Windows distros, but I think it would be okay. If we did this, we should also distribute Robin Dunn's excellent PyBSDDB3.

No time for this in Python 2.2, but let's look at it again for Python 2.3.


Comment By: Tim Peters (tim_one) Date: 2001-08-04 20:35

Message: Logged In: YES user_id=31435

Barry apparently forgot to assign this to himself; repaired his bug.


Comment By: Barry Warsaw (bwarsaw) Date: 2001-08-04 19:59

Message: Logged In: YES user_id=12800

if you can live with the licensing for sleepycat's db3, do yourself a huge favor and go to pybsddb.sf.net. robin dunn's got a very excellent, stable, new python binding, which i would like to integrate into the standard distro for the py2.2 release. it claims to support db1.85, although i've only tried it with a very recent v3.9.x.


Comment By: Skip Montanaro (montanaro) Date: 2001-08-04 19:42

Message: Logged In: YES user_id=44345

> I don't know anything about the history, present, or prospects for bsddb -- like, is there a more recent unencumbered version we could use?

Ya got me. I've been using lib db 2 for quite a while. They recently released lib db 3 (again, with file format incompatibilities). I don't know the details of their license. It just comes with whatever version of Linux I happen to be running.

Saw this on the Sleepycat website:

The Berkeley DB 3.0 source code is available for download at no charge from Sleepycat Software's Web site, at www.sleepycat.com. It runs on all common versions of UNIX, and on Windows 95, Windows 98 and Windows NT. Berkeley DB is an Open Source product, and may be redistributed without charge in many circumstances. Licensing and pricing information are available from the company.

My guess would be that you can distribute lib db 3 with the binary version of Python. I am, as they say, "not a lawyer", so YMMV. For a definitive answer I think you'll have to ask Sleepycat.

Skip


Comment By: Tim Peters (tim_one) Date: 2001-08-04 17:02

Message: Logged In: YES user_id=31435

Skip, I reran the test after changing the open line to

db = bsddb.btopen("test.dbm", "n")

I killed it by hand at this point:

Last i: 326577, last key:abcdef4387101.63608

because Win98SE gets mondo unstable when it starts thrashing madly to disk, and it became impossible to get any work done while this was running.

I don't know anything about the history, present, or prospects for bsddb -- like, is there a more recent unencumbered version we could use? It looks like Sam's 1.85 Windows port is over 5 years old.


Comment By: Nobody/Anonymous (nobody) Date: 2001-08-04 15:59

Message: Logged In: NO

According to www.sleepycat.com/historic.html, talking about bsd db: "we recommend that you avoid the following operations when using versions 1.85 and 1.86:

o Btree cursor (seq and put using a cursor) operations.
o Large numbers of btree duplicates (specifically, avoid migrating duplicate keys to internal pages).
o Large numbers of btree deletes (you should periodically dump and rebuild the database if you delete large numbers of records).
o Overwriting or deleting overflow hash key/data pairs (pairs with items larger than the page size).
o Intermixing hash cursor operations with deletes."

My problem arises, I think, because I have been doing the fourth of these operations - i.e. overwriting long items in a hash. The problems others are experiencing perhaps have a similar cause, though the original problem summary says "even with a value size of 1 the bug will occur", so perhaps not.

I'm now using a workaround which involves writing several shorter items, each containing a slice of the data formerly held in the one long item. For keys I use my old key with a subscript number appended. It isn't nice, but it seems to be working.

Martin Gradwell.
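
The workaround described above (several shorter items, each holding a slice of the long value, keyed by the old key plus a subscript number) might look roughly like the sketch below. It is written for current Python 3 rather than the Python 2.1 of this report. The helper names put_chunked and get_chunked, the "#" separator, the chunk size, and the idea of storing the slice count under the bare key are all illustrative assumptions, not details taken from the comment. Values are bytes, so the helpers work with a plain dict as well as with a dbm-style mapping.

```python
# Hypothetical helpers sketching the "split one long item into slices" workaround.
# Assumptions (not from the report): the "#" separator, the default chunk size,
# and keeping the number of slices under the bare key.

def put_chunked(db, key, value, chunk_size=256):
    """Store bytes `value` as short items under key#0, key#1, ..."""
    chunks = [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]
    for n, chunk in enumerate(chunks):
        db["%s#%d" % (key, n)] = chunk
    # Record how many slices were written so the reader knows when to stop.
    db[key] = str(len(chunks)).encode("ascii")


def get_chunked(db, key):
    """Reassemble the bytes value written by put_chunked."""
    count = int(db[key])
    return b"".join(db["%s#%d" % (key, n)] for n in range(count))


# Usage with an ordinary dict standing in for the database:
store = {}
put_chunked(store, "record", b"x" * 1000 + b"end")
restored = get_chunked(store, "record")
```

Note that rewriting a key with a shorter value leaves the old higher-numbered slices behind; a fuller version would delete them.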


Comment By: Skip Montanaro (montanaro) Date: 2001-08-04 08:12

Message: Logged In: YES user_id=44345

Based upon the traceback Tim reported, my guess is that the exception is being raised near the end of bsddb_ass_sub. Tim, can you give it a try changing anydbm.open to bsddb.btopen? As I recall, the significant bug(s) in libdb were in the hash file implementation. It's unfortunate that anydbm has used the hash file all these years, but it's a bit late to spring that change on unsuspecting users now without going through a significant transition period.

Skip


Comment By: Tim Peters (tim_one) Date: 2001-08-03 14:40

Message: Logged In: YES user_id=31435

Thanks for taking a look, Skip! On Win98SE it dies for me like so:

...
70000
71000
72000
Last i: 72758, last key:abcdef1691515.8934
Traceback (most recent call last):
  File "ka.py", line 15, in ?
    db[key] = val
bsddb.error: (0, 'Error')

test.dbm is 37,778,944 bytes at the end. I assume Anonymous has the same problem (if not, he/she should say so).

On Windows we use the ancient db.1.85.win32.zip, from the "bsd db" (not "bsddb"!) link at

http://www.nightmare.com/software.html

I doubt Sam has done any maintenance on that in years, and I'm afraid I don't know anything else about this.


Comment By: Skip Montanaro (montanaro) Date: 2001-08-03 13:25

Message: Logged In: YES user_id=44345

What version of libdb are you using? I'm running your script on Linux at the moment. I had to change it slightly because the only machine I have available with the spare cojones to run that script is running 1.5.2 (so I call random.uniform instead of using a Random instance). On that machine I'm sort of ashamed to say I'm still running the known buggy libdb 1.85. So far I'm up to 680,000 keys with a db file of over 166MB with no problem. On my laptop running 2.1 and libdb3 (and a much more modestly performing disk drive) I gave up after about 287,000 keys. I then changed the db open call to bsddb.btopen and watched it march (slowly) up to 183,000 keys and a 32MB file on disk before I killed it. Aside from the grief it gives my disk drives, I don't see anything particularly bad happening.

You didn't include a traceback with your bug report. What was printed? Perhaps it's something simple like running out of disk space. In any case, I think trying to create a libdb database of 1,000,000 sort-of-random keys is going to strain that package and most disk drives, bugs or no bugs.

My guess is that if there's a bug it's in libdb, not the bsddb module.


Comment By: Nobody/Anonymous (nobody) Date: 2001-08-03 00:50

Message: Logged In: NO

Here it is:

import anydbm
import bsddb
import random

MAX = 1000000
r = random.Random(42)
r.seed(1017)
db = anydbm.open("test.dbm", "n")
#db = bsddb.hashopen("test.dbm", "n")
try:
    for i in xrange(0, MAX):
        if i % 1000 == 0:
            print i
        key = "abcdef" + str(r.uniform(0, 10 * MAX))
        val = "a" * 80 + str(i)
        db[key] = val
finally:
    db.close()
    print "Last i: %s, last key:%s" % (i,key)


Comment By: Tim Peters (tim_one) Date: 2001-08-02 12:41

Message: Logged In: YES user_id=31435

Alas, there's no script attached -- please attach one, so we have something concrete to investigate.


Comment By: Nobody/Anonymous (nobody) Date: 2001-08-02 03:08

Message: Logged In: NO

I was getting crashes in the shelve module, using NT4 (Python 2.1 (#15, Apr 16 2001, 18:25:49) [MSC 32 bit (Intel)] on win32). I've changed my program to re-read previously written keys fairly frequently, and I get KeyErrors for keys that have definitely been written, and that gave no error a little earlier in the same program. The program doesn't contain any delete statements.

The same program works when using dumbdbm instead of bsddb (but produces huge indexes), so there definitely appears to be a problem with bsddb on Windows NT.
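
The re-read check described above can be sketched as a small harness that inserts items and periodically reads back everything written so far. This is only an illustration in current Python 3, where anydbm has become dbm and bsddb is no longer in the standard library, so it runs against dbm.dumb (the successor of dumbdbm). The function name write_and_verify, its parameters, and the key and value shapes (borrowed from the test script in this thread) are assumptions for the example.

```python
import dbm.dumb
import os
import random
import tempfile


def write_and_verify(path, total=2000, check_every=500, seed=1017):
    """Insert `total` items, re-reading every key written so far each
    `check_every` inserts. Returns the sorted keys that were missing or wrong."""
    rng = random.Random(seed)
    expected = {}   # in-memory copy of what the database should contain
    bad = set()
    db = dbm.dumb.open(path, "n")
    try:
        for i in range(total):
            key = "abcdef" + str(rng.uniform(0, 10 * total))
            val = "a" * 80 + str(i)
            db[key] = val
            expected[key] = val.encode("utf-8")  # dbm returns values as bytes
            if (i + 1) % check_every == 0:
                for k, v in expected.items():
                    if k not in db or db[k] != v:
                        bad.add(k)
    finally:
        db.close()
    return sorted(bad)


# Usage: run against a throwaway directory and report any lost keys.
with tempfile.TemporaryDirectory() as tmp:
    lost = write_and_verify(os.path.join(tmp, "test"))
    print("missing or corrupt keys:", len(lost))
```

Swapping the open call for another dbm backend would exercise that backend with the same check.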

