Issue 5736: Add the iterator protocol to dbm modules

Created on 2009-04-11 13:47 by akitada, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
issue5736.diff akitada, 2010-10-16 15:46 iter(dbm.keys())
Messages (14)
msg85856 - Author: Akira Kitada (akitada) * Date: 2009-04-11 13:47
In Python 2.6, dbm modules other than bsddb don't support the iterator protocol.

>>> import dbm
>>> d = dbm.open('spam.dbm', 'c')
>>> for k in range(5): d["key%d" % k] = "value%d" % k
...
>>> for k in d: print k, d[k]
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'dbm.dbm' object is not iterable

Adding iterator support would make dbm modules more convenient and easier to use.
msg85859 - Author: Akira Kitada (akitada) * Date: 2009-04-11 14:11
Attached is a patch that adds the iterator protocol. Now it can be iterated over like this:

>>> for k in d: print k, d[k]
...
key1 vale1
key3 vale3
key0 vale0
key2 vale2
key4 vale4

The problem is that there is no way to move the internal pointer back to the start, so once iteration reaches the end, you are done.

>>> for k in d: print k, d[k]
...

The solution to this would be one of:
- Add a method to move the pointer back to the start (with the {first,next}key API)
- Add a method that returns a generator
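
The generator option could look roughly like the following minimal sketch. It is written for Python 3 and assumes the gdbm-style interface, where firstkey() returns the first key and nextkey(key) returns the key after the given one (or None at the end). The names iter_keys and FakeGdbm are hypothetical; FakeGdbm is only an in-memory stand-in so the sketch runs without a real database.

```python
def iter_keys(db):
    """Yield every key of a gdbm-style database, always starting at the first key."""
    key = db.firstkey()
    while key is not None:
        yield key
        key = db.nextkey(key)


class FakeGdbm:
    """Hypothetical in-memory stand-in exposing firstkey() and nextkey(key)."""

    def __init__(self, data):
        self._data = dict(data)
        self._order = list(self._data)  # fixed traversal order for the demo

    def firstkey(self):
        return self._order[0] if self._order else None

    def nextkey(self, key):
        pos = self._order.index(key) + 1
        return self._order[pos] if pos < len(self._order) else None


db = FakeGdbm({b"k1": b"v1", b"k2": b"v2"})
first_pass = list(iter_keys(db))
second_pass = list(iter_keys(db))  # a fresh generator starts from firstkey() again
```

Because every call to iter_keys begins with firstkey(), a second loop over the same database is not left empty, which is the restart problem described in this message.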
msg85867 - Author: Akira Kitada (akitada) * Date: 2009-04-11 16:14
Revised patch adds firstkey and nextkey to dbm. Now the internal pointer can be reset with firstkey.
msg85878 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-04-11 22:03
Would you like to fix gdbm as well?
msg85888 - Author: Akira Kitada (akitada) * Date: 2009-04-12 09:47
Here's another patch which adds iter to dbm and gdbm. Note that the dbm and gdbm C APIs are a little different: gdbm_nextkey requires a key as its argument, dbm_nextkey doesn't. So for gdbm I had to use a static variable that points to the current position. As a result, the iterators in gdbm and dbm now work differently.

>>> import dbm
>>> d = dbm.open('foo', 'n')
>>> d['k1'] = 'v1';d['k2'] = 'v2';
>>> for i in d: print i; break
...
k1
>>> for i in d: print i
...
k2
>>> for i in d: print i
...

>>> import gdbm
>>> gd = gdbm.open('foo.gdbm', 'n')
>>> gd['k1'] = 'v1';gd['k2'] = 'v2';
>>> for i in gd: print i; break
...
k2
>>> for i in gd: print i
...
k1
>>> for i in gd: print i
...
k2
k1
msg85889 - Author: Akira Kitada (akitada) * Date: 2009-04-12 10:11
Of course iter should work in the same way in all dbm modules. iter in dbm/gdbm should work like dumbdbm's iter.

>>> dumb = dumbdbm.open('foo', 'n')
>>> dumb['k1'] = 'v1';dumb['k2'] = 'v2';
>>> for i in dumb: print i; break
...
k2
>>> for i in dumb: print i
...
k2
k1
>>> for i in dumb: print i
...
k2
k1
msg85928 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2009-04-12 23:45
Akira> Note that dbm and gdbm C API is a little different. gdbm_nextkey
Akira> requires key for its argument, dbm_nextkey don't. So I had to
Akira> use for gdbm an static variable that points to the current
Akira> position.

I don't think this is going to fly. A static variable is not thread-safe. What's worse, even in a non-threaded environment you might want to iterate over the gdbm file simultaneously from two different places.
msg85931 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2009-04-13 00:19
skip> What's worse, even in a non-threaded environment you might want to
skip> iterate over the gdbm file simultaneously from two different
skip> places.

Or iterate over two different gdbm files simultaneously.
msg85944 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-04-13 12:56
I agree with Skip that using a static variable is not appropriate. The proper solution probably would be to define a separate gdbm_iter object which always preserves the last key returned.
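
A rough Python 3 illustration of that design, as opposed to the C implementation the patch would need: each iterator object keeps its own last key, so two iterations over the same database, or over two different databases, cannot disturb each other. KeyIterator and InMemoryDb are hypothetical names, and the firstkey()/nextkey(key) interface is assumed to follow the gdbm style.

```python
class KeyIterator:
    """Iterator that stores the last key it returned instead of using shared state."""

    def __init__(self, db):
        self._db = db
        self._last = None
        self._started = False

    def __iter__(self):
        return self

    def __next__(self):
        if not self._started:
            key = self._db.firstkey()
            self._started = True
        elif self._last is None:
            # Already exhausted: stay exhausted, like a dict iterator.
            raise StopIteration
        else:
            key = self._db.nextkey(self._last)
        self._last = key
        if key is None:
            raise StopIteration
        return key


class InMemoryDb:
    """Hypothetical stand-in for a gdbm handle; only what the demo needs."""

    def __init__(self, keys):
        self._order = list(keys)

    def firstkey(self):
        return self._order[0] if self._order else None

    def nextkey(self, key):
        pos = self._order.index(key) + 1
        return self._order[pos] if pos < len(self._order) else None

    def __iter__(self):
        return KeyIterator(self)  # every for-loop gets a fresh cursor


db = InMemoryDb([b"k1", b"k2", b"k3"])
a, b = iter(db), iter(db)
first_a = next(a)
first_b = next(b)   # not affected by the progress of `a`
rest_a = list(a)
rest_b = list(b)
```

With a single static cursor, the second next() call would have continued where the first iterator stopped; here both iterators report the same sequence.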
msg85946 - Author: Akira Kitada (akitada) * Date: 2009-04-13 13:43
Yes, using a static variable there is wrong, and I'm now working on a "dbm_iterobject" just as Martin suggested. The dbm iterator should behave just like the one in dict. I think I can use Objects/dictobject.c as a good example for this. Attached are minimal tests for the dbm iterator.
msg91339 - Author: Christopher Lee (foobaron) Date: 2009-08-06 00:32
Another reason this issue is really important is that the lack of a consistent iter() interface for dbm.* makes shelve iteration not scalable; i.e. trying to iterate on a Shelf will run self.dict.keys() to load the entire index into memory. This seems contrary to a primary purpose of shelve, namely to store the index on disk so as to avoid having to keep the whole index in memory. I suspect that for most users, shelve is the main way they will access the dbm.* interfaces. Therefore, fixing the dbm.* interfaces so that shelve is scalable seems like an important need. Once dbm and gdbm support the iterator protocol, it will be trivial to add an __iter__() method to shelve.Shelf that simply returns iter(self.dict).
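
A minimal Python 3 sketch of that idea, assuming the underlying mapping supports iter() lazily. LazyIterShelf is a hypothetical subclass name, and a plain dict stands in for the dbm object so the example runs anywhere. Python 3's Shelf stores keys as encoded bytes, so the sketch decodes them with the shelf's keyencoding rather than returning iter(self.dict) unchanged.

```python
import shelve


class LazyIterShelf(shelve.Shelf):
    """Shelf whose iteration walks the backing mapping without calling keys()."""

    def __iter__(self):
        # iter(self.dict) is lazy only if the backing object's iterator is lazy.
        for raw_key in iter(self.dict):
            yield raw_key.decode(self.keyencoding)


shelf = LazyIterShelf({})  # a dict stands in for an open dbm database
shelf["spam"] = 1
shelf["eggs"] = 2
keys = sorted(shelf)
values = [shelf[k] for k in keys]
```

Whether this actually avoids loading the whole index depends on the dbm backend providing a truly incremental iterator, which is what this issue asks for.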
msg118874 - Author: Akira Kitada (akitada) * Date: 2010-10-16 15:46
This patch just uses PyObject_GetIter to get an iter. (I just copied the idea from )
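
At the Python level, calling PyObject_GetIter on the result of keys() corresponds roughly to iter(db.keys()). The sketch below shows that behaviour in Python 3 using dbm.dumb, which is always available; the IterableDbm wrapper is a hypothetical name used only for illustration and is not how the attached C patch is structured.

```python
import dbm.dumb
import os
import tempfile


class IterableDbm:
    """Hypothetical wrapper: iteration simply delegates to iter(db.keys())."""

    def __init__(self, db):
        self._db = db

    def __iter__(self):
        # keys() builds the full key list first, then iterates over it.
        return iter(self._db.keys())


with tempfile.TemporaryDirectory() as tmp:
    db = dbm.dumb.open(os.path.join(tmp, "spam"), "n")
    db[b"k1"] = b"v1"
    db[b"k2"] = b"v2"
    wrapped = IterableDbm(db)
    first_pass = sorted(wrapped)
    second_pass = sorted(wrapped)  # each loop gets a new iterator over keys()
    db.close()
```

This approach is restartable and keeps no shared cursor, but it materializes every key up front, so it does not address the memory concern raised in msg91339.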
msg123358 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-12-04 15:20
This may be superseded by #9523. There are comments and patches in both issues, so I’m not closing either as duplicate of the other.
msg128465 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-02-12 20:17
#9523 has a more comprehensive patch in progress, adding __iter__ and other mapping methods, so I’m closing this one.
History
Date User Action Args
2022-04-11 14:56:47 admin set github: 49986
2011-02-12 20:17:43 eric.araujo set status: open -> closed; superseder: Improve dbm modules; versions: - Python 3.2; nosy: loewis, rhettinger, eric.araujo, akitada, foobaron, ysj.ray; messages: +; resolution: duplicate; stage: resolved
2010-12-04 15:20:59 eric.araujo set nosy: + eric.araujo; messages: +
2010-10-18 11:41:18 pitrou set nosy: + rhettinger
2010-10-16 15:47:01 akitada set files: + issue5736.diff; versions: + Python 3.2, - Python 2.7; nosy: + ysj.ray; messages: +
2010-10-16 15:32:10 akitada set files: - issue5736.diff
2010-10-16 15:32:06 akitada set files: - test_issue5736.diff
2010-10-16 15:32:02 akitada set files: - issue5736.diff
2010-10-16 15:31:52 akitada set files: - issue5736.diff
2010-05-20 20:31:03 skip.montanaro set nosy: - skip.montanaro
2009-08-06 00:32:36 foobaron set nosy: + foobaron; messages: +
2009-04-13 13:43:40 akitada set files: + test_issue5736.diff; messages: +
2009-04-13 12:56:08 loewis set messages: +
2009-04-13 00:19:22 skip.montanaro set messages: +
2009-04-12 23:45:07 skip.montanaro set nosy: + skip.montanaro; messages: +
2009-04-12 10:11:41 akitada set messages: +
2009-04-12 09:47:38 akitada set files: + issue5736.diff; messages: +
2009-04-11 22:03:38 loewis set nosy: + loewis; messages: +
2009-04-11 16:14:07 akitada set files: + issue5736.diff; messages: +
2009-04-11 14:11:27 akitada set files: + issue5736.diff; keywords: + patch; messages: +
2009-04-11 13:47:52 akitada create