Issue 19385: dbm.dumb should be consistent when the database is closed (original) (raw)

Issue19385

Created on 2013-10-25 06:38 by Claudiu.Popa, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
dbm_dumb.patch	Claudiu.Popa,2013-10-25 06:40	review
dbm_dumb1.patch	Claudiu.Popa,2013-11-01 14:06	review
dbm_dumb1.patch	Claudiu.Popa,2013-11-01 21:03	add space in test	review
issue19385.patch	Claudiu.Popa,2014-03-10 18:58	review
issue19385_1.patch	Claudiu.Popa,2014-04-24 05:11	review
issue19385_speed.patch	Claudiu.Popa,2014-05-13 12:10	review
issue19385_speed_2.patch	serhiy.storchaka,2014-05-25 11:28	review

Messages (29)
msg201209 - (view)	Author: PCManticore (Claudiu.Popa) *	Date: 2013-10-25 06:38
This problem occurred in . Basicly, dbm.dumb is not consistent with dbm.gnu and dbm.ndbm when the database is closed, as seen in the following: >>> import dbm.dumb as d >>> db = d.open('test.dat', 'c') >>> db.close() >>> db.keys() Traceback (most recent call last): File "", line 1, in File "/tank/libs/cpython/Lib/dbm/dumb.py", line 212, in keys return list(self._index.keys()) AttributeError: 'NoneType' object has no attribute 'keys' >>> vs >>> import dbm.gnu as g >>> db = g.open('test.dat', 'c') >>> db.close() >>> db.keys() Traceback (most recent call last): File "", line 1, in _gdbm.error: GDBM object has already been closed >>> Attaching a patch.
msg201819 - (view)	Author: Alyssa Coghlan (ncoghlan) *	Date: 2013-10-31 14:36
Patch looks good to me, I'll apply it when I apply issue 19282.
msg201823 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-10-31 15:13
Additional checks can slowdown the code. It doesn't matter in some operations, but not on __len__, __contains__, etc. EAFP approach is to catch AttributeError and raise appropriate dbm.dumb.error exception.
msg201824 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-10-31 15:16
Yet one nitpick. I think that closing check should be after argument type check and key encoding.
msg201900 - (view)	Author: PCManticore (Claudiu.Popa) *	Date: 2013-11-01 14:06
Here's the new version which addresses your last comment. Regarding the first issue, I don't believe that the result will be as readable (but I agree with you that it will be better). For instance, `items` will probably look like this: try: return [(key, self[key]) for key in self._index.keys()] except AttributeError: raise dbm.dumb.error(...) from None but will look totally different for other __len__: try: return len(self._index) except TypeError: raise dbm.dumb.error(...) from None. We could catch TypeError only for dunder methods though and for the rest of the methods check the value of _index before access.
msg206050 - (view)	Author: PCManticore (Claudiu.Popa) *	Date: 2013-12-13 10:34
Serhiy, what's the status for this? Should this be targeted for 3.5?
msg206058 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-12-13 11:09
> Regarding the first issue, I don't believe that the result will be as readable (but I agree with you that it will be better). Yes, perhaps performance of the dbm.dumb module is not worst this cost. Are you have any benchmarking results? The patch LGTM. > Should this be targeted for 3.5? I think yes.
msg206069 - (view)	Author: PCManticore (Claudiu.Popa) *	Date: 2013-12-13 12:40
Here's a benchmark: - with current patch: # ./python -S -m timeit -n 10000 -s 'import dbm.dumb as dbm; d=dbm.open("x.dat", "c");d.close()' 'try:' ' len(d)' 'except OSError:' ' pass' 10000 loops, best of 3: 1.78 usec per loop - using try: return len(self._index) except TypeError: raise error('...') # ./python -S -m timeit -n 10000 -s 'import dbm.dumb as dbm; d=dbm.open("x.dat", "c");d.close()' 'try:' ' len(d)' 'except OSError:' ' pass' 10000 loops, best of 3: 3.27 usec per loop Now this seems odd, maybe catching + reraising an exception has a greater overhead than a func call and checking if an attribute is None. Anyway, I agree that dbm.dumb is not performance critical.
msg206094 - (view)	Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) *	Date: 2013-12-13 15:07
IMHO it is a bug fix, not a new feature, and could be applied in 3.3 and 3.4.
msg206151 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-12-13 21:53
> Now this seems odd, maybe catching + reraising an exception has a greater overhead than a func call and checking if an attribute is None. Yes, of course, catching + reraising an exception is costly. But when an exception is not raised, this is cheap. Usually len() is called for open database. > IMHO it is a bug fix, not a new feature, and could be applied in 3.3 and 3.4. Hmm. If consider dbm.dumb as separated module, this is perhaps a new feature. But if consider it as an implementation of the dbm protocol, it should conform to other dbm modules, and this is a bugfix. What would you say Nick?
msg206183 - (view)	Author: Alyssa Coghlan (ncoghlan) *	Date: 2013-12-14 13:28
I think it's definitely still OK to fix this in 3.4, but I think it's borderline enough to avoid including in the last ever 3.3 release. Changing exception types always introduces a little backwards compatibility risk, so it's something I lean towards only doing in new feature releases, even in cases like this where the old exception was clearly not the best choice.
msg206184 - (view)	Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) *	Date: 2013-12-14 13:35
> in the last ever 3.3 release. suggests that there will be >=2 bug fix releases in 3.3 branch (3.3.4 soon and 3.3.5 around release of 3.4.0).
msg206185 - (view)	Author: Alyssa Coghlan (ncoghlan) *	Date: 2013-12-14 13:51
That's slightly more acceptable, but it's still hard to make the case that changing exception types is a reasonable thing to do in a maintenance release (it's only the specific nature of the error that makes me consider it reasonable even in a feature release).
msg206416 - (view)	Author: PCManticore (Claudiu.Popa) *	Date: 2013-12-17 10:43
If we agree that this should be fixed in 3.4, can somebody commit it before the second beta?
msg213069 - (view)	Author: PCManticore (Claudiu.Popa) *	Date: 2014-03-10 18:58
Updated the patch to catch dbm.dumb.error in tests (reverting a change added in c2f1bb56760d).
msg213840 - (view)	Author: PCManticore (Claudiu.Popa) *	Date: 2014-03-17 07:15
Can this patch be committed, now that 3.5 is active?
msg217091 - (view)	Author: Jim Jewett (Jim.Jewett) *	Date: 2014-04-23 21:56
_check_closed sounds like you expect it to be closed, and might even assert that it is closed, except that you want the check run even in release mode and/or it might fail. Since being closed is actually the unexpectedly broken state, I would prefer that you rename it to something else, like _verify_open.
msg217093 - (view)	Author: PCManticore (Claudiu.Popa) *	Date: 2014-04-23 22:01
gzip uses the same name, _check_closed, but your suggestion sounds good. I'll update the patch.
msg217097 - (view)	Author: Jim Jewett (Jim.Jewett) *	Date: 2014-04-23 22:38
I think the requested timing regression was for the non-broken case. When the database has NOT been closed, and keys() still works, will it be way slower than before? Note that I am not asking you to do that test (though the eventual committer might); the implementation of whichdb leaves me believing that almost anyone who is likely to care will have already followed the docunmentation's recommendation to install another *dbm ahead of dumb.
msg217116 - (view)	Author: PCManticore (Claudiu.Popa) *	Date: 2014-04-24 05:11
On my machine I get the following results for the unclosed-database case. With patch: # ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c');len(d)" 100000 loops, best of 3: 0.0638 usec per loop Without patch: # ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c');len(d)" 100000 loops, best of 3: 0.0634 usec per loop
msg217216 - (view)	Author: Roundup Robot (python-dev)	Date: 2014-04-26 20:57
New changeset e8343cb98cc3 by Benjamin Peterson in branch '3.4': make operations on closed dumb databases raise a consistent exception (closes #19385) http://hg.python.org/cpython/rev/e8343cb98cc3 New changeset dbceba88b96e by Benjamin Peterson in branch 'default': merge 3.4 (#19385) http://hg.python.org/cpython/rev/dbceba88b96e
msg218435 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-05-13 11:23
Claudiu, your benchmark is broken, it measure a no-op. Correct benchmark: $ ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)" 3.4: 100000 loops, best of 3: 2.56 usec per loop 3.5: 100000 loops, best of 3: 4.17 usec per loop There is 60% regression.
msg218440 - (view)	Author: Jim Jewett (Jim.Jewett) *	Date: 2014-05-13 11:39
Did you try 3.5 without the patch, in case the issue is not with his code? On May 13, 2014 7:23 AM, "Serhiy Storchaka" <report@bugs.python.org> wrote: > > Serhiy Storchaka added the comment: > > Claudiu, your benchmark is broken, it measure a no-op. > > Correct benchmark: > > $ ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; > d=dbm.open('x.dat', 'c')" "len(d)" > > 3.4: 100000 loops, best of 3: 2.56 usec per loop > 3.5: 100000 loops, best of 3: 4.17 usec per loop > > There is 60% regression. > > ---------- > status: closed -> open > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue19385> > _______________________________________ >
msg218441 - (view)	Author: PCManticore (Claudiu.Popa) *	Date: 2014-05-13 11:42
Right, my benchmark was indeed flawed. Here are the new results on my machine: Without the patch # ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)" 100000 loops, best of 3: 0.564 usec per loop With the patch # ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)" 100000 loops, best of 3: 0.857 usec per loop Even having an empty _verify_open in __len__ method leads to this: # ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)" 100000 loops, best of 3: 0.749 usec per loop
msg218444 - (view)	Author: PCManticore (Claudiu.Popa) *	Date: 2014-05-13 12:10
Here's a new patch which uses the EAFP approach for dunder methods (__len__, __contains__ etc) and the _verify_open method for the other methods (.keys, .iterkeys) etc. Now the results are similar with the benchmark without the patch.
msg219083 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-05-25 11:28
There is no need to speed up methods which do IO (__getitem__, __setitem__, __delitem__). However method which works only with an index (keys, iterkeys, __contains__, __len__) can be optimized. In the __contains__ method an exception can be raised not only by nen-existent __contains__ of None, but from __hash__ or __eq__ methods of a key, so we should distinguish these cases.
msg219277 - (view)	Author: PCManticore (Claudiu.Popa) *	Date: 2014-05-28 15:27
Looks good to me.
msg219281 - (view)	Author: Roundup Robot (python-dev)	Date: 2014-05-28 15:51
New changeset 95207bcd8298 by Serhiy Storchaka in branch '3.4': Restore performance of some dumb database methods (regression introduced by #19385). http://hg.python.org/cpython/rev/95207bcd8298 New changeset 2e59e0b579e5 by Serhiy Storchaka in branch 'default': Restore performance of some dumb database methods (regression introduced by #19385). http://hg.python.org/cpython/rev/2e59e0b579e5
msg219283 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-05-28 15:53
Thanks Claudiu.

History
Date	User	Action	Args
2022-04-11 14:57:52	admin	set	github: 63584
2014-05-28 15:53:06	serhiy.storchaka	set	status: open -> closedmessages: + stage: patch review -> resolved
2014-05-28 15:51:02	python-dev	set	messages: +
2014-05-28 15:27:46	Claudiu.Popa	set	messages: +
2014-05-25 11:28:53	serhiy.storchaka	set	files: + issue19385_speed_2.patchstage: resolved -> patch reviewmessages: + versions: + Python 3.4
2014-05-13 12:10:23	Claudiu.Popa	set	files: + issue19385_speed.patchmessages: +
2014-05-13 11:42:39	Claudiu.Popa	set	messages: +
2014-05-13 11:39:25	Jim.Jewett	set	messages: +
2014-05-13 11:23:09	serhiy.storchaka	set	status: closed -> openmessages: +
2014-04-26 20:57:20	python-dev	set	status: open -> closednosy: + python-devmessages: + resolution: fixedstage: commit review -> resolved
2014-04-24 05:11:39	Claudiu.Popa	set	files: + issue19385_1.patchmessages: +
2014-04-23 22:38:07	Jim.Jewett	set	messages: +
2014-04-23 22:01:17	Claudiu.Popa	set	messages: +
2014-04-23 21:56:52	Jim.Jewett	set	nosy: + Jim.Jewettmessages: +
2014-03-19 20:02:38	zach.ware	set	versions: + Python 3.5, - Python 3.4
2014-03-17 07:15:49	Claudiu.Popa	set	messages: +
2014-03-10 18:58:38	Claudiu.Popa	set	files: + issue19385.patchmessages: +
2013-12-17 10:43:23	Claudiu.Popa	set	messages: +
2013-12-14 13:51:36	ncoghlan	set	messages: +
2013-12-14 13:35:26	Arfrever	set	messages: +
2013-12-14 13:28:35	ncoghlan	set	messages: + versions: + Python 3.4, - Python 3.5
2013-12-13 21:53:26	serhiy.storchaka	set	messages: +
2013-12-13 15:07:04	Arfrever	set	messages: +
2013-12-13 12:40:41	Claudiu.Popa	set	messages: +
2013-12-13 11:09:14	serhiy.storchaka	set	stage: commit reviewmessages: + versions: + Python 3.5, - Python 3.4
2013-12-13 10:34:28	Claudiu.Popa	set	messages: +
2013-11-01 21:03:28	Claudiu.Popa	set	files: + dbm_dumb1.patch
2013-11-01 14:06:23	Claudiu.Popa	set	files: + dbm_dumb1.patchmessages: +
2013-10-31 15:16:24	serhiy.storchaka	set	messages: +
2013-10-31 15:13:47	serhiy.storchaka	set	nosy: + serhiy.storchakamessages: +
2013-10-31 14:36:39	ncoghlan	set	assignee: ncoghlanmessages: + nosy: + ncoghlan
2013-10-28 22:54:52	Arfrever	set	nosy: + Arfrever
2013-10-25 06:40:09	Claudiu.Popa	set	files: + dbm_dumb.patchkeywords: + patch
2013-10-25 06:38:41	Claudiu.Popa	create