msg201209 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2013-10-25 06:38 |
This problem occurred in . Basicly, dbm.dumb is not consistent with dbm.gnu and dbm.ndbm when the database is closed, as seen in the following: >>> import dbm.dumb as d >>> db = d.open('test.dat', 'c') >>> db.close() >>> db.keys() Traceback (most recent call last): File "", line 1, in File "/tank/libs/cpython/Lib/dbm/dumb.py", line 212, in keys return list(self._index.keys()) AttributeError: 'NoneType' object has no attribute 'keys' >>> vs >>> import dbm.gnu as g >>> db = g.open('test.dat', 'c') >>> db.close() >>> db.keys() Traceback (most recent call last): File "", line 1, in _gdbm.error: GDBM object has already been closed >>> Attaching a patch. |
|
|
msg201819 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2013-10-31 14:36 |
Patch looks good to me, I'll apply it when I apply issue 19282. |
|
|
msg201823 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-10-31 15:13 |
Additional checks can slowdown the code. It doesn't matter in some operations, but not on __len__, __contains__, etc. EAFP approach is to catch AttributeError and raise appropriate dbm.dumb.error exception. |
|
|
msg201824 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-10-31 15:16 |
Yet one nitpick. I think that closing check should be after argument type check and key encoding. |
|
|
msg201900 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2013-11-01 14:06 |
Here's the new version which addresses your last comment. Regarding the first issue, I don't believe that the result will be as readable (but I agree with you that it will be better). For instance, `items` will probably look like this: try: return [(key, self[key]) for key in self._index.keys()] except AttributeError: raise dbm.dumb.error(...) from None but will look totally different for other __len__: try: return len(self._index) except TypeError: raise dbm.dumb.error(...) from None. We could catch TypeError only for dunder methods though and for the rest of the methods check the value of _index before access. |
|
|
msg206050 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2013-12-13 10:34 |
Serhiy, what's the status for this? Should this be targeted for 3.5? |
|
|
msg206058 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-12-13 11:09 |
> Regarding the first issue, I don't believe that the result will be as readable (but I agree with you that it will be better). Yes, perhaps performance of the dbm.dumb module is not worst this cost. Are you have any benchmarking results? The patch LGTM. > Should this be targeted for 3.5? I think yes. |
|
|
msg206069 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2013-12-13 12:40 |
Here's a benchmark: - with current patch: # ./python -S -m timeit -n 10000 -s 'import dbm.dumb as dbm; d=dbm.open("x.dat", "c");d.close()' 'try:' ' len(d)' 'except OSError:' ' pass' 10000 loops, best of 3: 1.78 usec per loop - using try: return len(self._index) except TypeError: raise error('...') # ./python -S -m timeit -n 10000 -s 'import dbm.dumb as dbm; d=dbm.open("x.dat", "c");d.close()' 'try:' ' len(d)' 'except OSError:' ' pass' 10000 loops, best of 3: 3.27 usec per loop Now this seems odd, maybe catching + reraising an exception has a greater overhead than a func call and checking if an attribute is None. Anyway, I agree that dbm.dumb is not performance critical. |
|
|
msg206094 - (view) |
Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) *  |
Date: 2013-12-13 15:07 |
IMHO it is a bug fix, not a new feature, and could be applied in 3.3 and 3.4. |
|
|
msg206151 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-12-13 21:53 |
> Now this seems odd, maybe catching + reraising an exception has a greater overhead than a func call and checking if an attribute is None. Yes, of course, catching + reraising an exception is costly. But when an exception is not raised, this is cheap. Usually len() is called for open database. > IMHO it is a bug fix, not a new feature, and could be applied in 3.3 and 3.4. Hmm. If consider dbm.dumb as separated module, this is perhaps a new feature. But if consider it as an implementation of the dbm protocol, it should conform to other dbm modules, and this is a bugfix. What would you say Nick? |
|
|
msg206183 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2013-12-14 13:28 |
I think it's definitely still OK to fix this in 3.4, but I think it's borderline enough to avoid including in the last ever 3.3 release. Changing exception types always introduces a little backwards compatibility risk, so it's something I lean towards only doing in new feature releases, even in cases like this where the old exception was clearly not the best choice. |
|
|
msg206184 - (view) |
Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) *  |
Date: 2013-12-14 13:35 |
> in the last ever 3.3 release. suggests that there will be >=2 bug fix releases in 3.3 branch (3.3.4 soon and 3.3.5 around release of 3.4.0). |
|
|
msg206185 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2013-12-14 13:51 |
That's slightly more acceptable, but it's still hard to make the case that changing exception types is a reasonable thing to do in a maintenance release (it's only the specific nature of the error that makes me consider it reasonable even in a feature release). |
|
|
msg206416 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2013-12-17 10:43 |
If we agree that this should be fixed in 3.4, can somebody commit it before the second beta? |
|
|
msg213069 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2014-03-10 18:58 |
Updated the patch to catch dbm.dumb.error in tests (reverting a change added in c2f1bb56760d). |
|
|
msg213840 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2014-03-17 07:15 |
Can this patch be committed, now that 3.5 is active? |
|
|
msg217091 - (view) |
Author: Jim Jewett (Jim.Jewett) *  |
Date: 2014-04-23 21:56 |
_check_closed sounds like you expect it to be closed, and might even assert that it is closed, except that you want the check run even in release mode and/or it might fail. Since being closed is actually the unexpectedly broken state, I would prefer that you rename it to something else, like _verify_open. |
|
|
msg217093 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2014-04-23 22:01 |
gzip uses the same name, _check_closed, but your suggestion sounds good. I'll update the patch. |
|
|
msg217097 - (view) |
Author: Jim Jewett (Jim.Jewett) *  |
Date: 2014-04-23 22:38 |
I think the requested timing regression was for the non-broken case. When the database has NOT been closed, and keys() still works, will it be way slower than before? Note that I am not asking you to do that test (though the eventual committer might); the implementation of whichdb leaves me believing that almost anyone who is likely to care will have already followed the docunmentation's recommendation to install another *dbm ahead of dumb. |
|
|
msg217116 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2014-04-24 05:11 |
On my machine I get the following results for the unclosed-database case. With patch: # ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c');len(d)" 100000 loops, best of 3: 0.0638 usec per loop Without patch: # ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c');len(d)" 100000 loops, best of 3: 0.0634 usec per loop |
|
|
msg217216 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2014-04-26 20:57 |
New changeset e8343cb98cc3 by Benjamin Peterson in branch '3.4': make operations on closed dumb databases raise a consistent exception (closes #19385) http://hg.python.org/cpython/rev/e8343cb98cc3 New changeset dbceba88b96e by Benjamin Peterson in branch 'default': merge 3.4 (#19385) http://hg.python.org/cpython/rev/dbceba88b96e |
|
|
msg218435 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-05-13 11:23 |
Claudiu, your benchmark is broken, it measure a no-op. Correct benchmark: $ ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)" 3.4: 100000 loops, best of 3: 2.56 usec per loop 3.5: 100000 loops, best of 3: 4.17 usec per loop There is 60% regression. |
|
|
msg218440 - (view) |
Author: Jim Jewett (Jim.Jewett) *  |
Date: 2014-05-13 11:39 |
Did you try 3.5 without the patch, in case the issue is not with his code? On May 13, 2014 7:23 AM, "Serhiy Storchaka" <report@bugs.python.org> wrote: > > Serhiy Storchaka added the comment: > > Claudiu, your benchmark is broken, it measure a no-op. > > Correct benchmark: > > $ ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; > d=dbm.open('x.dat', 'c')" "len(d)" > > 3.4: 100000 loops, best of 3: 2.56 usec per loop > 3.5: 100000 loops, best of 3: 4.17 usec per loop > > There is 60% regression. > > ---------- > status: closed -> open > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue19385> > _______________________________________ > |
|
|
msg218441 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2014-05-13 11:42 |
Right, my benchmark was indeed flawed. Here are the new results on my machine: Without the patch # ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)" 100000 loops, best of 3: 0.564 usec per loop With the patch # ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)" 100000 loops, best of 3: 0.857 usec per loop Even having an empty _verify_open in __len__ method leads to this: # ./python -S -m timeit -n 100000 -s "import dbm.dumb as dbm; d=dbm.open('x.dat', 'c')" "len(d)" 100000 loops, best of 3: 0.749 usec per loop |
|
|
msg218444 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2014-05-13 12:10 |
Here's a new patch which uses the EAFP approach for dunder methods (__len__, __contains__ etc) and the _verify_open method for the other methods (.keys, .iterkeys) etc. Now the results are similar with the benchmark without the patch. |
|
|
msg219083 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-05-25 11:28 |
There is no need to speed up methods which do IO (__getitem__, __setitem__, __delitem__). However method which works only with an index (keys, iterkeys, __contains__, __len__) can be optimized. In the __contains__ method an exception can be raised not only by nen-existent __contains__ of None, but from __hash__ or __eq__ methods of a key, so we should distinguish these cases. |
|
|
msg219277 - (view) |
Author: PCManticore (Claudiu.Popa) *  |
Date: 2014-05-28 15:27 |
Looks good to me. |
|
|
msg219281 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2014-05-28 15:51 |
New changeset 95207bcd8298 by Serhiy Storchaka in branch '3.4': Restore performance of some dumb database methods (regression introduced by #19385). http://hg.python.org/cpython/rev/95207bcd8298 New changeset 2e59e0b579e5 by Serhiy Storchaka in branch 'default': Restore performance of some dumb database methods (regression introduced by #19385). http://hg.python.org/cpython/rev/2e59e0b579e5 |
|
|
msg219283 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2014-05-28 15:53 |
Thanks Claudiu. |
|
|