Issue 18149: filecmp.cmp() incorrect results when previously compared file is modified within modification time resolution

Created on 2013-06-06 14:07 by fbm, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (10)

msg190715 - (view)

Author: Matej Fröbe (fbm)

Date: 2013-06-06 14:07

Example:

import filecmp

with open('', 'w') as f: f.write('a')

with open('', 'w') as f: f.write('a')

print(filecmp.cmp('', '', shallow=False))  # True

with open('', 'w') as f: f.write('b')

print(filecmp.cmp('', '', shallow=False))  # True

Because of the caching, both calls to filecmp.cmp() return True on my system.
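The scenario above can be reproduced without depending on the file system's timestamp resolution. The sketch below is written for Python 3 and makes two assumptions that are not in the original report: the file names `a.txt` and `b.txt` and the helper `reproduce_stale_cmp` are made up for illustration, and `os.utime` is used to pin the modification time, standing in for a rewrite that lands within the mtime resolution.

```python
import filecmp
import os
import tempfile


def reproduce_stale_cmp():
    """Return the results of the two filecmp.cmp() calls in the scenario."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names; the names in the original report are unknown.
        p1 = os.path.join(tmp, "a.txt")
        p2 = os.path.join(tmp, "b.txt")
        for path in (p1, p2):
            with open(path, "w") as f:
                f.write("a")

        first = filecmp.cmp(p1, p2, shallow=False)

        # Rewrite the second file with different content of the same size,
        # then restore its old timestamps so its stat signature is unchanged.
        before = os.stat(p2)
        with open(p2, "w") as f:
            f.write("b")
        os.utime(p2, ns=(before.st_atime_ns, before.st_mtime_ns))

        second = filecmp.cmp(p1, p2, shallow=False)
        return first, second


if __name__ == "__main__":
    print(reproduce_stale_cmp())
```

With the timestamps pinned this way, the second call is expected to be answered from the cache and report the files as equal even though their contents differ.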

When retrieving value from cache, the function filecmp.cmp() checks the signatures of the files:

s1 = _sig(os.stat(f1))
s2 = _sig(os.stat(f2))
...
outcome = _cache.get((f1, f2, s1, s2))

But the signatures match the cached ones whenever the file sizes and modification times (os.stat().st_mtime) are unchanged since the last call, even if the content has changed.
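To make the point about signatures concrete, here is a rough sketch. The function `sig` is an assumed stand-in for the module's private `_sig` helper (file type, size, mtime), not a copy of it, and `signatures_before_and_after` and the file name `data.txt` are hypothetical. As above, `os.utime` pins the mtime to simulate a write within the timestamp resolution.

```python
import os
import stat
import tempfile


def sig(path):
    """Assumed equivalent of the private _sig helper: (type, size, mtime)."""
    st = os.stat(path)
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)


def signatures_before_and_after():
    """Return the signature of one file before and after its content changes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.txt")  # hypothetical file name
        with open(path, "w") as f:
            f.write("a")
        st = os.stat(path)
        before = sig(path)

        # Same size, different content, timestamps restored afterwards.
        with open(path, "w") as f:
            f.write("b")
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
        after = sig(path)
        return before, after


if __name__ == "__main__":
    before, after = signatures_before_and_after()
    print(before == after)
```

Nothing in such a signature depends on the file's bytes, so a cache keyed on it cannot notice this kind of change.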

The cache is mentioned in the documentation, but there isn't any documented way to clear it. It also isn't nice, IMO, that one has to worry about the file system's modification time resolution when calling a simple file comparison.

msg190774 - (view)

Author: Ned Deily (ned.deily) * (Python committer)

Date: 2013-06-07 20:26

It seems like this would be a fairly rare situation and, as you note, dependent on the underlying file system. But it would be easy to add a new function to the module to clear its cache in cases where it is known this might be a problem. In fact, in Issue11802 a clear_cache function was proposed to solve the problem of the cache growing without bounds, but that problem was solved by the simpler solution of discarding the cache when it gets above 100 entries.

msg190790 - (view)

Author: Raymond Hettinger (rhettinger) * (Python committer)

Date: 2013-06-08 01:56

+1 for a cache clearing function like the one in re.py
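For reference, the cache-clearing function in the re module is `re.purge()`. A minimal sketch of how it is used follows; the helper name `purge_and_search` is made up for illustration.

```python
import re


def purge_and_search(pattern, text):
    """Compile a pattern, clear re's internal cache, then search again."""
    re.compile(pattern)  # populates the regular expression cache
    re.purge()           # clears that cache
    # The pattern is simply recompiled on the next use.
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    print(purge_and_search(r"\d+", "issue 18149"))
```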

msg191026 - (view)

Author: Mark Levitt (melevittfl) *

Date: 2013-06-12 13:06

I've added a "clear_cache()" method to filecmp.py. Patch attached.

I had thought about implementing an optional parameter to only invalidate the cache of a specific file object, but figured I'd keep it simple for now.
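The per-file invalidation idea mentioned here was not part of the patch. Purely as a hypothetical sketch, and assuming cache keys of the form (f1, f2, s1, s2) as in the snippet from the first message, it might look something like this. The function `invalidate_file` and the sample keys are invented for illustration and operate on a plain dict, not on the module's private cache.

```python
def invalidate_file(cache, path):
    """Remove every entry whose key names `path` as either compared file.

    Keys are assumed to be tuples of the form (f1, f2, s1, s2).
    Returns the number of entries removed.
    """
    stale = [key for key in cache if path in key[:2]]
    for key in stale:
        del cache[key]
    return len(stale)


if __name__ == "__main__":
    # Placeholder signatures (integers) stand in for real stat signatures.
    cache = {
        ("x", "y", 1, 2): True,
        ("y", "z", 2, 3): False,
        ("p", "q", 4, 5): True,
    }
    print(invalidate_file(cache, "y"), sorted(cache))
```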

First time submitting a patch, so apologies if I've done something the wrong way.

msg191048 - (view)

Author: Ned Deily (ned.deily) * (Python committer)

Date: 2013-06-12 21:03

Thanks for the patch, Mark. I've left some review comments via Rietveld (the review link next to the patch). Also, if you haven't already, please fill out the contributor form as described in the Developer's Guide (http://docs.python.org/devguide/patch.html#licensing).

msg191051 - (view)

Author: Mark Levitt (melevittfl) *

Date: 2013-06-12 23:40

Ned,

Thanks for taking the time to review. I've updated the docs, added a unit test, signed the contributor form, and made the changes/corrections from your review.

Updated patch attached.

msg191060 - (view)

Author: Ned Deily (ned.deily) * (Python committer)

Date: 2013-06-13 06:29

Looks good to me, other than that the doc change should include a version added directive (which can be added by the committer):

.. function:: clear_cache()

msg191067 - (view)

Author: Mark Levitt (melevittfl) *

Date: 2013-06-13 08:32

Cool. I've gone ahead and generated a new patch with the version added directive included.

msg191162 - (view)

Author: Roundup Robot (python-dev) (Python triager)

Date: 2013-06-14 22:20

New changeset bfd53dcb02ff by Ned Deily in branch 'default': Issue #18149: Add filecmp.clear_cache() to manually clear the filecmp cache. http://hg.python.org/cpython/rev/bfd53dcb02ff

msg191163 - (view)

Author: Ned Deily (ned.deily) * (Python committer)

Date: 2013-06-14 22:22

Committed for release in 3.4.0. Thanks, Mark.
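With the change in place (Python 3.4 and later), a caller who knows a file may have been rewritten within the timestamp resolution can call `filecmp.clear_cache()` before comparing. The sketch below illustrates this under the same assumptions as the original report's scenario: the file names and the helper `compare_with_cache_cleared` are hypothetical, and `os.utime` is used to pin the mtime so the stale-signature condition is reproduced reliably.

```python
import filecmp
import os
import tempfile


def compare_with_cache_cleared():
    """Compare two files, change one invisibly to stat, clear, compare again."""
    with tempfile.TemporaryDirectory() as tmp:
        p1 = os.path.join(tmp, "left.txt")   # hypothetical file names
        p2 = os.path.join(tmp, "right.txt")
        for path in (p1, p2):
            with open(path, "w") as f:
                f.write("a")
        first = filecmp.cmp(p1, p2, shallow=False)

        # Same size, new content, old timestamps: the signature is unchanged.
        before = os.stat(p2)
        with open(p2, "w") as f:
            f.write("b")
        os.utime(p2, ns=(before.st_atime_ns, before.st_mtime_ns))

        # Drop all cached results so the next call rereads both files.
        filecmp.clear_cache()
        second = filecmp.cmp(p1, p2, shallow=False)
        return first, second


if __name__ == "__main__":
    print(compare_with_cache_cleared())
```

Here the second comparison is expected to report the files as different, because the cached result from the first call has been discarded.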

History

Date | User | Action | Args
2022-04-11 14:57:46 | admin | set | github: 62349
2013-06-14 22:22:19 | ned.deily | set | status: open -> closed; resolution: fixed; messages: +; stage: commit review -> resolved
2013-06-14 22:20:23 | python-dev | set | nosy: + python-dev; messages: +
2013-06-13 08:32:28 | melevittfl | set | files: + 18149-3.patch; messages: +
2013-06-13 06:29:12 | ned.deily | set | messages: +; stage: needs patch -> commit review
2013-06-12 23:40:21 | melevittfl | set | files: + 18149-2.patch; messages: +
2013-06-12 21:03:40 | ned.deily | set | messages: +
2013-06-12 13:06:56 | melevittfl | set | files: + 18149.patch; nosy: + melevittfl; messages: +; keywords: + patch
2013-06-08 01:56:51 | rhettinger | set | messages: +
2013-06-07 20:26:29 | ned.deily | set | title: filecmp.cmp() - cache invalidation fails when file modification times haven't changed -> filecmp.cmp() incorrect results when previously compared file is modified within modification time resolution; keywords: + easy; nosy: + rhettinger, ned.deily, nadeem.vawda; versions: + Python 3.4, - Python 2.7; messages: +; stage: needs patch
2013-06-06 14:07:35 | fbm | create |