SequenceMatcher caches the results of get_matching_blocks and get_opcodes. There are some problems with this:

- What get_matching_blocks caches is a list of tuples. The first call does not return that list: it returns map(Match._make, self.matching_blocks) (converting the tuples to namedtuples). Subsequent calls just return self.matching_blocks directly. This is especially weird in Python 3 and up, since the first call returns a map object while later calls return a list.

- This caching behavior is not documented, so calling code may mutate the returned list. One example of such calling code is difflib itself: get_grouped_opcodes mutates the result of get_opcodes (a cached list). I am not sure if the right fix is to have get_grouped_opcodes copy before it mutates or to have get_opcodes return a copy.

Snippet demonstrating both bugs:

    import difflib

    matcher = difflib.SequenceMatcher(a='aaaaaaaabc', b='aaaaaaaadc')
    print(list(matcher.get_matching_blocks()))
    # This should print the same thing, but it does not:
    print(list(matcher.get_matching_blocks()))

    print(matcher.get_opcodes())
    print(list(matcher.get_grouped_opcodes()))
    # This should print the same thing as the previous get_opcodes()
    # list, but it does not:
    print(matcher.get_opcodes())
That fixes the first problem in Python 2. In Python 3, though, it should do:

    self.matching_blocks = [Match._make(t) for t in non_adjacent]

or it will return an already-exhausted map object if it is called again.

This leaves the problem of other code mutating the cached list (not realizing it is cached). I think it would make sense for these functions to just return a shallow copy of their cached list, but I have not thought that through much.
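
To make the shallow-copy idea concrete, here is a minimal sketch that applies it from the outside, in a subclass, rather than by patching difflib. The name CopyingSequenceMatcher is made up for this example; it is not part of difflib and is not meant as the actual fix.

```python
import difflib


class CopyingSequenceMatcher(difflib.SequenceMatcher):
    """Hypothetical subclass (not part of difflib) whose accessors
    hand out shallow copies instead of the cached lists themselves."""

    def get_matching_blocks(self):
        # list() materializes a map object if the base class returns one,
        # and copies the list otherwise, so callers never hold the cache.
        return list(super().get_matching_blocks())

    def get_opcodes(self):
        # get_grouped_opcodes() calls self.get_opcodes(), so any in-place
        # edits it makes land on this copy rather than on the cached list.
        return list(super().get_opcodes())


matcher = CopyingSequenceMatcher(a='aaaaaaaabc', b='aaaaaaaadc')
opcodes_before = matcher.get_opcodes()
grouped = list(matcher.get_grouped_opcodes())
opcodes_after = matcher.get_opcodes()
```

With this wrapper, opcodes_before and opcodes_after should compare equal, and repeated calls to get_matching_blocks() should return equal lists. The cost is one extra list copy per call, which seems small next to the cost of computing the blocks in the first place.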