[Python-Dev] Difflib modifications [reposted]
Christian Robottom Reis kiko at async.com.br
Wed Dec 1 14:08:25 CET 2004
[Reposted to python-dev!]
Hello there,
We've done some customizations to difflib to make it work well with pagetests we are running on a project at Canonical, and we are looking for some guidance as to the best way to do them. There are some tricky bits that have to do with how the class inheritance is put together, and since we would want to avoid duplicating difflib, I figured we'd ask and see if some grand ideas come up.
A [rough first cut of the] patch is inlined below. Essentially, it does:
- Implements a custom Differ.fancy_compare function that supports
  ellipsis and omits equal content
- Hacks _fancy_replace to skip ellipsis as well.
- Hacks best_ratio and cutoff. I'm a bit fuzzy on why this was
  changed, to be honest, and Celso's travelling today, but IIRC it
  had to do with how difflib grouped changes.

Essentially, what we aim for is:

- Ignoring ellipsisized(!) content
- Omitting content which is equal

I initially thought the best way to do this would be to inherit from SequenceMatcher and make it not return opcodes for ellipsis. However, there is no easy way to replace the class short of rewriting major bits of Differ. I suspect this could be easily changed to use a class attribute that we could override, but let me know what you think of the whole thing.
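To make the subclassing idea concrete, here is a minimal sketch in present-day Python, using only the public SequenceMatcher API. The names EllipsisMatcher and changed_lines are hypothetical and are not part of difflib or of the patch; the sketch only handles a '...' that stands alone on a line, not an ellipsis embedded within a line.

```python
import difflib


class EllipsisMatcher(difflib.SequenceMatcher):
    """Hypothetical subclass: hides opcodes covered by a '...' line in `a`."""

    def get_opcodes(self):
        kept = []
        for tag, alo, ahi, blo, bhi in super().get_opcodes():
            # A 'replace' whose expected side consists only of '...' lines
            # is treated as matching whatever appears on the other side.
            if tag == 'replace' and all(
                    line.rstrip() == '...' for line in self.a[alo:ahi]):
                continue
            kept.append((tag, alo, ahi, blo, bhi))
        return kept


def changed_lines(want, got):
    """Yield '-'/'+' lines for real differences, omitting equal content."""
    matcher = EllipsisMatcher(None, want, got)
    for tag, alo, ahi, blo, bhi in matcher.get_opcodes():
        if tag == 'equal':
            continue  # equal content is not shown at all
        for line in want[alo:ahi]:
            yield '- ' + line
        for line in got[blo:bhi]:
            yield '+ ' + line


print(list(changed_lines(['a', '...', 'z'], ['a', 'b', 'c', 'z'])))
print(list(changed_lines(['a', 'b'], ['a', 'c'])))
```

This works standalone, but Differ instantiates SequenceMatcher directly, so such a subclass cannot be plugged into it without some hook like the class attribute suggested above.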
--- /usr/lib/python2.3/difflib.py       2004-11-18 20:05:38.720109040 -0200
+++ difflib.py  2004-11-18 20:24:06.731665680 -0200
@@ -885,6 +885,45 @@
             for line in g:
                 yield line
 
+    def fancy_compare(self, a, b):
+        """
+        >>> import difflib
+        >>> engine = difflib.Differ()
+        >>> got = ['World is Cruel', 'Dudes are Cool']
+        >>> want = ['World ... Cruel', 'Dudes ... Cool']
+        >>> list(engine.fancy_compare(want, got))
+        []
+        """
+        cruncher = SequenceMatcher(self.linejunk, a, b)
+        for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
+
+            if tag == 'replace':
+                ## replace single line
+                if a[alo:ahi][0].rstrip() == '...' and ((ahi - alo) == 1):
+                    g = None
+                ## two lines replaced
+                elif a[alo:ahi][0].rstrip() == '...' and ((ahi - alo) > 1):
+                    g = self._fancy_replace(a, (ahi - 1), ahi,
+                                            b, (bhi - 1), bhi)
+                ## common
+                else:
+                    g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
+            elif tag == 'delete':
+                g = self._dump('-', a, alo, ahi)
+            elif tag == 'insert':
+                g = self._dump('+', b, blo, bhi)
+            elif tag == 'equal':
+                # do not show anything
+                g = None
+            else:
+                raise ValueError, 'unknown tag ' + `tag`
+
+            if g:
+                for line in g:
+                    yield line
+
     def _dump(self, tag, x, lo, hi):
         """Generate comparison results for a same-tagged range."""
         for i in xrange(lo, hi):
@@ -926,7 +965,13 @@
 
         # don't synch up unless the lines have a similarity score of at
         # least cutoff; best_ratio tracks the best score seen so far
-        best_ratio, cutoff = 0.74, 0.75
+        #best_ratio, cutoff = 0.74, 0.75
+
+        ## reduce the cutoff to have enough similarity
+        ## between '<something> ... <something>' and '<a> blabla </a>'
+        ## for example
+        best_ratio, cutoff = 0.009, 0.01
+
         cruncher = SequenceMatcher(self.charjunk)
         eqi, eqj = None, None   # 1st indices of equal lines (if any)
 
@@ -981,7 +1026,11 @@
             cruncher.set_seqs(aelt, belt)
             for tag, ai1, ai2, bj1, bj2 in cruncher.get_opcodes():
                 la, lb = ai2 - ai1, bj2 - bj1
+                if tag == 'replace':
+                    if aelt[ai1:ai2] == '...':
+                        return
+
                 if tag == 'replace':
                     atags += '^' * la
                     btags += '^' * lb
                 elif tag == 'delete':
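For anyone who wants to check the cutoff change by hand, the following sketch (present-day Python) shows how a pair of lines can be scored against a cutoff. The helper would_synch is hypothetical; it passes None as the junk filter rather than Differ's charjunk, and it is only an approximation of the test _fancy_replace applies, not a copy of it.

```python
import difflib


def would_synch(a_line, b_line, cutoff=0.75):
    """Return True if the two lines are similar enough to be paired up.

    Assumption: a pair is synched when its similarity ratio reaches
    `cutoff`; 0.75 is the stock value and 0.01 the value used in the patch.
    """
    matcher = difflib.SequenceMatcher(None, a_line, b_line)
    return matcher.ratio() >= cutoff


want = '<something> ... <something>'
got = '<a> blabla </a>'
ratio = difflib.SequenceMatcher(None, want, got).ratio()
print('ratio:', ratio)
print('stock cutoff:', would_synch(want, got))
print('patched cutoff:', would_synch(want, got, cutoff=0.01))
```

With a cutoff as low as 0.01, almost any two lines that share a character count as similar, so nearly every replaced block goes through intraline comparison.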
Take care,
Christian Robottom Reis | http://async.com.br/~kiko/ | [+55 16] 3361 2331