[Python-Dev] Difflib modifications [reposted]

Christian Robottom Reis kiko at async.com.br
Wed Dec 1 14:08:25 CET 2004


[Reposted to python-dev!]

Hello there,

We have done some customizations to difflib to make it work well with the pagetests we are running on a project at Canonical, and we are looking for guidance on the best way to do them. There are some tricky bits that have to do with how the class inheritance is put together, and since we want to avoid duplicating difflib, I figured we'd ask and see if some grand ideas come up.

A [rough first cut of the] patch is inlined below. Essentially, it does:

- Implements a custom Differ.fancy_compare function that supports
  ellipsis and omits equal content

- Hacks _fancy_replace to skip ellipsis as well.

- Hacks best_ratio and cutoff. I'm a bit fuzzy on why this was
  changed, to be honest, and Celso's travelling today, but IIRC it
  had to do with how difflib grouped changes.
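
As background for that last item: stock difflib's _fancy_replace only pairs two lines for an intraline diff when their SequenceMatcher ratio reaches the cutoff, which is 0.75 in the unpatched module. Below is a minimal sketch of that test, written for current Python rather than 2.3. The helper name `similarity` and the sample lines are made up for illustration, and it does not show what our patch changes the values to.

```python
import difflib

# Threshold used by the unpatched difflib._fancy_replace.
CUTOFF = 0.75


def similarity(a, b):
    """Return the SequenceMatcher ratio (0.0 to 1.0) for two strings."""
    return difflib.SequenceMatcher(None, a, b).ratio()


# Two lines differing in one character score well above the cutoff,
# so difflib would pair them and emit a '?' guide line.
close = similarity("<p>Total: 3</p>\n", "<p>Total: 4</p>\n")

# Two unrelated lines score below it, so difflib would report them as a
# plain delete followed by an insert.
far = similarity("<p>Total: 3</p>\n", "<h1>Title</h1>\n")

print(close >= CUTOFF, far >= CUTOFF)
```

Raising or lowering the cutoff therefore changes which lines get grouped as "one changed line" versus "one removed, one added".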

Essentially, what we aim for is:

- Ignoring ellipsisized(!) content
- Omitting content which is equal
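
To make those two goals concrete, here is a rough standalone sketch, written for current Python and not taken from the patch. It assumes that "..." in the expected text acts as a wildcard matching any run of characters. The names `ellipsis_match` and `compact_diff` are hypothetical, and the output is simplified compared to what Differ produces (no '?' guide lines).

```python
import difflib
import re

ELLIPSIS = "..."  # assumed wildcard marker in the expected text


def ellipsis_match(expected, actual):
    """True if actual equals expected, with each "..." matching any text."""
    parts = [re.escape(part) for part in expected.split(ELLIPSIS)]
    return re.fullmatch(".*".join(parts), actual, flags=re.DOTALL) is not None


def compact_diff(expected, actual):
    """Yield '- '/'+ ' lines only; equal and ellipsis-matched lines are omitted."""
    matcher = difflib.SequenceMatcher(None, expected, actual)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # goal 2: omit content which is equal
        old, new = expected[i1:i2], actual[j1:j2]
        if tag == "replace" and len(old) == len(new):
            # Goal 1: compare pairwise and skip pairs the ellipsis covers.
            for exp_line, act_line in zip(old, new):
                if not ellipsis_match(exp_line, act_line):
                    yield "- " + exp_line
                    yield "+ " + act_line
            continue
        for line in old:
            yield "- " + line
        for line in new:
            yield "+ " + line
```

With an expected page containing `<p>Generated at ...</p>`, a changed timestamp in the actual page would produce no output, while a genuinely different line still shows up as a '-'/'+' pair.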

I initially thought the best way to do this would be to inherit from SequenceMatcher and make it not return opcodes for ellipsis. However, there is no easy way to replace the class short of rewriting major bits of Differ. I suspect this could be easily changed to use a class attribute that we could override, but let me know what you think of the whole thing.
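
A sketch of the class-attribute idea, again for current Python. Note that `PluggableDiffer` is a hypothetical stand-in, not the real difflib.Differ (which constructs SequenceMatcher directly and has no such hook), and `matcher_class`, `EllipsisMatcher` and `EllipsisDiffer` are names invented for this example. It assumes an expected line consisting only of "..." should swallow whatever replaced it.

```python
import difflib


class EllipsisMatcher(difflib.SequenceMatcher):
    """SequenceMatcher that returns no opcode for ellipsis-only replacements."""

    def get_opcodes(self):
        return [op for op in super().get_opcodes() if not self._is_ellipsis(op)]

    def _is_ellipsis(self, opcode):
        tag, i1, i2, j1, j2 = opcode
        # self.a is the first sequence, as set by SequenceMatcher itself.
        return tag == "replace" and all(
            line.strip() == "..." for line in self.a[i1:i2])


class PluggableDiffer:
    """Simplified Differ-like class with an overridable matcher (hypothetical)."""

    matcher_class = difflib.SequenceMatcher  # the hook subclasses override

    def compare(self, a, b):
        matcher = self.matcher_class(None, a, b)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                for line in a[i1:i2]:
                    yield "  " + line
                continue
            for line in a[i1:i2]:
                yield "- " + line
            for line in b[j1:j2]:
                yield "+ " + line


class EllipsisDiffer(PluggableDiffer):
    matcher_class = EllipsisMatcher  # one-line customization
```

The point of the design is that the subclass needs a single attribute override and no copy of the compare logic.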

--- /usr/lib/python2.3/difflib.py 2004-11-18 20:05:38.720109040 -0200
+++ difflib.py 2004-11-18 20:24:06.731665680 -0200
@@ -885,6 +885,45 @@
 for line in g:
     yield line

@@ -926,7 +965,13 @@

     # don't synch up unless the lines have a similarity score of at
     # least cutoff; best_ratio tracks the best score seen so far

@@ -981,7 +1026,11 @@
 cruncher.set_seqs(aelt, belt)
 for tag, ai1, ai2, bj1, bj2 in cruncher.get_opcodes():
     la, lb = ai2 - ai1, bj2 - bj1

Take care,

Christian Robottom Reis | http://async.com.br/~kiko/ | [+55 16] 3361 2331


