[Python-Dev] Difflib modifications [reposted]

Wed Dec 1 14:08:25 CET 2004

We've done some customizations to difflib to make it work well
- Implements a custom Differ.fancy_compare function that supports
  ellipsis and omits equal content

- Hacks _fancy_replace to skip ellipsis as well.

- Hacks best_ratio and cutoff. I'm a bit fuzzy on why this was
  changed, to be honest, and Celso's travelling today, but IIRC it
  had to do with how difflib grouped changes.
- Ignoring ellipsisized(!) content
- Omitting content which is equal
def fancy_compare(self, a, b):
   """
   >>> import difflib
   >>> engine = difflib.Differ()
   >>> got = ['World is Cruel', 'Dudes are Cool']
   >>> want = ['World ... Cruel', 'Dudes ... Cool']
   >>> list(engine.fancy_compare(want, got))
   []
   """
   cruncher = SequenceMatcher(self.linejunk, a, b)
   for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
       if tag == 'replace':
           ## a lone '...' line was replaced: show nothing
           if a[alo].rstrip() == '...' and (ahi - alo) == 1:
               g = None
           ## a block starting with '...' was replaced: compare only
           ## the last line of each side
           elif a[alo].rstrip() == '...' and (ahi - alo) > 1:
               g = self._fancy_replace(a, ahi - 1, ahi,
                                       b, bhi - 1, bhi)
           ## common case
           else:
               g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
       elif tag == 'delete':
           g = self._dump('-', a, alo, ahi)
       elif tag == 'insert':
           g = self._dump('+', b, blo, bhi)
       elif tag == 'equal':
           # do not show anything
           g = None
       else:
           raise ValueError('unknown tag %r' % (tag,))
       if g:
           for line in g:
               yield line
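
For anyone who wants to try the idea without patching difflib, here is a rough standalone sketch of the two behaviours described above (treat '...' in the expected text as a wildcard, and omit equal content). It is not the code from our patch: the helper names ellipsis_match and compare_ignoring_ellipsis are made up for illustration, and it uses a regular expression where the patch leans on _fancy_replace.

```python
import difflib
import re


def ellipsis_match(want, got, marker='...'):
    """Return True if `got` matches `want`, with `marker` acting as a wildcard."""
    # Escape the literal pieces and join them with '.*' where the marker was.
    pattern = '.*'.join(re.escape(part) for part in want.split(marker))
    return re.fullmatch(pattern, got, re.DOTALL) is not None


def compare_ignoring_ellipsis(want_lines, got_lines):
    """Yield '- '/'+ ' lines only for the parts that really differ."""
    matcher = difflib.SequenceMatcher(None, want_lines, got_lines)
    for tag, alo, ahi, blo, bhi in matcher.get_opcodes():
        if tag == 'equal':
            continue  # omit equal content
        want_chunk = want_lines[alo:ahi]
        got_chunk = got_lines[blo:bhi]
        if (tag == 'replace' and len(want_chunk) == len(got_chunk)
                and all(ellipsis_match(w, g)
                        for w, g in zip(want_chunk, got_chunk))):
            continue  # every line matches once the ellipsis is expanded
        for line in want_chunk:
            yield '- ' + line
        for line in got_chunk:
            yield '+ ' + line


if __name__ == '__main__':
    want = ['World ... Cruel', 'Dudes ... Cool']
    got = ['World is Cruel', 'Dudes are Cool']
    print(list(compare_ignoring_ellipsis(want, got)))
```

Unlike the patch, this sketch does no intraline markup; it only decides whether a replaced block should be reported at all.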
def _dump(self, tag, x, lo, hi):
    """Generate comparison results for a same-tagged range."""
    for i in xrange(lo, hi):

    ## from _fancy_replace
    # don't synch up unless the lines have a similarity score of at
    # least cutoff; best_ratio tracks the best score seen so far
    #best_ratio, cutoff = 0.74, 0.75
    ## reduce the cutoff to have enough similarity
    ## between '<something> ... <something>' and '<a> blabla </a>'
    ## for example
    best_ratio, cutoff = 0.009, 0.01
    cruncher = SequenceMatcher(self.charjunk)
    eqi, eqj = None, None   # 1st indices of equal lines (if any)
            if aelt[ai1:ai2] == '...':
                return
            if tag == 'replace':
               atags += '^' * la
               btags += '^' * lb
           elif tag == 'delete':
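
To make the effect of the lowered cutoff in _fancy_replace more concrete, here is a small sketch that scores the two example strings from the comment above with a plain SequenceMatcher and checks the score against both cutoffs. The function name would_pair is hypothetical, and passing None as the junk function is an assumption; the real code uses self.charjunk.

```python
import difflib


def would_pair(a_line, b_line, cutoff):
    """Return True if the lines are similar enough to be synched up."""
    # Character-level similarity, in the range 0.0 to 1.0.
    ratio = difflib.SequenceMatcher(None, a_line, b_line).ratio()
    return ratio >= cutoff


if __name__ == '__main__':
    want = '<something> ... <something>'
    got = '<a> blabla </a>'
    for cutoff in (0.75, 0.01):
        print('cutoff %.2f -> paired: %s' % (cutoff, would_pair(want, got, cutoff)))
```

With the stock cutoff of 0.75 this pair is too dissimilar to be synched up, so it would be dumped as a plain delete plus insert; with 0.01 almost any two lines sharing a character qualify.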

Take care,