Issue 4889: difflib - Python tracker

Created on 2009-01-09 05:49 by pratik.potnis, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
c1.ios | pratik.potnis, 2009-01-09 05:49 | This file contains the two strings mentioned above. Save them in two different files and then take the diff of the files.
Messages (4)
msg79455 - (view) Author: Pratik Potnis (pratik.potnis) Date: 2009-01-09 05:49
When using HtmlDiff() from the difflib library, a difference in letter case between two strings does not produce a proper diff. The two strings, each in a different file, are:

    hostname vaijain123       (this string is in lower case)
    hostname CAVANC1001CR1    (this one is in upper case)

Expected behavior after diffing: the hostname line should be shown as changed (highlighted in yellow). Instead, it is shown as added in one file and deleted in the other (highlighted in green and red respectively). When both strings use the same case (either lower or upper), the behavior is as expected (the strings are highlighted in yellow). It also works well with numbers. I think it is an issue with the case of the letters: difflib is not able to handle differences in letter case.
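The behavior described in this report can be checked with a short script. This is a minimal sketch, assuming Python 3 and that each file contains only the one line quoted above; the helper name `highlight_classes` is hypothetical. It calls `HtmlDiff.make_table()` rather than `make_file()` so that the legend, which mentions every highlight class regardless of the input, stays out of the output being inspected.

```python
import difflib

def highlight_classes(from_line, to_line):
    """Return the HtmlDiff highlight classes used for a one-line-per-file diff.

    Hypothetical helper for illustration; it only looks for the <span> tags
    that HtmlDiff emits for added, deleted and changed text.
    """
    table = difflib.HtmlDiff().make_table([from_line + "\n"], [to_line + "\n"])
    classes = ("diff_add", "diff_chg", "diff_sub")
    return [c for c in classes if f'<span class="{c}">' in table]

# The two lines from the report: shown as one deletion plus one addition,
# with no "changed" (yellow) highlighting.
print(highlight_classes("hostname vaijain123", "hostname CAVANC1001CR1"))
```

In the default stylesheet, `diff_add` is the green highlight, `diff_sub` the red one and `diff_chg` the yellow one.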
msg79457 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-01-09 08:47
Can you be more precise? I tried to reproduce your problem, but I only get added/deleted chunks, nothing in yellow. Please include a script that shows what you did, and the result you expected.
msg79721 - (view) Author: Gabriel Genellina (ggenellina) Date: 2009-01-13 06:38
You (as a human) most likely parse these lines:

    hostname vaijain123
    hostname CAVANC1001CR1

as "two words, the first one is the same, the second word changed". But difflib sees them more or less as: "21 letters, 8 of them are the same, 13 are different". There are many more differences than matches, so it makes sense to show the changes as a complete replacement:

    >>> d = difflib.ndiff(["hostname vaijain123\n"], ["hostname CAVANC1001CR1\n"])
    >>> print ''.join(d)
    - hostname vaijain123
    + hostname CAVANC1001CR1

It has nothing to do with upper or lower case letters ("A" and "a" are completely different things for difflib). If the names were shorter, it might consider a match:

    >>> d = difflib.ndiff(["hostname vai\n"], ["hostname CAV\n"])
    >>> print ''.join(d)
    - hostname vai
    ?          ^^^
    + hostname CAV
    ?          ^^^

Note how the ratio changes:

    >>> difflib.SequenceMatcher(None, "hostname vaijain123", "hostname CAVANC1001CR1").ratio()
    0.48780487804878048
    >>> difflib.SequenceMatcher(None, "hostname vai", "hostname CAV").ratio()
    0.75

The ratio must be 0.75 or higher for a differ to consider two lines "close enough" to show intra-line differences.
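The ratio check above can be repeated on current interpreters. The following is a minimal sketch, assuming Python 3; the helper names `similarity` and `has_intraline_hints` are hypothetical, and `CUTOFF` simply restates the 0.75 figure from this message rather than reading it from difflib. Note that `ndiff` compares the lines including their trailing newline, so the ratio it works with internally is slightly higher than the one printed here.

```python
import difflib

CUTOFF = 0.75  # threshold quoted in the message above (restated, not read from difflib)

def similarity(a, b):
    """Ratio 2*M/T, where M is the number of matched characters and T the total length."""
    return difflib.SequenceMatcher(None, a, b).ratio()

def has_intraline_hints(a, b):
    """True if ndiff emits '? ' guide lines, i.e. treats the pair as one changed line."""
    lines = difflib.ndiff([a + "\n"], [b + "\n"])
    return any(line.startswith("? ") for line in lines)

pairs = [
    ("hostname vaijain123", "hostname CAVANC1001CR1"),
    ("hostname vai", "hostname CAV"),
]
for a, b in pairs:
    r = similarity(a, b)
    print(f"{r:.3f}  at or above cutoff: {r >= CUTOFF}  intraline hints: {has_intraline_hints(a, b)}")
# 0.488  at or above cutoff: False  intraline hints: False
# 0.750  at or above cutoff: True  intraline hints: True
```

For the long pair, only "hostname " and a single "1" match, giving 2 * 10 / 41, which is why the lines are reported as a deletion plus an addition.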
msg84224 - (view) Author: Jack Diederich (jackdied) * (Python committer) Date: 2009-03-26 21:30
Closing; Gabriel's explanation is sufficient.
History
Date | User | Action | Args
2022-04-11 14:56:43 | admin | set | github: 49139
2009-03-26 21:30:56 | jackdied | set | status: open -> closed; nosy: + jackdied; messages: +; resolution: not a bug
2009-01-13 06:38:05 | ggenellina | set | nosy: + ggenellina; messages: +
2009-01-09 08:47:42 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: +
2009-01-09 05:49:38 | pratik.potnis | create |