Issue 8702: difflib: unified_diff produces wrong patches (again)
I think difflib is behaving as intended here; changing to feature request.
Could you please clarify where the information loss is? I'm not seeing it. As far as I can tell, because unified_diff produces a list of lines rather than a single string (which is effectively what GNU diff produces), all the information about newlines is preserved:
newton:py3k dickinsm$ echo -n "one
two" > 1.txt
newton:py3k dickinsm$ echo -n "one
two
" > 2.txt
newton:py3k dickinsm$ ./python.exe
Python 3.2a0 (py3k:81084:81085M, May 12 2010, 14:16:52)
[GCC 4.2.1 (Apple Inc. build 5659)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from difflib import unified_diff
[47745 refs]
>>> list(unified_diff(list(open('1.txt')), list(open('2.txt'))))
['--- \n', '+++ \n', '@@ -1,2 +1,2 @@\n', ' one\n', '-two', '+two\n']
[53249 refs]
It looks to me as though the diff picks up the missing newline just fine.
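The same result can be reproduced without temporary files. This is a minimal sketch that builds in-memory equivalents of 1.txt and 2.txt with str.splitlines(keepends=True), which keeps each line's trailing newline just as reading the file does:

```python
import difflib

# 1.txt: last line has no trailing newline; 2.txt: last line does.
old = "one\ntwo".splitlines(keepends=True)    # ['one\n', 'two']
new = "one\ntwo\n".splitlines(keepends=True)  # ['one\n', 'two\n']

diff = list(difflib.unified_diff(old, new))
print(diff)
```

The removed line '-two' has no newline while the added line '+two\n' does, so the missing newline is visible in the list itself.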
The one problem with the above is that you can't simply ''.join() the result to get a meaningful diff: '-two' has no trailing newline, so it runs together with '+two\n' on a single line. But I don't see that as a problem with the unified_diff function itself.
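To make the join problem concrete, here is a small sketch using the same two inputs; joining the generator's output merges the removed and added lines into one physical line:

```python
import difflib

old = ["one\n", "two"]    # no newline on the last line
new = ["one\n", "two\n"]

patch = "".join(difflib.unified_diff(old, new))
# The last physical line of the joined text is "-two+two",
# which is not a valid line of a unified diff.
print(patch.splitlines()[-1])
```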
I'd be -1 on adding the "\ No newline at end of file" by default, since it complicates the unified_diff format unnecessarily (and would also affect backwards compatibility). I wouldn't have any objections to an extra option for this, though.
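An opt-in behaviour along these lines could also live outside difflib as a post-processing step. The sketch below is purely illustrative: `unified_diff_with_eof_marker` is a hypothetical name, not a difflib API, and it assumes difflib's default lineterm='\n' and readlines()-style input. It terminates any line that lacks a newline and follows it with the GNU diff marker "\ No newline at end of file", leaving unified_diff's default output untouched:

```python
import difflib
from typing import Iterator, Sequence

# GNU diff's marker line for a file whose last line lacks a newline.
NO_EOL_MARKER = "\\ No newline at end of file\n"


def unified_diff_with_eof_marker(
    a: Sequence[str], b: Sequence[str], fromfile: str = "", tofile: str = ""
) -> Iterator[str]:
    """Hypothetical wrapper, not part of difflib.

    Relies on unified_diff's default lineterm, so header and hunk lines
    always end in a newline; only content lines can lack one.
    """
    for line in difflib.unified_diff(a, b, fromfile=fromfile, tofile=tofile):
        if line.endswith("\n"):
            yield line
        else:
            # With readlines()-style input only a file's final line can
            # lack '\n': close it, then flag it with the marker line.
            yield line + "\n"
            yield NO_EOL_MARKER


patch = "".join(
    unified_diff_with_eof_marker(["one\n", "two"], ["one\n", "two\n"], "1.txt", "2.txt")
)
print(patch, end="")
```

Because the wrapper only adds lines after a content line without a newline, diffs between files that both end in a newline come out exactly as unified_diff produces them, which keeps the default behaviour backwards compatible.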