From a fresh Python 3.2.1 tarball:

    nneonneo@nneonneo-mbp:~/devel/Python-3.Lib/test$ for i in tokenize_tests-*; do echo $i; xxd $i | head -n 1; done
    tokenize_tests-latin1-coding-cookie-and-utf8-bom-sig.txt
    0000000: efbb bf23 202d 2a2d 2063 6f64 696e 673a  ...# -*- coding:
    tokenize_tests-no-coding-cookie-and-utf8-bom-sig-only.txt
    0000000: efbb bf23 2049 4d50 4f52 5441 4e54 3a20  ...# IMPORTANT: 
    tokenize_tests-utf8-coding-cookie-and-no-utf8-bom-sig.txt
    0000000: efbb bf23 202d 2a2d 2063 6f64 696e 673a  ...# -*- coding:
    tokenize_tests-utf8-coding-cookie-and-utf8-bom-sig.txt
    0000000: efbb bf23 202d 2a2d 2063 6f64 696e 673a  ...# -*- coding:

All four files start with the bytes ef bb bf. So the file called "tokenize_tests-utf8-coding-cookie-and-no-utf8-bom-sig.txt" actually has a UTF-8 BOM signature, which means either the comment is lying or the BOM was accidentally added to the test file at some point.
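For anyone without xxd handy, the same check can be done from Python. This is only a minimal sketch: the helper name has_utf8_bom is made up for illustration, and it assumes it is run from the Lib/test directory so that the glob pattern matches the test files.

```python
import codecs
import glob


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM (EF BB BF).

    Hypothetical helper for illustration; not part of the test suite.
    """
    with open(path, "rb") as fh:
        return fh.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


# Assumes the current directory is Lib/test; prints nothing if no files match.
for name in sorted(glob.glob("tokenize_tests-*")):
    print(name, has_utf8_bom(name))
```

The file is opened in binary mode on purpose: opening it as text could decode or strip the signature before it can be compared.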
It looks like a BOM has been present in that file for a *long* time: it is there in the Python 3.0 source tarball, and, according to the converted svn-to-hg history, it was there in its original check-in and is still there in the current development tip.
Yes, it seems that way. Then the question is: why does the comment claim that it doesn't have a BOM?

Also, test_tokenize.py is wrong around line 651:

    def test_utf8_coding_cookie_and_no_utf8_bom(self):
        f = 'tokenize_tests-utf8-coding-cookie-and-utf8-bom-sig.txt'
        self.assertTrue(self._testFile(f))

Judging by the testcase name, it reads the wrong file in this case. (This makes it a duplicate of the test_utf8_coding_cookie_and_utf8_bom case.)
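To illustrate why the BOM matters for these two test cases, here is a rough sketch using tokenize.detect_encoding on in-memory data instead of the real test files. The helper name detected_encoding and the sample bytes are made up for illustration, and the behaviour described is that of current CPython 3.

```python
import io
import tokenize


def detected_encoding(data):
    """Return the source encoding that tokenize detects for the bytes `data`.

    Hypothetical helper for illustration only.
    """
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


cookie = b"# -*- coding: utf-8 -*-\n"

# Coding cookie only, no BOM.
print(detected_encoding(cookie))

# Same cookie preceded by the UTF-8 BOM.
print(detected_encoding(b"\xef\xbb\xbf" + cookie))
```

The first call reports utf-8 and the second utf-8-sig, so the two inputs take different paths through the encoding detection. If both test cases read a file that starts with a BOM, the no-BOM path is presumably not exercised by them.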