Issue 12587: tokenize_tests-utf8-coding-cookie-and-no-utf8-bom-sig.txt has a UTF8 BOM signature

Created on 2011-07-19 21:18 by nneonneo, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
issue12587.patch	nneonneo, 2011-07-19 22:41	Patch to fix the issue

Messages (6)
msg140694	Author: Robert Xiao (nneonneo) *	Date: 2011-07-19 21:18
From a fresh Python3.2.1 tarball:

nneonneo@nneonneo-mbp:~/devel/Python-3.Lib/test$ for i in tokenize_tests-*; do echo $i; xxd $i | head -n 1; done
tokenize_tests-latin1-coding-cookie-and-utf8-bom-sig.txt
0000000: efbb bf23 202d 2a2d 2063 6f64 696e 673a  ...# -*- coding:
tokenize_tests-no-coding-cookie-and-utf8-bom-sig-only.txt
0000000: efbb bf23 2049 4d50 4f52 5441 4e54 3a20  ...# IMPORTANT: 
tokenize_tests-utf8-coding-cookie-and-no-utf8-bom-sig.txt
0000000: efbb bf23 202d 2a2d 2063 6f64 696e 673a  ...# -*- coding:
tokenize_tests-utf8-coding-cookie-and-utf8-bom-sig.txt
0000000: efbb bf23 202d 2a2d 2063 6f64 696e 673a  ...# -*- coding:

From this, it appears that the file called "tokenize_tests-utf8-coding-cookie-and-no-utf8-bom-sig.txt" actually has a UTF-8 BOM signature, which means either the comment is lying or the BOM was accidentally added to the test file at some point.
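The same check can be sketched in Python for anyone without xxd at hand. This is only an illustration, not part of the report or the patch: the helper names `has_utf8_bom` and `report_boms` are made up here, and the sketch assumes it is pointed at a directory containing the `tokenize_tests-*` files.

```python
import codecs
from pathlib import Path


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def report_boms(directory, pattern="tokenize_tests-*"):
    """Map each matching file name to whether it starts with a UTF-8 BOM.

    Hypothetical helper: the default pattern mirrors the shell glob above.
    """
    return {
        p.name: has_utf8_bom(p)
        for p in sorted(Path(directory).glob(pattern))
        if p.is_file()
    }


if __name__ == "__main__":
    # Run from Lib/test (or pass another directory) to list the test files.
    for name, bom in report_boms(".").items():
        print(name, "BOM" if bom else "no BOM")
```

A file whose name says "no-utf8-bom-sig" but which is reported as "BOM" would be the mismatch described in this message.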
msg140699	Author: Ned Deily (ned.deily) *	Date: 2011-07-19 22:04
It looks like a BOM has been present in that file for a long time: it is there in the Python 3.0 source tarball, and, according to the converted svn-to-hg history, it was there in its original check-in and is still there in the current development tip.
msg140702	Author: Robert Xiao (nneonneo) *	Date: 2011-07-19 22:34
Yes, it seems that way. Then the question is: why does the comment claim that it doesn't have a BOM?

Also, test_tokenize.py is wrong around line 651:

    def test_utf8_coding_cookie_and_no_utf8_bom(self):
        f = 'tokenize_tests-utf8-coding-cookie-and-utf8-bom-sig.txt'
        self.assertTrue(self._testFile(f))

It reads the wrong file in this case, judging by the testcase name. (This makes it a duplicate of the test_utf8_coding_cookie_and_utf8_bom case.)
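To illustrate why the BOM matters for these test cases, here is a small sketch of how `tokenize.detect_encoding` treats the combinations the file names describe. It is not taken from test_tokenize.py; the `detect` helper and the sample source bytes are assumptions made for the example, and the behaviour shown is that of a current Python 3.

```python
import codecs
import io
import tokenize


def detect(source_bytes):
    """Return the encoding tokenize.detect_encoding reports for raw bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# Sample sources (assumed for illustration, not the real test files).
utf8_cookie = b"# -*- coding: utf-8 -*-\nx = 1\n"
latin1_cookie = b"# -*- coding: latin-1 -*-\nx = 1\n"

print(detect(utf8_cookie))                    # utf-8 cookie, no BOM
print(detect(codecs.BOM_UTF8 + utf8_cookie))  # utf-8 cookie with BOM

try:
    # A BOM combined with a non-UTF-8 cookie is rejected as inconsistent.
    detect(codecs.BOM_UTF8 + latin1_cookie)
except SyntaxError as exc:
    print("SyntaxError:", exc)
```

The cookie-only source is reported as `utf-8` and the BOM-prefixed one as `utf-8-sig`, so a "no BOM" test that reads a file with a BOM exercises the same path as the "with BOM" test.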
msg140704	Author: Robert Xiao (nneonneo) *	Date: 2011-07-19 22:41
Attached is a patch which fixes this. Python 3.2.1 still passes the test after applying the patch, as expected.
msg140707	Author: Roundup Robot (python-dev)	Date: 2011-07-19 23:19
New changeset 0c254698e0ed by Ned Deily in branch '3.2':
Issue #12587: Correct faulty test file and reference in test_tokenize.
http://hg.python.org/cpython/rev/0c254698e0ed

New changeset c1d2b6b337c5 by Ned Deily in branch 'default':
Issue #12587: Correct faulty test file and reference in test_tokenize.
http://hg.python.org/cpython/rev/c1d2b6b337c5
msg140709	Author: Ned Deily (ned.deily) *	Date: 2011-07-19 23:21
Thanks for the report and the patch! Applied to 3.2 (for 3.2.2) and default (for 3.3).

History
Date	User	Action	Args
2022-04-11 14:57:19	admin	set	github: 56796
2011-07-19 23:21:48	ned.deily	set	status: open -> closed; messages: +; assignee: ned.deily; resolution: fixed; stage: needs patch -> resolved
2011-07-19 23:19:07	python-dev	set	nosy: + python-dev; messages: +
2011-07-19 22:41:18	nneonneo	set	files: + issue12587.patch; keywords: + patch; messages: +
2011-07-19 22:34:01	nneonneo	set	messages: +
2011-07-19 22:04:35	ned.deily	set	versions: + Python 3.3; nosy: + trent, ned.deily; messages: +; stage: needs patch
2011-07-19 21:18:59	nneonneo	create