Issue 12587: tokenize_tests-utf8-coding-cookie-and-no-utf8-bom-sig.txt has a UTF8 BOM signature

Created on 2011-07-19 21:18 by nneonneo, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name | Uploaded | Description
issue12587.patch | nneonneo, 2011-07-19 22:41 | Patch to fix the issue
Messages (6)
msg140694 Author: Robert Xiao (nneonneo) Date: 2011-07-19 21:18
From a fresh Python3.2.1 tarball:

    nneonneo@nneonneo-mbp:~/devel/Python-3.Lib/test$ for i in tokenize_tests-*; do echo $i; xxd $i | head -n 1; done
    tokenize_tests-latin1-coding-cookie-and-utf8-bom-sig.txt
    0000000: efbb bf23 202d 2a2d 2063 6f64 696e 673a  ...# -*- coding:
    tokenize_tests-no-coding-cookie-and-utf8-bom-sig-only.txt
    0000000: efbb bf23 2049 4d50 4f52 5441 4e54 3a20  ...# IMPORTANT:
    tokenize_tests-utf8-coding-cookie-and-no-utf8-bom-sig.txt
    0000000: efbb bf23 202d 2a2d 2063 6f64 696e 673a  ...# -*- coding:
    tokenize_tests-utf8-coding-cookie-and-utf8-bom-sig.txt
    0000000: efbb bf23 202d 2a2d 2063 6f64 696e 673a  ...# -*- coding:

From this, it appears that the file called "tokenize_tests-utf8-coding-cookie-and-no-utf8-bom-sig.txt" actually has a UTF-8 BOM signature, which means either the comment is lying or the BOM was accidentally added to the test file at some point.
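The same check can be done without xxd. The following is only a minimal sketch, not part of the report or the patch: the helper names has_utf8_bom and report_boms are made up for illustration, and the glob pattern simply mirrors the shell loop in the message above.

```python
import codecs
from pathlib import Path


def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def report_boms(directory, pattern="tokenize_tests-*"):
    """Map each matching file name in `directory` to whether it starts with a BOM.

    Hypothetical helper: the default pattern is an assumption taken from the
    shell loop quoted in the report.
    """
    return {
        p.name: has_utf8_bom(p)
        for p in sorted(Path(directory).glob(pattern))
    }
```

Calling report_boms on a Lib/test directory would be expected to flag any file whose name says "no-utf8-bom" but whose first three bytes are EF BB BF, which is the mismatch described in this message.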
msg140699 Author: Ned Deily (ned.deily) (Python committer) Date: 2011-07-19 22:04
It looks like a BOM has been present in that file for a *long* time: it is there in the Python 3.0 source tarball, and, according to the converted svn-to-hg history, it was there in its original check-in and is still there in the current development tip.
msg140702 Author: Robert Xiao (nneonneo) Date: 2011-07-19 22:34
Yes, it seems that way. Then the question is: why does the comment claim that it doesn't have a BOM?

Also, test_tokenize.py is wrong around line 651:

    def test_utf8_coding_cookie_and_no_utf8_bom(self):
        f = 'tokenize_tests-utf8-coding-cookie-and-utf8-bom-sig.txt'
        self.assertTrue(self._testFile(f))

It reads the wrong file in this case, judging by the testcase name. (This makes it a duplicate of the test_utf8_coding_cookie_and_utf8_bom case.)
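To illustrate why the with-BOM and without-BOM cases are worth testing separately, here is a small sketch using the standard library's tokenize.detect_encoding. It is an illustration only and is not taken from test_tokenize.py; the in-memory byte strings stand in for the test files, and the helper name detected is hypothetical.

```python
import io
import tokenize

COOKIE = b"# -*- coding: utf-8 -*-\n"
BOM = b"\xef\xbb\xbf"  # UTF-8 byte order mark


def detected(source_bytes):
    """Return the encoding tokenize.detect_encoding reports for the given bytes."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


with_bom = detected(BOM + COOKIE)  # 'utf-8-sig'
without_bom = detected(COOKIE)     # 'utf-8'
```

The two inputs take different paths through encoding detection, so a test that is named for the no-BOM case but opens a file containing a BOM never exercises the second path.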
msg140704 Author: Robert Xiao (nneonneo) Date: 2011-07-19 22:41
Attached is a patch which fixes this. Python 3.2.1 still passes the test after applying the patch, as expected.
msg140707 Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-19 23:19
New changeset 0c254698e0ed by Ned Deily in branch '3.2':
Issue #12587: Correct faulty test file and reference in test_tokenize.
http://hg.python.org/cpython/rev/0c254698e0ed

New changeset c1d2b6b337c5 by Ned Deily in branch 'default':
Issue #12587: Correct faulty test file and reference in test_tokenize.
http://hg.python.org/cpython/rev/c1d2b6b337c5
msg140709 Author: Ned Deily (ned.deily) (Python committer) Date: 2011-07-19 23:21
Thanks for the report and the patch! Applied to 3.2 (for 3.2.2) and default (for 3.3).
History
Date User Action Args
2022-04-11 14:57:19 admin set github: 56796
2011-07-19 23:21:48 ned.deily set status: open -> closed; messages: +; assignee: ned.deily; resolution: fixed; stage: needs patch -> resolved
2011-07-19 23:19:07 python-dev set nosy: + python-dev; messages: +
2011-07-19 22:41:18 nneonneo set files: + issue12587.patch; keywords: + patch; messages: +
2011-07-19 22:34:01 nneonneo set messages: +
2011-07-19 22:04:35 ned.deily set versions: + Python 3.3; nosy: + trent, ned.deily; messages: +; stage: needs patch
2011-07-19 21:18:59 nneonneo create