Issue 898757: Python 2.3 encoding parsing bug
Created on 2004-02-17 14:36 by edream, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (4) | ||
---|---|---|
msg20020 - (view) | Author: Edward K. Ream (edream) | Date: 2004-02-17 14:36 |
The documentation for encoding lines at C:\Python23\Doc\Python-Docs-2.3.1\whatsnew\section-encodings.html states: "Encodings are declared by including a specially formatted comment in the first or second line of the source file." In fact, contrary to the implication, the Python 2.3 parser does not look only for lines of the form: # -*- coding: -*- For example, Python improperly scans the following line for an encoding: #@+leo-ver=4-encoding=iso-8859-1. and reports that iso-8859-1. (note the trailing dot) is an invalid encoding! The workaround for my app is to precede this line with the following line: # -*- coding: iso-8859-1 -*- This makes Python 2.3 happy. To make myself perfectly clear: Python has absolutely no right to complain about comment lines that do not have the form: # -*- coding: -*- Python 2.3.1, Windows XP. Edward K. Ream edreamleo@charter.net | ||
msg20021 - (view) | Author: Marc-Andre Lemburg (lemburg) | Date: 2004-02-17 21:14 |
Python is behaving correctly and according to the PEP. The encoding declaration parser looks for "coding[:=][ \t]*" so that it plays nicely with the various editor encoding comments in use today. The format you are quoting is Emacs-style, but there are also vi-style and various other formats. Most of them use the "coding[:=]" declaration, which is why this parsing method was chosen. Does Leo need the trailing dot in the comment? | ||
msg20022 - (view) | Author: Martin v. Löwis (loewis) | Date: 2004-02-17 21:47 |
Actually, what Python should do (and does) is follow the language specification (the PEP becomes irrelevant once implemented): http://www.python.org/doc/current/ref/encodings.html This gives the precise regexp that is used. Any difference between the language spec and the implementation would be considered a bug. Closing this report as not-a-bug. | ||
msg20023 - (view) | Author: Edward K. Ream (edream) | Date: 2004-02-17 22:59 |
> Does leo need the trailing dot in the comment? In general, Leo needs to know where the encoding specification ends and a possible end-block-comment delim begins. In specific languages, and in particular Python, Leo would not have needed the trailing dot. Alas, this is a moot point. The only options available to Leo now are: 1. Have the user insert encoding comments by hand or 2. Change the format of files created by Leo. In other words, no previous 4.x version of Leo (including 4.1 final, due tomorrow) can ever work with Python 2.3 without the user inserting a workaround. I am most upset that the PEP said one thing in English and something almost completely different in the regexp. Furthermore, what the regexp implies is a very bad idea. Having a _restricted_ kind of special-purpose comment is one thing; having a way-too-general kind of special-purpose comment is wrong, wrong, wrong. It needlessly invalidates comments that _should_ have been none of Python's business. Yes, I know there was a reason for this bad idea; there always is. Edward |
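
The behavior discussed in this thread can be illustrated with a minimal sketch. It assumes a pattern built from the "coding[:=][ \t]*" fragment quoted above, followed by a run of name characters; the character class for the name, and the names `CODING_RE` and `declared_encoding`, are hypothetical and chosen only for illustration. This is not the tokenizer's own code. It also assumes that the first match found on the first two lines wins, which is consistent with the reported workaround.

```python
import re

# Assumed pattern: "coding", then ":" or "=", optional blanks, then the name.
# The character class used for the name is an assumption for illustration.
CODING_RE = re.compile(r"coding[:=][ \t]*([-\w.]+)")


def declared_encoding(source):
    """Return the first encoding name found in a comment on line 1 or 2."""
    for line in source.splitlines()[:2]:
        if not line.lstrip().startswith("#"):
            continue  # only comment lines can carry a declaration
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None


leo_line = "#@+leo-ver=4-encoding=iso-8859-1."
emacs_line = "# -*- coding: iso-8859-1 -*-"

# "encoding=" contains "coding=", so the Leo sentinel line matches, and the
# trailing dot is captured as part of the encoding name.
print(repr(declared_encoding(leo_line + "\n")))

# With the workaround line placed first, the well-formed name is found first.
print(repr(declared_encoding(emacs_line + "\n" + leo_line + "\n")))
```

Under these assumptions, the Leo sentinel line alone yields a name ending in a dot, which is not a valid codec name, while placing the Emacs-style line ahead of it yields the intended name.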
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:02 | admin | set | github: 39944 |
2004-02-17 14:36:28 | edream | create |