Issue 10117: Tools/scripts/reindent.py fails on non-UTF-8 encodings

Created on 2010-10-15 16:54 by belopolsky, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded
reindent.diff belopolsky, 2010-10-15 16:54
reindent_coding.py vstinner, 2011-07-07 23:25
Messages (13)
msg118804 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-15 16:54
Tools/scripts/reindent.py -d Lib/test/encoded_modules/module_koi8_r.py

Traceback (most recent call last):
  File "Tools/scripts/reindent.py", line 310, in <module>
    main()
  File "Tools/scripts/reindent.py", line 93, in main
    check(arg)
  File "Tools/scripts/reindent.py", line 114, in check
    r = Reindenter(f)
  File "Tools/scripts/reindent.py", line 162, in __init__
    self.raw = f.readlines()
  File "Lib/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf0 in position 59: invalid continuation byte

Attached patch fixes this issue.
msg118810 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-10-15 17:45
+1.
msg118812 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-10-15 17:53
LGTM.
msg119026 - Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-18 14:48
Committed in r85695. Leaving open to discuss whether anything can/should be done for the case when reindent acts as an stdin to stdout filter. Also, what is the policy on backporting Tools' bug fixes?
msg119276 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-10-21 11:44
When working as a filter, reindent should use sys.{stdin,stdout}.encoding (defaulting to sys.getdefaultencoding()) for reading and writing, respectively. Detecting encoding on streams is not worth it IMO. People can set PYTHONIOENCODING for baroque needs.
msg139967 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-07 10:50
> Leaving open to discuss whether anything can/should be done
> for the case when reindent acts as an stdin

sys.stdin.buffer and sys.stdout.buffer should be used with tokenize.detect_encoding(). We may first read stdin and write it into a BytesIO object to be able to rewind after detect_encoding. Something like:

    content = sys.stdin.buffer.read()
    raw = io.BytesIO(content)
    buffer = io.BufferedReader(raw)
    encoding, _ = detect_encoding(buffer.readline)
    buffer.seek(0)
    text = TextIOWrapper(buffer, encoding)
    # use text
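
A self-contained variant of the snippet above, offered only as a sketch and not as the code in the attached patch. The helper name decode_source and the sample bytes are made up for illustration; tokenize.detect_encoding, io.BytesIO and io.TextIOWrapper are the standard library calls named in the message.

```python
import io
from tokenize import detect_encoding


def decode_source(content):
    """Decode Python source bytes using the encoding declared in the source.

    Hypothetical helper: buffers the bytes so the stream can be rewound
    after detect_encoding() has consumed the first line(s).
    """
    buffer = io.BytesIO(content)
    encoding, _ = detect_encoding(buffer.readline)
    buffer.seek(0)  # rewind so the text wrapper sees the whole source
    text = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    return text.read(), encoding


# Illustrative input: a tiny KOI8-R module with a coding cookie.
sample = "# -*- coding: koi8-r -*-\nname = 'Питон'\n".encode("koi8-r")
source, encoding = decode_source(sample)
```

When used as a filter, the bytes would presumably come from sys.stdin.buffer.read(), and the result would be encoded with the same encoding before being written to sys.stdout.buffer.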
msg140001 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-07 23:25
reindent_coding.py: patch fixing reindent.py when using pipes (stdin and stdout).
msg140003 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-07 23:43
This is a lot more code than what I’d have expected. What is your opinion on my previous message?
msg140005 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-07 23:47
> When working as a filter, reindent should use sys.{stdin,stdout}.encoding
> (defaulting to sys.getdefaultencoding()) for reading and writing,
> respectively.

It just doesn't work: you cannot read an ISO-8859-1 file from UTF-8 (if your locale encoding is UTF-8).
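
The mismatch described here can be reproduced without any files or pipes. This is a minimal sketch; try_decode and the sample string are illustrative names, not part of reindent.py.

```python
def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# "é" is the single byte 0xE9 in ISO-8859-1, which is not valid UTF-8 on its own.
latin1_bytes = "café".encode("iso-8859-1")
as_utf8 = try_decode(latin1_bytes, "utf-8")
as_latin1 = try_decode(latin1_bytes, "iso-8859-1")
```

Decoding succeeds only when the reader's encoding matches the one the bytes were written in, which is why a UTF-8 locale alone is not enough for ISO-8859-1 input.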
msg140021 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-07-08 11:19
Even with PYTHONIOENCODING?
msg315607 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-04-22 11:15
I concur with Éric. Standard input and output are text streams in Python 3. The user can control their encoding by setting the locale or PYTHONIOENCODING. I think this issue can be closed now unless somebody wants to backport the fix to 2.7.
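
As a rough illustration of the PYTHONIOENCODING approach, the sketch below pipes ISO-8859-1 bytes through a child interpreter acting as a text filter. The run_filter helper and the one-line child program are hypothetical stand-ins for a real filter such as reindent; only the PYTHONIOENCODING variable itself comes from the discussion.

```python
import os
import subprocess
import sys


def run_filter(data, io_encoding):
    """Pipe bytes through a child Python whose stdin/stdout use io_encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    # Stand-in filter: reads text from stdin and writes it back upper-cased.
    child = "import sys; sys.stdout.write(sys.stdin.read().upper())"
    result = subprocess.run(
        [sys.executable, "-c", child],
        input=data,
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


latin1_bytes = "café".encode("iso-8859-1")
output = run_filter(latin1_bytes, "iso-8859-1")
```

The child never inspects the bytes itself; the encoding is chosen entirely by the caller's environment, which is the trade-off against detecting a coding cookie in the stream.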
msg377111 - Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2020-09-18 12:04
Since there won't be a python 2.7 backport, should this issue be closed?
msg377114 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-09-18 12:58
> Committed in r85695. Leaving open to discuss whether anything can/should be done for the case when reindent acts as an stdin to stdout filter. Also, what is the policy on backporting Tools' bug fixes?

This is the commit:

commit 4a98e3b6d06e5477e5d62f18e85056cbb7253f98
Author: Alexander Belopolsky <alexander.belopolsky@gmail.com>
Date: Mon Oct 18 14:43:38 2010 +0000

    Issue #10117: Tools/scripts/reindent.py now accepts source files that use encoding other than ASCII or UTF-8. Source encoding is preserved when reindented code is written to a file.

> Since there won't be a python 2.7 backport, should this issue be closed?

Right, the 2.7 branch is closed. I am closing the issue.
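
The behaviour described in the commit message (accept a non-UTF-8 source file and keep its encoding when writing it back) could look roughly like the sketch below. It is an assumption about the general technique, not a copy of the committed change; rewrite_preserving_encoding, the temporary file and the tab-expanding transform are all illustrative.

```python
import os
import tempfile
import tokenize


def rewrite_preserving_encoding(path, transform):
    """Apply transform to a source file, keeping its declared encoding."""
    # Read the coding cookie (or BOM) from the raw bytes first.
    with open(path, "rb") as f:
        encoding, _ = tokenize.detect_encoding(f.readline)
    with open(path, encoding=encoding, newline="") as f:
        text = f.read()
    # Write back with the same encoding so the cookie stays truthful.
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(transform(text))
    return encoding


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "module_koi8_r.py")
    with open(path, "wb") as f:
        f.write("# -*- coding: koi8-r -*-\nif True:\n\tname = 'Питон'\n".encode("koi8-r"))
    used = rewrite_preserving_encoding(path, lambda s: s.expandtabs(4))
    with open(path, "rb") as f:
        result = f.read()
```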
History
Date User Action Args
2022-04-11 14:57:07 admin set github: 54326
2020-09-18 12:58:23 vstinner set status: open -> closed; resolution: fixed; messages: +; stage: resolved
2020-09-18 12:04:38 iritkatriel set status: pending -> open; nosy: + iritkatriel; messages: +
2018-04-22 11:15:28 serhiy.storchaka set status: open -> pending; messages: +
2012-10-13 23:02:01 serhiy.storchaka set nosy: + serhiy.storchaka
2011-07-08 11:19:44 eric.araujo set messages: +
2011-07-07 23:47:27 vstinner set messages: +
2011-07-07 23:43:16 eric.araujo set messages: +
2011-07-07 23:25:02 vstinner set files: + reindent_coding.py; messages: +; versions: + Python 3.3
2011-07-07 10:50:01 vstinner set nosy: + vstinner; messages: +
2010-10-21 11:44:18 eric.araujo set messages: +
2010-10-18 14:48:10 belopolsky set messages: +
2010-10-15 17:53:44 georg.brandl set nosy: + georg.brandl; messages: +
2010-10-15 17:45:26 eric.araujo set nosy: + eric.araujo; messages: +
2010-10-15 16:56:41 belopolsky set nosy: + tim.peters, christian.heimes, flox
2010-10-15 16:54:39 belopolsky create