Issue 1379393: StreamReader.readline doesn't advance on decode errors

Created on 2005-12-13 10:35 by donut, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
python-test_streamreader.py | donut, 2005-12-13 10:35 | script to demonstrate the problem
skipbadlines.py | doerwalter, 2005-12-16 17:25 |
Messages (4)
msg27059 - Author: Matthew Mueller (donut) Date: 2005-12-13 10:35
In previous versions of Python, when there was a Unicode decode error, StreamReader.readline() would advance to the next line. In the current version (2.4.2 and trunk), it doesn't.

Tested under Linux AMD64 (Ubuntu 5.10). Attaching an example script.

In python2.3 it prints:

    hi~
    hi
    error: 'utf8' codec can't decode byte 0x80 in position 2: unexpected code byte
    error: 'utf8' codec can't decode byte 0x81 in position 2: unexpected code byte
    all done

In python2.4 and trunk it prints:

    hi~
    hi
    error: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
    error: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
    error: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
    [repeats forever]

Maybe this isn't actually supposed to work (the docs don't mention what should happen with strict error checking), but it would be nice, given the alternatives:

1. Use errors='replace' and then search the result for the replacement character. (ick)
2. Define a custom error handler, similar to ignore or replace, that also sets some flag. (slightly less ick, but more work.)
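For illustration, the first alternative (errors='replace' plus a search for the replacement character) might look roughly like the sketch below. It is written for Python 3 rather than the 2.x versions discussed in this report, and the helper name `read_flagged_lines` and the sample bytes are hypothetical, not taken from the attached script. Note that a line which legitimately contains U+FFFD would also be flagged, which is part of why this approach is unattractive.

```python
import codecs
import io

# U+FFFD is the character the 'replace' error handler substitutes
# for each undecodable byte sequence.
REPLACEMENT = "\ufffd"

def read_flagged_lines(binary_stream, encoding="utf-8"):
    """Return (line, had_error) pairs for every line in the stream.

    Hypothetical helper: decoding never raises because errors='replace'
    is used, so readline() always advances past the bad bytes.
    """
    reader = codecs.getreader(encoding)(binary_stream, errors="replace")
    results = []
    while True:
        line = reader.readline()
        if not line:  # an empty string signals end of stream
            break
        results.append((line, REPLACEMENT in line))
    return results

# Sample input (an assumption): two good lines, one broken line, one good line.
sample = io.BytesIO(b"hi~\nhi\n\x80\x80\nok\n")
for text, had_error in read_flagged_lines(sample):
    print("error" if had_error else repr(text))
```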
msg27060 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2005-12-15 21:42
I don't know what should be correct. Walter?
msg27061 - Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2005-12-16 17:25
IMHO the current behaviour is more consistent. To read the broken UTF-8 stream from the test script, the appropriate error handler should be used. What is the desired outcome? If only the broken byte sequence should be skipped, errors="replace" is appropriate. To skip a complete line that contains a broken byte sequence, do something like in the attached skipbadlines.py. The StreamReader can't know which behaviour is wanted.
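The attached skipbadlines.py is not reproduced on this page, so the following is only a guess at the general idea, not its actual contents. It is a minimal Python 3 sketch that splits the stream into lines at the byte level and decodes each line strictly, dropping any line that fails. The name `iter_good_lines` is hypothetical, and splitting on b"\n" before decoding assumes an ASCII-compatible encoding such as UTF-8 (it would not be safe for UTF-16).

```python
import io

def iter_good_lines(binary_stream, encoding="utf-8"):
    """Yield only the lines that decode cleanly; skip broken ones entirely.

    Hypothetical helper: lines are split as bytes first, so a decode
    error in one line cannot stall progress through the stream.
    """
    for raw_line in binary_stream:  # binary file objects iterate by b"\n"
        try:
            yield raw_line.decode(encoding)  # strict decoding by default
        except UnicodeDecodeError:
            continue  # drop the whole line containing the bad bytes

# Sample input (an assumption): the third line contains invalid UTF-8.
sample = io.BytesIO(b"hi~\nhi\n\x80\x80\nok\n")
good = list(iter_good_lines(sample))
print(good)
```

Because the decision to skip is made by the caller, the same loop could instead log the error or substitute a placeholder, which matches the point that the StreamReader itself cannot know which behaviour is wanted.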
msg27062 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-02-19 00:58
Closing as Won't Fix, then.
History
Date User Action Args
2022-04-11 14:56:14 admin set github: 42686
2005-12-13 10:35:47 donut create