[Python-3000] New io system and binary data

Bill Janssen janssen at parc.com
Wed Sep 19 19:56:38 CEST 2007


GvR wrote:

I wouldn't do the assignments you propose though, since that might surprise other code which expects text files.

But presumably that code wouldn't be used in that same program.

This really isn't a UTF-8 problem. It is the problem with file opens defaulting to "text" mode instead of "binary" mode rearing its ugly head again.

Bill
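Bill's point about the default open mode can be seen with a small sketch. This is only an illustration, not code from the thread: the helper name read_both_ways is made up, and the encoding is pinned to UTF-8 on the assumption that it matches the default the thread is discussing. The same bytes round-trip untouched in binary mode but cannot be read at all in text mode.

```python
import os
import tempfile


def read_both_ways(payload):
    """Write payload to a temp file, then read it back in binary and text mode.

    Hypothetical helper for illustration only. Returns (bytes_read, text_read),
    where text_read is None if the text-mode read could not decode the data.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)

        # Binary mode: no decoding, the bytes come back exactly as written.
        with open(path, "rb") as f:
            as_bytes = f.read()

        # Text mode (the default for open()): the data must be decoded first.
        # The encoding is fixed to UTF-8 here so the result does not depend
        # on the platform locale.
        try:
            with open(path, encoding="utf-8") as f:
                as_text = f.read()
        except UnicodeDecodeError:
            as_text = None

        return as_bytes, as_text
    finally:
        os.remove(path)
```

Calling read_both_ways(b"GIF89a\xff\xfe") should give back the original bytes and None for the text read, since 0xff is never valid in UTF-8.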

Changing the mode between text and binary is not feasible (since it would have to change the class). But it is perfectly acceptable to use sys.std{in,out}.buffer if you need to write a binary transparent filter. Of course you'll be dealing with bytes at that point so the usual cautions apply. I wouldn't do the assignments you propose though, since that might surprise other code which expects text files.
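A minimal sketch of the kind of binary-transparent filter Guido describes, assuming only that sys.stdin.buffer and sys.stdout.buffer are the underlying byte streams. The names copy_binary and main, and the chunk size, are illustrative choices and not part of the thread.

```python
import sys


def copy_binary(src, dst, chunk_size=64 * 1024):
    """Copy bytes from src to dst unchanged and return the byte count.

    src and dst are any binary file-like objects; nothing is decoded,
    so arbitrary data such as images passes through intact.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes object means end of input
            break
        dst.write(chunk)
        total += len(chunk)
    return total


def main():
    # Use the byte-level streams underneath the text wrappers instead of
    # rebinding sys.stdin / sys.stdout themselves.
    copy_binary(sys.stdin.buffer, sys.stdout.buffer)
    sys.stdout.buffer.flush()
```

A script would call main() at its entry point and could then be used in a pipeline such as the python.gif example quoted below. Since the filter works on bytes, any transformation added to the loop has to operate on bytes as well.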

--Guido

On 9/19/07, Christian Heimes <lists at cheimes.de> wrote:
> Today I stumbled over another problem that is related to the unicode and
> OS string topic. The new io system - or to be more precise the
> implicit converting of input and output data to UTF-8 makes it
> impossible to pipe binary data through Python 3.0.
>
> For example a user wants to write a filter for binary data like images
> in Python. With Python 2.5 the input and output data isn't implicitly
> converted:
>
> # stdredirect.py
> # simple stupid example
> import sys
> sys.stdout.write(sys.stdin.read())
>
> $ chmod 755 stdredict.py
> $ cat ./Mac/Demo/html.icons/python.gif | python2.5 stdredirect.py >out.gif
> $ diff ./Mac/Demo/html.icons/python.gif out.gif
>
> But Python 3.0 is using TextIOWrapper for stdin, stdout and stderr:
>
> $ cat ./Mac/Demo/html.icons/python.gif | ./python stdredirect.py >out.gif
> Traceback (most recent call last):
>   File "./stdredict.py", line 4, in <module>
>     sys.stdout.write(sys.stdin.read())
>   File "/home/heimes/dev/python/py3k/Lib/io.py", line 1225, in read
>     res += decoder.decode(self.buffer.read(), True)
>   File "/home/heimes/dev/python/py3k/Lib/codecs.py", line 291, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 10-13:
> invalid data
>
> An easy workaround for the problem is:
>
> sys.stdout = sys.stdout.buffer
> sys.stdin = sys.stdin.buffer
>
> I recommend that the problem and fix get documented. Maybe stdin,
> stdout and stderr should get a method that disables the implicit
> conversion like setMode("b") / setMode("t").
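To illustrate the surprise Guido warns about when sys.stdout is rebound to its buffer, here is a small sketch that uses an in-memory stream in place of the real stdout. The helper try_text_write is hypothetical. The assumption is that "other code which expects text files" is code that writes str, as print() does.

```python
import io


def try_text_write(stream, text="hello\n"):
    """Return True if stream accepts a str write, False if it raises TypeError.

    Hypothetical helper standing in for any code that expects a text file.
    """
    try:
        stream.write(text)
        return True
    except TypeError:
        return False


# Stand-ins for sys.stdout and sys.stdout.buffer: a text wrapper over a
# byte stream, mirroring how the standard streams are layered.
raw = io.BytesIO()
text_stream = io.TextIOWrapper(raw, encoding="utf-8")
```

Here try_text_write(text_stream) succeeds, while try_text_write(text_stream.buffer) fails, because the byte stream only accepts bytes. Code that writes str to sys.stdout would hit the same TypeError after the reassignment, which is why keeping the text streams in place and using .buffer explicitly is the less surprising option.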


