[Python-ideas] changing sys.stdout encoding

Rurpy rurpy at yahoo.com
Wed Jun 6 09:09:34 CEST 2012


On 06/05/2012 05:56 PM, MRAB wrote:

On 06/06/2012 00:34, Victor Stinner wrote:

2012/6/5 Rurpy<rurpy-/E1597aS9LQAvxtiuMwx3w at public.gmane.org>:

In my first foray into Python3 I've encountered this problem: I work in a multi-language environment. I've written a number of tools, mostly command-line, that generate output on stdout. Because these tools and their output are used by various people in varying environments, the tools all have an --encoding option to provide output that meets the needs and preferences of the output's ultimate consumers.

What happens if the specified encoding is different from the encoding of the console? Mojibake? If the output is used as the input of another program, does the other program use the same encoding? In my experience, using an encoding different from the locale encoding for input/output (stdout, environment variables, command line arguments, etc.) causes various issues. So I'm curious about your use cases.

In converting them to Python3, I found the best (if not very pleasant) way to do this in Python3 was to put something like this near the top of each tool[*1]:

    import codecs
    sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)

In Python 3, you should use io.TextIOWrapper instead of codecs.StreamWriter. It's more efficient and has fewer bugs.

What I want to be able to put there instead is:

    sys.stdout.setencoding (opts.encoding)

I don't think that your use case merits a new method on io.TextIOWrapper: replacing sys.stdout does work and should be used instead. TextIOWrapper is generic and your use case is specific to sys.std* streams. It would be surprising to change the encoding of an arbitrary file after it is opened. At least, I don't see the use case.

[snip]

And if you do want multiple encodings in a file, it's clearer to open the file as binary and then explicitly encode to bytes and write that to the file.
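For reference, the io.TextIOWrapper replacement suggested above might look roughly like the following. This is only a sketch: the helper names wrap_with_encoding and install_stdout_encoding are made up for illustration, and it assumes sys.stdout still has its usual .buffer attribute (that is, it has not already been replaced by something without one).

```python
import io
import sys

def wrap_with_encoding(binary_stream, encoding, errors="strict"):
    """Return a text stream that encodes to `binary_stream` using `encoding`.

    Hypothetical helper, not part of the standard library.
    """
    # line_buffering=True flushes after each newline, similar to an
    # interactive stdout.
    return io.TextIOWrapper(binary_stream, encoding=encoding,
                            errors=errors, line_buffering=True)

def install_stdout_encoding(encoding):
    """Replace sys.stdout with a wrapper over the same underlying byte stream."""
    sys.stdout.flush()  # push out anything written with the old encoding
    sys.stdout = wrap_with_encoding(sys.stdout.buffer, encoding)
```

A tool would then call install_stdout_encoding(opts.encoding) once near the top. The original wrapper stays reachable as sys.__stdout__, so the underlying stream is not closed by the swap.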

But is it really?

The following is very simple and the level of Python expertise required is minimal. It (would) work fine with redirection. One could substitute any other ordinary open (for write) text file for sys.stdout.

[off the top of my head]

    text = 'This is %s text: 世界へ、こんにちは!'
    sys.stdout.set_encoding('sjis')
    print(text % 'sjis')
    sys.stdout.set_encoding('euc-jp')
    print(text % 'euc-jp')
    sys.stdout.set_encoding('iso2022-jp')
    print(text % 'iso2022-jp')

As for your suggestion, how do I reopen sys.stdout in binary mode? I don't need to do that often and don't know off the top of my head. (And it's too late for me to look it up.) And what happens to redirected output when I close and reopen the stream? I can open a regular filename instead. But remember to make the last two opens with "a" rather than "w". And don't forget the "\n" at the end of the text line.

Could you show me a code example of your suggestion for comparison?

Disclaimer: As I said before, I am not particularly advocating for a set_encoding() method -- my primary suggestion is a programmatic way to change the sys.std* encodings prior to first use. Here I am just questioning the claim that a set_encoding() method would not be clearer than existing alternatives.
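As a point of reference, Python versions later than this thread (3.7 and up) provide io.TextIOWrapper.reconfigure(), which can change the encoding of an existing text stream. The sketch below assumes Python 3.7 or newer; the function name print_in_encodings is invented for illustration.

```python
import io

def print_in_encodings(binary_stream, text, encodings):
    """Print `text % encoding` once per encoding, switching the wrapper each time.

    Requires Python 3.7+ for TextIOWrapper.reconfigure().
    """
    # newline="\n" disables newline translation so the bytes are predictable.
    out = io.TextIOWrapper(binary_stream, encoding="ascii", newline="\n")
    for encoding in encodings:
        out.reconfigure(encoding=encoding)  # flushes pending output first
        print(text % encoding, file=out)
    out.flush()
    out.detach()  # leave binary_stream open for the caller
```

On such versions, the same call on the standard stream would be sys.stdout.reconfigure(encoding=opts.encoding).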


