[Python-ideas] changing sys.stdout encoding

Rurpy rurpy at yahoo.com
Sat Jun 9 05:39:34 CEST 2012


On 06/07/2012 03:00 PM, Mike Meyer wrote:

On Thu, Jun 7, 2012 at 4:48 PM, Rurpy <rurpy-/E1597aS9LQAvxtiuMwx3w at public.gmane.org> wrote:

I suspect the vast majority of programmers are interested in a language that allows them to effectively get done what they need to, whether they are working on the latest agile TDD REST server or modifying some legacy text files.

Others have raised the question this raises: how do other programming languages deal with wanting to change the encoding of the standard IO streams? Can you show us how they do things in a way that's so much easier than what Python does?

This is how it seems to be done in Perl:

binmode(STDOUT, ":encoding(sjis)");

which seems quite a bit simpler than Python. I don't know if it meets your "so much easier" criterion. A quick trial showed that it works as advertised when called before any output. The description of binmode() in "man perlfunc" suggests that the encoding can be changed on the fly, but my attempt to do so had no effect, so I don't know whether I'm misinterpreting the text or wrote bad Perl code (I haven't used it in ages and am not interested in relearning it right now).
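For comparison, here is a minimal sketch of the kind of Python 3 incantation that corresponds to the Perl line, assuming Python 3.1 or later (for `TextIOWrapper.detach()`). The helper name `set_stream_encoding` is hypothetical, not a standard-library API. The old wrapper is detached rather than simply replaced because a `TextIOWrapper` that gets garbage-collected closes the buffer it wraps; the catch is that `sys.__stdout__` then refers to a detached, unusable object.

```python
import io


def set_stream_encoding(stream, encoding, errors="strict"):
    """Return a new text stream writing to the same byte buffer as `stream`.

    Hypothetical helper: `stream` must be an io.TextIOWrapper (as sys.stdout
    normally is). After this call the old wrapper is detached and unusable.
    """
    line_buffering = stream.line_buffering  # read it before detaching
    stream.flush()                # emit any text written in the old encoding
    buffer = stream.detach()      # unhook the buffer so it is not closed later
    return io.TextIOWrapper(buffer, encoding=encoding, errors=errors,
                            line_buffering=line_buffering)
```

Called once at start-up, e.g. `sys.stdout = set_stream_encoding(sys.stdout, "shift_jis")`, it gives roughly the effect of the binmode() call above.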

Tcl appears to have on-the-fly encoding changes:

| encoding system ?encoding?
|   Set the system encoding to encoding. If encoding is omitted
|   then the command returns the current system encoding. The system
|   encoding is used whenever Tcl passes strings to system calls.

http://www.tcl.tk/man/tcl8.4/TclCmd/encoding.htm
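Python had no direct counterpart to this kind of on-the-fly switching when this thread was written; `TextIOWrapper.reconfigure()`, added later in Python 3.7, can change the encoding of an open text stream in place. Per its documentation it does an implicit flush first, and changing the encoding is allowed after writing but not after data has been read. The sketch below switches encodings between writes on an in-memory buffer; the helper name `write_segments` is hypothetical.

```python
import io


def write_segments(segments):
    """Write (encoding, text) pairs to one byte buffer, switching the
    encoding of a single text stream between segments.

    Requires Python 3.7+ for TextIOWrapper.reconfigure().
    """
    raw = io.BytesIO()
    # newline="" disables newline translation so the bytes are predictable.
    out = io.TextIOWrapper(raw, encoding="utf-8", newline="")
    for encoding, text in segments:
        # reconfigure() flushes pending output before switching encodings.
        out.reconfigure(encoding=encoding)
        out.write(text)
    out.flush()
    data = raw.getvalue()
    out.close()
    return data
```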

I'll see if I can find out about some other languages if there continues to be any interest.

And even were I to accept your argument, Python is inconsistent: when I open a file explicitly, there is only a slight penalty for opening a non-default-encoded file (the need to explicitly give an encoding).

The proper encoding for the standard IO streams is generally a property of the environment, and hence is set in the environment.
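Python does let the environment set the standard streams' encoding: the PYTHONIOENCODING environment variable (available since Python 2.6) fixes the encoding of stdin, stdout and stderr at interpreter start-up, much as `open(path, encoding=...)` does for an explicitly opened file. The sketch below runs a child interpreter with that variable set and captures its redirected stdout as bytes; the helper name `run_with_io_encoding` is hypothetical.

```python
import os
import subprocess
import sys


def run_with_io_encoding(source, encoding):
    """Run `source` in a child Python whose standard streams use `encoding`,
    and return the raw bytes it wrote to stdout (which here is a pipe)."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, "-c", source],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

From a shell the equivalent is `PYTHONIOENCODING=shift_jis python script.py > out.txt`, though the variable is set by whoever launches the program rather than by the program itself.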

"Proper encoding"? If you said, "Proper default encoding" I'd agree with you. And I'd buy your claim if no one had ever invented output redirection and if print output always went to a console with a (relatively) fixed encoding. But that is not the case.

You have a use case where that's not the case. The argument is that your use case isn't common enough to justify changing the standard library. Can you provide evidence to the contrary?

How exactly do you suggest one accurately quantify "commonness"? And what is the threshold for justification? It seems to me the strongest argument is the credibility one that I already made:

  1. Programs that accept data input on stdin and write data on stdout have a long history and are widely used. I hope this is self-evident.

  2. Encodings other than utf-8 are widely used. I pointed to the commonness of non-utf8 encodings in Japanese web pages. Additionally, a Google search for "ftp readme の site:.jp" turns up lots of text files. Once past the first few pages of results (where the web pages are mostly utf8), hardly any utf8 files are to be found.

  3. One effect of globalization is that many more programmers today are dealing with files in non-native encodings that come from or go to customers, vendors, partners and colleagues in other parts of the world. The number of encodings in wide use even within a single country (again Japan: utf8, sjis, euc-jp, iso-2022-jp) implies pretty strongly that tools for use only in that region will often need multi-encoding capabilities.

I think connecting the dots above leads to a pretty high-probability conclusion.

Other languages that make setting the encoding on the standard streams easy, or applications outside of those built for your system that have a "--encoding" type flag?

iconv, recode and their ilk are obvious examples of applications.
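A stdin-to-stdout filter of the iconv kind is short in Python once it works on the binary buffers (`sys.stdin.buffer`, `sys.stdout.buffer`) and wraps them itself. In this sketch the function names are hypothetical and the `-f`/`-t` options merely mirror GNU iconv's `--from-code`/`--to-code`; it is an illustration, not a replacement for iconv.

```python
import argparse
import io
import sys


def transcode(src, dst, from_enc, to_enc, errors="strict"):
    """Copy text from binary stream `src` to binary stream `dst`,
    decoding with `from_enc` and re-encoding with `to_enc`."""
    # newline="" keeps line endings exactly as they appear in the input.
    reader = io.TextIOWrapper(src, encoding=from_enc, errors=errors, newline="")
    writer = io.TextIOWrapper(dst, encoding=to_enc, errors=errors, newline="")
    try:
        for line in reader:
            writer.write(line)
        writer.flush()
    finally:
        # Detach so the caller's byte streams are not closed with the wrappers.
        reader.detach()
        writer.detach()


def main(argv=None, stdin=None, stdout=None):
    parser = argparse.ArgumentParser(description="Re-encode stdin to stdout.")
    parser.add_argument("-f", "--from-code", required=True, help="input encoding")
    parser.add_argument("-t", "--to-code", required=True, help="output encoding")
    args = parser.parse_args(argv)
    src = sys.stdin.buffer if stdin is None else stdin
    dst = sys.stdout.buffer if stdout is None else stdout
    transcode(src, dst, args.from_code, args.to_code)
```

Run as a script (with `main()` called under an `if __name__ == "__main__":` guard), it would be used like `python transcode.py -f euc_jp -t shift_jis < in.txt > out.txt`.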

I wasn't suggesting a change to the core level (if by that you mean to the interpreter). I was asking whether some way could be provided (easier and more reliable than googling around for a magic incantation) to change the encoding of one or more of the already-open-when-my-program-starts sys.std* streams. I presume that would be a standard library change (in either the io or sys modules), and I offered a .setencoding() method as a placeholder for discussion.

Why presume that this needs a change in the library? The method is straightforward, if somewhat ugly. Is there any reason it can't just be documented instead of added to the library? Changing the library would require a similar documentation change.

Did you miss the paragraph right below the one you quote? The one in which I said,

An inferior, bare-minimum way to address this would be to at least add a note to the sys.std* documentation about how to change the encoding. That encourages cargo-cult programming and doesn't address the WTF effect, but it is at least better than the current state of affairs.


