[Python-Dev] Bytes path support

Stephen J. Turnbull stephen at xemacs.org
Sat Aug 23 12:14:47 CEST 2014


Oleg Broytman writes:

> This is the core of the problem. Python2 favors Unix model but Windows people pays the price. Python3 reverses that

This is certainly not true. What is true is that Python 3 makes no attempt to make it easy to write crappy software in the old Unix style, software that breaks when unexpected character encodings are encountered. Python 3 is designed to make it easier to write reliable software, even if that software will only ever be used on one platform. Nevertheless, it's still a reasonable language for writing byte-shoveling software, and the last piece fell into place with the acceptance of PEP 461.

As of that PEP, you can use regexps to tokenize byte streams and %-formatting to produce them conveniently. If you want to treat them piecewise as character streams with different encodings, you have a large library of codecs, which provide an incremental decoder interface. While AFAIK no codec implements a decode-until-error mode, that's not much of a loss, as many encodings overlap. E.g., if you start decoding with the latin-1 codec, decoding the whole document will succeed, even if it switches to windows-1251 partway through.
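To make the tools above concrete, here is a minimal sketch assuming Python 3.9 or later (bytes %-formatting from PEP 461 needs 3.5+). The function names `tokenize`, `render`, and `decode_chunks` are hypothetical illustrations, not anything proposed in this thread. It tokenizes bytes with a bytes regex, builds bytes with `%`, decodes a stream chunk by chunk with an incremental decoder, and shows that latin-1 decoding succeeds on mixed-encoding data and loses nothing.

```python
import codecs
import re

# A bytes pattern: it matches against bytes and returns bytes, no decoding.
TOKEN = re.compile(rb"[A-Za-z]+|\d+|\S")


def tokenize(data: bytes) -> list[bytes]:
    """Split raw bytes into word, number, and punctuation tokens."""
    return TOKEN.findall(data)


def render(name: bytes, count: int) -> bytes:
    """Produce bytes via %-formatting (PEP 461, Python 3.5+)."""
    return b"%s=%d\n" % (name, count)


def decode_chunks(chunks, encoding: str) -> str:
    """Decode a stream piecewise; the decoder buffers split multibyte sequences."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush; raises if input was truncated
    return "".join(parts)


if __name__ == "__main__":
    print(tokenize(b"count 42;"))
    print(render(b"hits", 3))
    # "é" is b"\xc3\xa9" in UTF-8; here it is split across two chunks.
    print(decode_chunks([b"caf\xc3", b"\xa9"], "utf-8"))

    # latin-1 maps every byte to a code point, so decoding never fails,
    # even when the document switches to windows-1251 partway through.
    mixed = b"abc " + "Привет".encode("windows-1251")
    text = mixed.decode("latin-1")
    assert text.encode("latin-1") == mixed  # lossless round trip
    # The windows-1251 segment can be re-decoded once its boundary is known.
    print(text[4:].encode("latin-1").decode("windows-1251"))
```

Note that latin-1 "succeeding" only means no error is raised; the Cyrillic segment reads as mojibake until it is re-encoded to latin-1 and decoded with the right codec.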

Oleg, I gather Russian is your native language. That's moderately complicated, I admit. But the Russians are a distant second to the Japanese in self-destructive proliferation of incompatible character coding standards and non-standard variants. After 24 years of dealing with the mess that is East Asian encodings (which is even bound up with the "religion" of Japanese exceptionalism -- some Japanese have argued that there is a spiritual superiority to Japanese JIS codes!), I cannot believe you are going to find a better environment for dealing with these issues than Python 3.


