[Python-3000] locale-aware strings ?

Paul Prescod paul at prescod.net
Wed Sep 6 19:21:33 CEST 2006


On 9/6/06, Oleg Broytmann <phd at oper.phd.pp.ru> wrote:

> On Wed, Sep 06, 2006 at 03:55:04AM -0700, Paul Prescod wrote:
> > These situations are caused by the lack of metadata or clear encoding-friendly standards.
>
> Ogg, for example, is encoding friendly - it clearly states that tags (comments) must be in UTF-8, and all Ogg Vorbis files I have seen were really in UTF-8, and all tag editors and players write/use UTF-8.

Michael Urman disagrees with you. He says that he sometimes sees Latin-1 encoded files. Let's trace back how that could have happened.

  1. The end-user must have had Latin-1 as their system encoding.

  2. The programmer of the ID tagging app had not thought through encoding issues.

  3. The programming language either implicitly encoded the data according to the locale or treated it as binary data. (Unless the programmer did this on purpose, which would imply that he was VERY confused and not just lazy.)
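The three steps can be reproduced in a few lines. This is only a sketch: the tag value and the `read_tag` helper are made up for illustration, and the code is written against Python 3 semantics, where text and bytes are distinct types. It shows Latin-1 bytes landing in a field that the format says must be UTF-8, and a strict reader then rejecting them.

```python
# Hypothetical tag value typed by a user whose system encoding is Latin-1.
title = "Caf\u00e9"

# Step 3: the text is written with the locale's encoding instead of the
# UTF-8 that the file format requires.
latin1_bytes = title.encode("latin-1")   # what actually ends up in the file
utf8_bytes = title.encode("utf-8")       # what the format expects


def read_tag(raw: bytes) -> str:
    """Hypothetical reader: decode a tag strictly as UTF-8."""
    return raw.decode("utf-8")


# A strict UTF-8 reader cannot decode the Latin-1 bytes.
try:
    read_tag(latin1_bytes)
    latin1_ok = True
except UnicodeDecodeError:
    latin1_ok = False
```

The same text produces two different byte strings, and nothing in the file records which encoding was used.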

> I fail to see how Python can help here.

Python can refuse to be the programming language in Step 3 that guesses the appropriate encoding without consulting the programmer or end-user.
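As a rough illustration of what "refusing to guess" could look like, here is a sketch assuming a language where text and bytes are separate types, as they are in Python 3 as later released. The `write_tag` helper and the `TITLE=` field are hypothetical; the point is only that the encoding has to be named by the programmer.

```python
def write_tag(text: str, encoding: str) -> bytes:
    """Hypothetical helper: the encoding is a required argument, no default."""
    return b"TITLE=" + text.encode(encoding)


# Mixing text into binary data without naming an encoding is an error
# rather than a silent locale-dependent conversion.
try:
    b"TITLE=" + "Caf\u00e9"
    guessed = True
except TypeError:
    guessed = False

# The programmer has to state the encoding explicitly.
tag = write_tag("Caf\u00e9", "utf-8")
```

With no default for `encoding`, the lazy programmer from Step 2 gets an error at the point where the decision has to be made, instead of locale-dependent output.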

Paul Prescod


