[Python-Dev] Divorcing str and unicode (no more implicit conversions).

M.-A. Lemburg mal at egenix.com
Tue Oct 25 13:31:50 CEST 2005


Fredrik Lundh wrote:
> M.-A. Lemburg wrote:
>> I don't follow you here. The source code encoding is only applied to Unicode literals (you are using string literals in your example). String literals are passed through as-is.
>
> however, for Python 3000, it would be nice if the source-code encoding applied to the entire file (XML-style), rather than just unicode string literals and (hopefully) comments and docstrings.

Actually, the encoding is applied to the complete source file: the file is transcoded into UTF-8 and then parsed by the Python parser.

Unicode literals are then decoded from UTF-8 into Unicode. String literals are transcoded back into the source code encoding, which makes for a round-trip (rather long, due to technical constraints): source code encoding -> Unicode -> UTF-8 -> Unicode -> source code encoding.
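The round-trip described above can be sketched in a few lines. This is only an illustration of the encoding steps, not the actual tokenizer code: the function names and the choice of latin-1 as the declared source encoding are assumptions made for the example.

```python
# Hypothetical sketch of the transcoding steps; names are illustrative only
# and are not taken from the CPython tokenizer.
SOURCE_ENCODING = "latin-1"  # assumed value of the file's coding declaration


def to_parser_input(raw_source: bytes, encoding: str) -> bytes:
    """source code encoding -> Unicode -> UTF-8 (what the parser sees)."""
    return raw_source.decode(encoding).encode("utf-8")


def unicode_literal(utf8_body: bytes) -> str:
    """UTF-8 -> Unicode, as done for u'...' literals."""
    return utf8_body.decode("utf-8")


def string_literal(utf8_body: bytes, encoding: str) -> bytes:
    """UTF-8 -> Unicode -> source code encoding, as done for '...' literals."""
    return utf8_body.decode("utf-8").encode(encoding)


raw_body = "caf\u00e9".encode(SOURCE_ENCODING)  # bytes as stored in the file
utf8_body = to_parser_input(raw_body, SOURCE_ENCODING)

as_unicode = unicode_literal(utf8_body)
as_string = string_literal(utf8_body, SOURCE_ENCODING)
```

Here `as_string` ends up byte-for-byte identical to `raw_body`, which is why string literals look as if they were passed through as-is even though they took the long way round.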

Python 3k should have a fully Unicode-based parser to reduce this additional transcoding overhead.

Since Py3k will only have Unicode literals, the problems with string literals will go away all by themselves :-)
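As a rough illustration of that outcome, the snippet below compiles a small latin-1 encoded source under a Python 3 interpreter and checks that the plain literal comes out as Unicode text. The source string, the `<example>` filename and the variable names are made up for the example; it assumes the behaviour of Python 3 as it later shipped.

```python
# Source bytes in latin-1, with a PEP 263 coding declaration on the first line.
source = "# -*- coding: latin-1 -*-\ns = '\u00e9'\n".encode("latin-1")

namespace = {}
# compile() accepts bytes and honours the coding declaration.
exec(compile(source, "<example>", "exec"), namespace)

literal = namespace["s"]
# The literal is a one-character Unicode string, not a latin-1 byte string.
is_text = isinstance(literal, str)
```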

-- Marc-Andre Lemburg eGenix.com



