[Python-Dev] Divorcing str and unicode (no more implicit conversions).
M.-A. Lemburg mal at egenix.com
Tue Oct 25 13:31:50 CEST 2005
Fredrik Lundh wrote:
> M.-A. Lemburg wrote:
>
>> I don't follow you here. The source code encoding is only applied to Unicode literals (you are using string literals in your example). String literals are passed through as-is.
>
> however, for Python 3000, it would be nice if the source-code encoding applied to the entire file (XML-style), rather than just unicode string literals and (hopefully) comments and docstrings.
Actually, the encoding is applied to the complete source file: the file is transcoded into UTF-8 and then parsed by the Python parser.
Unicode literals are then decoded from the UTF-8 into Unicode. String literals are transcoded back into the source code encoding, thus making the (rather long due to technical constraints) round-trip source code encoding -> Unicode -> UTF-8 -> Unicode -> source code encoding.
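The round trip described above can be simulated in a few lines. This is only an illustrative sketch in modern Python, not the actual tokenizer or compiler code: the function names, the choice of ISO-8859-1 as the declared source encoding, and the sample text are all assumptions made for the example.

```python
# Illustrative simulation of the transcoding steps; the names below are
# hypothetical and do not correspond to functions in the interpreter.

SOURCE_ENCODING = "iso-8859-1"  # assumed encoding declared by the source file


def transcode_for_parser(raw_source: bytes, source_encoding: str) -> bytes:
    """Source encoding -> Unicode -> UTF-8: the form the parser works on."""
    return raw_source.decode(source_encoding).encode("utf-8")


def unicode_literal_value(utf8_body: bytes) -> str:
    """A Unicode literal is decoded from the UTF-8 form into Unicode."""
    return utf8_body.decode("utf-8")


def string_literal_value(utf8_body: bytes, source_encoding: str) -> bytes:
    """A string literal goes UTF-8 -> Unicode -> back to the source encoding."""
    return utf8_body.decode("utf-8").encode(source_encoding)


# Bytes of a literal body as they would sit in the file on disk.
raw_literal_body = "caf\u00e9".encode(SOURCE_ENCODING)

# What the parser sees after the whole file has been transcoded.
parser_view = transcode_for_parser(raw_literal_body, SOURCE_ENCODING)

# The two kinds of literal end up with different values.
as_unicode = unicode_literal_value(parser_view)
as_string = string_literal_value(parser_view, SOURCE_ENCODING)
```

In this sketch the string literal ends up byte-for-byte identical to what was in the file, which is why it looks as if it had been passed through as-is even though it was transcoded twice.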
Python 3k should have a fully Unicode-based parser to avoid this additional transcoding overhead.
Since Py3k will only have Unicode literals, the problems with string literals will go away all by themselves :-)
-- Marc-Andre Lemburg eGenix.com