[Python-Dev] issue2180 and using 'tokenize' with Python 3 'str's

Michael Foord fuzzyman at voidspace.org.uk
Wed Sep 29 02:08:56 CEST 2010


On 29 Sep 2010, at 00:22, "Martin v. Löwis" <martin at v.loewis.de> wrote:

>> I certainly wouldn't be opposed to an API that accepts a string as well though.
>
> Notice that this can't really work for Python 2 source code (but of course, it doesn't need to). In Python 2, if you have a string literal in the source code, you need to know the source encoding in order to get the bytes back. Now, if you parse a Unicode string as source code, and it contains byte string literals, you wouldn't know what encoding to apply. Fortunately, Python 3 byte literals ban non-ASCII literal characters, so assuming an ASCII-compatible encoding for the original source is fairly safe.
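The restriction on bytes literals mentioned above can be checked directly. The following is a minimal sketch, assuming a current Python 3 interpreter; the helper name `compiles` is hypothetical and exists only for illustration. It shows that a bytes literal containing a raw non-ASCII character is a `SyntaxError`, while the same bytes written with `\x` escapes compile fine. Every bytes literal in valid Python 3 source therefore consists of ASCII characters only, and it yields the same bytes under any ASCII-compatible source encoding.

```python
def compiles(source: str) -> bool:
    """Hypothetical helper: True if `source` compiles as an expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# A bytes literal made only of ASCII characters is accepted.
ascii_literal_ok = compiles("b'abc'")

# A raw non-ASCII character (U+00E9) inside a bytes literal is rejected.
non_ascii_rejected = not compiles("b'caf\u00e9'")

# Non-ASCII bytes must be spelled with escapes, which are themselves ASCII.
escaped_value = eval("b'caf\\xc3\\xa9'")
```

Because the escaped spelling is pure ASCII, re-encoding the decoded source text as UTF-8, Latin-1, or any other ASCII-compatible encoding leaves the literal's value unchanged.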

The new API couldn't be ported to Python 2 *anyway*. As Nick pointed out, the underlying tokenization happens on decoded strings, so starting with source as Unicode will be fine.
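To illustrate the point that tokenization works on decoded text, here is a small sketch against today's standard-library `tokenize` module. It is not the API proposed in this thread. In current Python 3, `tokenize.tokenize()` takes a `readline` that returns bytes, detects the source encoding, and emits an `ENCODING` token first, while `tokenize.generate_tokens()` takes a `readline` that returns `str`. The helper names `tokens_from_str` and `tokens_from_bytes` are hypothetical. The sketch assumes Python 3.9+ for the `list[...]` annotations.

```python
import io
import tokenize

SOURCE = "name = 'caf\u00e9'\n"


def tokens_from_str(source: str) -> list[tuple[int, str]]:
    # generate_tokens() reads str lines directly; no decoding step is needed.
    readline = io.StringIO(source).readline
    return [(tok.type, tok.string) for tok in tokenize.generate_tokens(readline)]


def tokens_from_bytes(data: bytes) -> list[tuple[int, str]]:
    # tokenize() reads bytes lines, detects the encoding (PEP 263 cookie or
    # the UTF-8 default), decodes, and then tokenizes the decoded text.
    readline = io.BytesIO(data).readline
    toks = [(tok.type, tok.string) for tok in tokenize.tokenize(readline)]
    # The bytes path starts with an ENCODING token; the str path has none.
    if toks[0][0] != tokenize.ENCODING:
        raise ValueError("expected an ENCODING token first")
    return toks[1:]


from_str = tokens_from_str(SOURCE)
from_bytes = tokens_from_bytes(SOURCE.encode("utf-8"))
```

Once the leading `ENCODING` token is dropped, both paths should yield the same token stream. This matches the observation that the bytes interface is essentially a decoding step placed in front of a text tokenizer.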

Michael

> Regards, Martin

