[Python-3000] PEP 3120 (Was: PEP Parade)

Jim Jewett jimjjewett at gmail.com
Thu May 3 18:44:18 CEST 2007


On 5/3/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> Untangling the parser from stdio - sure. I also think it would be desirable to read the whole source into a buffer, rather than reading it line by line. That might be a bigger change, making the tokenizer a multi-stage algorithm:
>
> 1. read input into a buffer
> 2. determine source encoding (looking at a BOM, else a declaration within the first two lines, else default to UTF-8)
> 3. if the source encoding is not UTF-8, pass it through a codec (decode to string, encode to UTF-8). Otherwise, check that all bytes are really well-formed UTF-8.
> 4. start parsing
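Steps 1 to 3 of the algorithm above can be sketched in Python as follows. This is a minimal illustration, not the C tokenizer: the helper names are hypothetical, the declaration regex is a PEP 263-style pattern, and step 4 is left out. After transcoding, the buffer is UTF-8 even though an original declaration still names the old encoding, so whatever parses it next must ignore that declaration.

```python
import codecs
import re

# PEP 263-style declaration, e.g. "# -*- coding: latin-1 -*-"
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def detect_source_encoding(buf: bytes) -> str:
    """Step 2: a BOM, else a declaration in the first two lines, else UTF-8."""
    if buf.startswith(codecs.BOM_UTF8):
        return "utf-8"  # this sketch does not check for a conflicting declaration
    for line in buf.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
    return "utf-8"


def to_utf8_buffer(buf: bytes) -> bytes:
    """Steps 2 and 3: return the whole source as well-formed UTF-8 bytes."""
    encoding = detect_source_encoding(buf)
    if buf.startswith(codecs.BOM_UTF8):
        buf = buf[len(codecs.BOM_UTF8):]  # the BOM itself is not source text
    if codecs.lookup(encoding).name == "utf-8":
        buf.decode("utf-8")  # validate only; raises UnicodeDecodeError if malformed
        return buf
    return buf.decode(encoding).encode("utf-8")  # decode to str, re-encode as UTF-8


def load_source(path: str) -> bytes:
    """Step 1: read the whole file into one buffer, then normalise it."""
    with open(path, "rb") as f:
        return to_utf8_buffer(f.read())
```

Working on one buffer keeps encoding detection and validation separate from tokenizing, so the tokenizer itself only ever sees UTF-8.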

So people could hook into their own "codec" that, say, replaced native-language keywords with standard Python keywords?

Part of me says that should be an import hook instead of pretending to be a codec...
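An import-hook version of that idea might look like the sketch below. It assumes the modern `importlib` path-hook machinery; the keyword table, the `.pyfr` extension, and the names `translate_keywords`, `TranslatingLoader` and `install` are all hypothetical, chosen only for illustration. Translation happens on tokens after the source has been decoded normally, rather than inside a codec.

```python
import importlib.machinery
import importlib.util
import io
import sys
import tokenize

# Hypothetical keyword table, for illustration only: localized word -> Python keyword.
KEYWORDS = {"si": "if", "sinon": "else", "tantque": "while", "retourner": "return"}


def translate_keywords(source: str) -> str:
    """Replace NAME tokens listed in KEYWORDS; strings and comments are untouched."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in KEYWORDS:
            tok = tok._replace(string=KEYWORDS[tok.string])
        tokens.append(tok)
    # Full 5-tuples keep the original positions, so layout and indentation survive.
    return tokenize.untokenize(tokens)


class TranslatingLoader(importlib.machinery.SourceFileLoader):
    """Source loader that translates keywords after decoding, before compiling."""

    def source_to_code(self, data, path, *_args, **_kwargs):
        # decode_source applies the normal encoding detection and newline translation.
        source = importlib.util.decode_source(data)
        return compile(translate_keywords(source), path, "exec", dont_inherit=True)


def install(extension: str = ".pyfr") -> None:
    """Register a path hook; only files ending in the hypothetical `extension` are translated."""
    loaders = [
        (importlib.machinery.ExtensionFileLoader, importlib.machinery.EXTENSION_SUFFIXES),
        (importlib.machinery.SourceFileLoader, importlib.machinery.SOURCE_SUFFIXES),
        (importlib.machinery.SourcelessFileLoader, importlib.machinery.BYTECODE_SUFFIXES),
        (TranslatingLoader, [extension]),
    ]
    # The first hook that accepts a directory wins, so keep the stock loaders too.
    sys.path_hooks.insert(0, importlib.machinery.FileFinder.path_hook(*loaders))
    sys.path_importer_cache.clear()
```

Because the rewrite works on tokens, a string such as `'si'` or a comment is never altered, and modules opt in by file extension instead of by claiming to be an encoding.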

-jJ


