[Python-Dev] status of development documentation

Brett Cannon bcannon at gmail.com
Sun Dec 25 07:29:36 CET 2005


On 12/24/05, Tim Peters <tim.peters at gmail.com> wrote:

[Tim]
>> FWIW, test_builtin and test_pep263 both passed on WinXP in rev 39757.
>> That's the last revision before the AST branch was merged.
>>
>> I can't build rev 39758 on WinXP (VC complains that pythoncore.vcproj
>> can't be loaded -- looks like it got checked in with unresolved SVN
>> conflict markers -- which isn't easy to do under SVN ;-( ), so don't
>> know about that.
>>
>> The first revision at which Python built again was 39791 (23 Oct), and
>> test_builtin and test_pep263 both fail under that the same way they
>> fail today.

[Brett]
> Both syntax errors, right?

In test_builtin, yes, two syntax errors.  test_pep263 is different:

    test test_pep263 failed -- Traceback (most recent call last):
      File "C:\Code\python\lib\test\test_pep263.py", line 12, in test_pep263
        '\xd0\x9f\xd0\xb8\xd1\x82\xd0\xbe\xd0\xbd'
    AssertionError: '\xc3\xb0\xc3\x89\xc3\x94\xc3\x8f\xc3\x8e' != '\xd0\x9f\xd0\xb8\xd1\x82\xd0\xbe\xd0\xbd'

That's not a syntax error, it's a wrong result.  There are other
parsing-related test failures, but those are the only two I've written
up so far (partly because I expect they all have the same underlying
cause, and partly because nobody so far seems to understand the code
well enough to explain why the first one works on any platform ;-)).

> My mind is partially gone thanks to being on vacation so following this thread
> has been abnormally hard. =)
>
> Since it is a syntax error there won't be any bytecode to compare against.

Shouldn't be needed.  The snippet:

    bom = '\xef\xbb\xbf'
    compile(bom + 'print 1\n', '', 'exec')

treats the bom prefix like any other sequence of illegal characters.
That's why it raises SyntaxError:  It peels off the first character
(\xef), and says "syntax error" at that point:

    Py_CompileStringFlags ->
    PyParser_ASTFromString ->
    PyParser_ParseStringFlagsFilename ->
    parsetok ->
    PyTokenizer_Get

That sets a to point at the start of the string, b to point at the
second character, and returns type==51.  Then len is set to 1, str is
malloc'ed to hold 2 bytes, and str is filled in with "\xef\x00" (the
first byte of the input, as a NUL-terminated C string).
PyParser_AddToken then calls classify(), which falls off the end of its
last loop and returns -1:  syntax error.

and later:

I'm getting a strong suspicion that I'm the only developer to try
building the trunk on WinXP since the AST merge was done, and that
something obscure is fundamentally broken with it on this box.
For example, in tokenizer.c, these functions don't even exist on
Windows today (because an enclosing #ifdef says not to compile them):

    error_ret
    new_string
    get_normal_name
    get_coding_spec
    check_coding_spec
    check_bom
    fp_readl
    fp_setreadl
    fp_getc
    fp_ungetc
    decoding_fgets
    decoding_feof
    buf_getc
    buf_ungetc
    buf_setreadl
    translate_into_utf8
    decode_str

OK, that's not quite true.  "Degenerate" forms of three of those
functions exist on Windows:

    static char *
    decoding_fgets(char *s, int size, struct tok_state *tok)
    {
        return fgets(s, size, tok->fp);
    }

    static int
    decoding_feof(struct tok_state *tok)
    {
        return feof(tok->fp);
    }

    static const char *
    decode_str(const char *str, struct tok_state *tok)
    {
        return str;
    }

In the simple failing test, that degenerate decode_str() is getting
called.  If the "fancy" decode_str() were being used instead, that one
does call check_bom().  Why do we have two versions of these functions?
Which set is supposed to be in use now?  What's the meaning of
"#ifdef PGEN" today?  Should it be true or false?
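
As a rough illustration of the difference described above, here is a minimal Python sketch contrasting a pass-through decode step with one that checks for a UTF-8 BOM first. Both function names are hypothetical stand-ins, not the C functions in tokenizer.c, and the BOM-aware version only models the one behaviour mentioned in the thread (stripping a leading BOM), nothing else the real code may do.

```python
import codecs


def decode_passthrough(data: bytes) -> bytes:
    """Hypothetical stand-in for the degenerate form: return input untouched."""
    return data


def decode_with_bom_check(data: bytes) -> bytes:
    """Hypothetical BOM-aware form: drop a leading UTF-8 BOM if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


source = codecs.BOM_UTF8 + b'print 1\n'

# With the pass-through, the first byte the tokenizer sees is still \xef.
print(decode_passthrough(source)[:1])

# With the BOM check, the tokenizer would start at the 'p' of 'print'.
print(decode_with_bom_check(source))
```

If the pass-through variant is the one in use, the first byte handed to the tokenizer is `\xef`, which is consistent with the "peels off the first character" behaviour reported for the failing test.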

Looking at the logs for tokenizer.c, tokenizer.h, and tokenizer_pgen.c, it looks like this stuff has not been heavily touched since Martin did stuff for PEP 263.

>> I'm darned near certain that we're not using the intended parsing
>> code on Windows now -- PGEN is still #define'd when the "final"
>> parsing code is compiled into python25.dll.  Don't know how to fix
>> that (I don't understand it).

> But the AST branch didn't touch the parser (unless you are grouping
> ast.c and compile.c under the "parser" umbrella just to throw me off
> =).

Possibly.  See above for unanswered questions about tokenizer.c, which
appears to be the whole problem wrt test_builtin.  Python couldn't be
built under VC7.1 on Windows after the AST merge.  However that got
repaired left parsing/tokenizing broken on Windows wrt (at least) some
encoding gimmicks.  Since the tests passed immediately before the AST
merge, and failed the first time Python could be built again after that
merge, it's the only natural candidate for finger-wagging.
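
The byte strings in the test_pep263 traceback quoted earlier are consistent with one particular encoding mix-up, sketched below in Python 3. The explanation is an assumption, not something stated in the thread: it supposes the source text was stored as KOI8-R and that the coding declaration was ignored, so the bytes were read as Latin-1 before being encoded to UTF-8. Only the two byte strings come from the traceback; the variable names are illustrative.

```python
# Byte strings copied from the AssertionError in the traceback.
expected = b'\xd0\x9f\xd0\xb8\xd1\x82\xd0\xbe\xd0\xbd'
actual = b'\xc3\xb0\xc3\x89\xc3\x94\xc3\x8f\xc3\x8e'

# The expected value is valid UTF-8; recover the text it spells.
word = expected.decode('utf-8')

# ASSUMPTION: the literal was stored on disk in KOI8-R.
source_bytes = word.encode('koi8-r')

# Coding declaration honoured: KOI8-R bytes -> text -> UTF-8.
honoured = source_bytes.decode('koi8-r').encode('utf-8')

# Coding declaration ignored: the same bytes misread as Latin-1.
ignored = source_bytes.decode('latin-1').encode('utf-8')

print(honoured == expected)
print(ignored == actual)
```

Under that assumption, the "wrong result" is what you get when the declared source encoding is never applied, which would fit a tokenizer build that skips the encoding-aware code paths.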

Did it lead to tokenizer_pgen.c suddenly being used for the build instead of tokenizer.c? The former seems to be the only place where PGEN is defined.

> What can I do to help?

I don't know.  Enjoying Christmas couldn't hurt :-)  What this needs is
someone who understands how

    bom = '\xef\xbb\xbf'
    compile(bom + 'print 1\n', '', 'exec')

is supposed to work at the front-end level.
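
For readers who want to try the same check today, here is a Python 3 adaptation of the snippet. It is a sketch, not the original test: the source is passed as bytes, the compiled statement is an assignment so the result can be inspected, and the filename `<bom-test>` is made up. It shows only that current CPython accepts a UTF-8 BOM at the start of a bytes source; it says nothing about how the 2005 trunk was meant to behave.

```python
import codecs

bom = codecs.BOM_UTF8  # the same three bytes as '\xef\xbb\xbf' above

# Compile a bytes source that starts with a UTF-8 BOM.
code = compile(bom + b'result = 1\n', '<bom-test>', 'exec')

# Run it in a scratch namespace and inspect the outcome.
namespace = {}
exec(code, namespace)
print(namespace['result'])
```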

Hopefully Martin will have some inkling since he committed the phase 1 stuff for PEP 263.

> Do you need me to step through something?

Why doesn't the little code snippet above fail anywhere else? "Should" the degenerate decode_str() be getting called during it -- or should the other decode_str() be getting called? If the latter, what got broke on Windows during the merge so that the wrong one is getting called now?

> Do you need to know how gcc is preprocessing some file?

No, I just need to know how to fix Python on Windows ;-)

=)

-Brett


