msg102294
Author: Mark Dickinson (mark.dickinson)
Date: 2010-04-03 21:00
I'm seeing a very peculiar test_pep263 failure when doing 'make test' on OS X 10.6.3. It's enough to run test___all__ and test_pep263, in that order:

    Mark-Dickinsons-MacBook-Pro:trunk dickinsm$ ./python.exe -Wd -3 -E -tt ./Lib/test/regrtest.py test___all__ test_pep263
    test___all__
    /Users/dickinsm/python/svn/trunk/Lib/test/test___all__.py:3: DeprecationWarning: in 3.x, the bsddb module has been removed; please use the pybsddb project instead
      import bsddb
    /Users/dickinsm/python/svn/trunk/Lib/bsddb/__init__.py:67: PendingDeprecationWarning: The CObject type is marked Pending Deprecation in Python 2.7. Please use capsule objects instead.
      import _bsddb
    test_pep263
    test test_pep263 failed -- Traceback (most recent call last):
      File "/Users/dickinsm/python/svn/trunk/Lib/test/test_pep263.py", line 39, in test_issue7820
        self.assertRaises(SyntaxError, eval, '\xff\x20')
      File "/Users/dickinsm/python/svn/trunk/Lib/unittest/case.py", line 444, in assertRaises
        callableObj(*args, **kwargs)
      File "<string>", line 1, in <module>
    NameError: name '?' is not defined
    1 test OK.
    1 test failed:
        test_pep263
    [218378 refs]

The failing test is expecting a SyntaxError, but gets a NameError instead. I've narrowed down the cause of the failure to a Tkinter import:

    Python 2.7a4+ (trunk:79716, Apr 3 2010, 20:30:09)
    [GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> eval('\xff\x20')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<string>", line 1
        ?
        ^
    SyntaxError: invalid syntax
    [35036 refs]
    >>> import Tkinter
    [51314 refs]
    >>> eval('\xff\x20')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<string>", line 1, in <module>
    NameError: name '?' is not defined
    [51324 refs]

But I'm now mystified: why does the eval raise a SyntaxError before the import and a TypeError afterwards?
|
|
msg102295
Author: Mark Dickinson (mark.dickinson)
Date: 2010-04-03 21:00
That should be "NameError" in the last line of the previous message, not "TypeError".
|
|
msg102300
Author: Ezio Melotti (ezio.melotti)
Date: 2010-04-03 22:13
See also #8208.
|
|
msg102301
Author: Mark Dickinson (mark.dickinson)
Date: 2010-04-03 22:13
After some more digging, it looks as though this is due to the Tkinter import (which ends up happening as a result of test___all__) changing the locale(?), and in particular the meaning of isalpha:

    Python 2.7a4+ (trunk:79716, Apr 3 2010, 22:06:18)
    [GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> str.isalpha(chr(255))
    False
    [34999 refs]
    >>> import Tkinter
    [51283 refs]
    >>> str.isalpha(chr(255))
    True
    [51283 refs]

(Is there some way that I can see the locale change more explicitly from Python?)
|
|
msg102302
Author: Mark Dickinson (mark.dickinson)
Date: 2010-04-03 22:18
> (Is there some way that I can see the locale change more explicitly from Python?)

Found it. :)

    >>> locale.nl_langinfo(locale.CODESET)
    'US-ASCII'
    [40683 refs]
    >>> import Tkinter
    [56953 refs]
    >>> locale.nl_langinfo(locale.CODESET)
    'UTF-8'
    [56953 refs]
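The before/after check above can be wrapped in a small helper. This is a minimal sketch written for Python 3 rather than the 2.7 trunk build in the thread, and the helper names are hypothetical. It reads LC_CTYPE with locale.setlocale(locale.LC_CTYPE), which queries the setting without changing it, and adds the codeset from locale.nl_langinfo where that function exists (it is Unix-only). Note that Python 3 already sets LC_CTYPE from the environment at startup, so an import may not produce a visible change there.

```python
import importlib
import locale


def ctype_locale_snapshot():
    """Return the current LC_CTYPE setting and, where available, its codeset."""
    # Passing only the category queries the current setting without changing it.
    setting = locale.setlocale(locale.LC_CTYPE)
    # nl_langinfo is only available on Unix-like platforms.
    if hasattr(locale, "nl_langinfo"):
        codeset = locale.nl_langinfo(locale.CODESET)
    else:
        codeset = None
    return setting, codeset


def import_and_compare_locale(module_name):
    """Import module_name and return the LC_CTYPE snapshots taken before and after."""
    before = ctype_locale_snapshot()
    importlib.import_module(module_name)
    after = ctype_locale_snapshot()
    return before, after


if __name__ == "__main__":
    # "json" is only a harmless stand-in; substitute the module under suspicion.
    before, after = import_and_compare_locale("json")
    print("before:", before)
    print("after: ", after)
    print("changed" if before != after else "unchanged")
```

On an affected 2.7 build the module to pass would be "Tkinter"; in Python 3 it is named tkinter.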
|
|
msg102303
Author: Ned Deily (ned.deily)
Date: 2010-04-03 22:23
Or this:

    Python 2.7a4+ (trunk, Apr 3 2010, 15:18:51)
    [GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import locale
    >>> locale.getlocale()
    (None, None)
    >>> import Tkinter
    >>> locale.getlocale()
    ('en_US', 'UTF8')
|
|
msg102304
Author: Mark Dickinson (mark.dickinson)
Date: 2010-04-03 22:25
I realize that the above doesn't really explain why the NameError is occurring: Python's token recognition algorithm (tok_get in tokenizer.c) uses isalpha, which is locale-aware. In particular, it seems that chr(255) is considered alphabetic in the UTF-8 codeset, but not in ASCII. So after the Tkinter import, '\xff' is tokenized as the start of a name, and eval fails with a NameError at run time instead of a SyntaxError at compile time.

Should this instance of isalpha be replaced by something that's not locale-aware? I'm not sure what the rules are supposed to be in 2.x for recognising identifiers.
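One locale-independent alternative is to classify bytes against fixed ASCII tables instead of calling the C library's isalpha. The sketch below does this in Python 3 over bytes, following the Python 2 rule that identifiers are ASCII-only (a letter or underscore, then letters, digits or underscores). The helper names are hypothetical, and this is an illustration of the idea, not CPython's tokenizer code. For comparison, Python 3's bytes.isalpha() is likewise ASCII-only and ignores the locale.

```python
import string

# Fixed ASCII tables: membership never depends on the current C locale.
_ID_START = frozenset((string.ascii_letters + "_").encode("ascii"))
_ID_CONTINUE = _ID_START | frozenset(string.digits.encode("ascii"))


def is_identifier_start(byte):
    """Locale-independent stand-in for an `isalpha(c) || c == '_'` test."""
    return byte in _ID_START


def is_identifier_char(byte):
    """True for bytes that may continue a Python 2 identifier."""
    return byte in _ID_CONTINUE


def scan_name(source, pos=0):
    """Return the end index of the identifier starting at pos, or pos if none."""
    if pos >= len(source) or not is_identifier_start(source[pos]):
        return pos
    end = pos + 1
    while end < len(source) and is_identifier_char(source[end]):
        end += 1
    return end


# 0xFF is not an identifier start, so no name is scanned (a tokenizer would
# then report a syntax error); "foo_1" is scanned as a 5-byte name.
print(scan_name(b"\xff\x20"))   # 0
print(scan_name(b"foo_1 = 2"))  # 5
```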
|
|
msg102305
Author: Mark Dickinson (mark.dickinson)
Date: 2010-04-03 22:26
Ned: yes, that works too. Thanks!
|
|
msg102307
Author: Ned Deily (ned.deily)
Date: 2010-04-03 23:23
For the record, the problem here isn't new to trunk and is not limited to OS X 10.6; it's the test that's new. It's not a problem for py3k where, as expected, the locale is always set and it seems the tokenizer is a little smarter:

    Python 3.2a0 (py3k, Apr 3 2010, 16:02:28)
    [GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> eval(b'\xff\x20')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<string>", line 1
        �
        ^
    SyntaxError: invalid character in identifier
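In Python 3 the outcome depends on whether the source is bytes or text, not on the locale. The sketch below (the helper name classify_eval is hypothetical) shows the contrast on a current Python 3. The exact SyntaxError message differs between versions, so only the exception type is reported.

```python
def classify_eval(source):
    """Return the name of the exception raised by eval(source), or None."""
    try:
        eval(source)
    except Exception as exc:
        return type(exc).__name__
    return None


# Bytes source is decoded as UTF-8 by default; a lone 0xFF byte is not valid
# UTF-8, so compilation fails with SyntaxError before any name lookup happens.
print(classify_eval(b"\xff\x20"))  # SyntaxError

# Text source: U+00FF (LATIN SMALL LETTER Y WITH DIAERESIS) is a letter, so it
# is a valid identifier under Python 3's Unicode rules and evaluation reaches
# the name lookup, which fails with NameError.
print(classify_eval("\xff\x20"))   # NameError
print("\xff".isidentifier())       # True
```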
|
|
msg102309
Author: Ned Deily (ned.deily)
Date: 2010-04-04 00:00
Verified that the r79725 fix to tokenizer.c prevents the original test failure.
|
|
msg102328
Author: Mark Dickinson (mark.dickinson)
Date: 2010-04-04 09:20
Yes, Benjamin's checkin seems to have fixed it for me, too. Thanks, Benjamin!

There's still the issue of the Tkinter import changing the locale, but that seems to be out of Python's control. As far as I can tell, it happens when the module initialization calls Tcl_FindExecutable, which is part of the Tcl library itself. This may well be deliberate: see http://www.tcl.tk/cgi-bin/tct/tip/66.html

Closing.
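If a test suite needs to stay insulated from a library that changes the locale during initialization, one option is to save and restore the locale category around the offending call. This is a minimal Python 3 sketch of that idea, not something the thread applied; the context manager name is hypothetical. Whether Tk itself behaves correctly once the locale is restored is not established here, and setlocale affects the whole process, so this is not thread-safe.

```python
import contextlib
import locale


@contextlib.contextmanager
def preserved_locale(category=locale.LC_CTYPE):
    """Restore the given locale category on exit, whatever the body did to it."""
    # Query the current setting without modifying it.
    saved = locale.setlocale(category)
    try:
        yield saved
    finally:
        # Runs on normal exit and when the body raises.
        locale.setlocale(category, saved)


if __name__ == "__main__":
    with preserved_locale():
        # Stand-in for an import (e.g. of tkinter) that calls setlocale internally.
        locale.setlocale(locale.LC_CTYPE, "C")
    print(locale.setlocale(locale.LC_CTYPE))
```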
|
|