Issue 1224621: tokenize module does not detect inconsistent dedents

Created on 2005-06-21 06:10 by dyoo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
tokenize.py.diff	dyoo,2005-06-21 06:10	Diff to correct non-detection of incorrect dedent bug
testcase.py	dyoo,2005-06-21 06:10	test case to expose tokenize bug
breaking-getsource.py	arigo,2005-09-02 12:10	inspect.getsource() breaks now
patch.tokenize	arigo,2005-09-02 12:40	diff with test to fix the inspect.getsource() case

Messages (6)
msg25595	Author: Danny Yoo (dyoo)	Date: 2005-06-21 06:10
The attached code snippet 'testcase.py' should produce an IndentationError, but does not. The code in tokenize.py is too trusting, and needs to add a check against bad indentation as it yields DEDENT tokens. I'm including a diff to tokenize.py that should at least raise an exception on bad indentation like this. Just in case, I'm including testcase.py here too:
------
import tokenize
from StringIO import StringIO
sampleBadText = """
def foo():
    bar
  baz
"""
print list(tokenize.generate_tokens(
    StringIO(sampleBadText).readline))
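For readers on a current interpreter, the check requested here can be exercised with a minimal Python 3 sketch. It is not the attached testcase.py: the helper name `has_inconsistent_dedent` and the two sample strings are illustrative assumptions, and the exact whitespace in `BAD_SOURCE` is a guess at the kind of dedent the report describes (a line that returns to a column matching no enclosing block).

```python
import io
import tokenize

# Assumed sample: "baz" dedents to column 2, which matches neither the
# "def" line (column 0) nor the body (column 4).
BAD_SOURCE = "def foo():\n    bar\n  baz\n"
GOOD_SOURCE = "def foo():\n    bar\n    baz\n"


def has_inconsistent_dedent(source):
    """Return True if tokenizing `source` raises IndentationError."""
    readline = io.StringIO(source).readline
    try:
        # Exhaust the generator; the error is raised lazily while iterating.
        for _ in tokenize.generate_tokens(readline):
            pass
    except IndentationError:
        return True
    return False
```

Calling `has_inconsistent_dedent(BAD_SOURCE)` should report the bad dedent, while `GOOD_SOURCE` tokenizes without complaint.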
msg25596	Author: Raymond Hettinger (rhettinger) *	Date: 2005-06-21 07:54
Fixed. See Lib/tokenize.py 1.38 and 1.36.4.1
msg25597	Author: Armin Rigo (arigo) *	Date: 2005-09-02 12:10
Reopening this bug report: this might fix the problem at hand, but it breaks inspect.getsource() on cases where it used to work. See attached example.
msg25598	Author: Armin Rigo (arigo) *	Date: 2005-09-02 12:40
Here is a proposed patch. It relaxes the dedent policy a bit. It assumes that the first line may already have some initial indentation, as is the case when tokenizing from the middle of a file (as inspect.getsource() does). It should also be back-ported to 2.4, given that the previous patch was. For 2.4, only the non-test part of the patch applies cleanly; I suggest ignoring the test part and applying only the rest, given that there are many more tests in 2.5 for inspect.getsource() anyway. Since the whole issue of inspect.getsource() is muddy anyway, I will go ahead and check this patch in unless someone spots a problem. For now the previously applied patch makes parts of PyPy break with an uncaught IndentationError.
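The situation described here, tokenizing a fragment whose first line is already indented, can be illustrated with a short Python 3 sketch. It does not reproduce the attached breaking-getsource.py or the patch itself; `SNIPPET` and `token_names` are hypothetical names, and the snippet simply stands in for a method cut out of the middle of a class body.

```python
import io
import tokenize

# Hypothetical fragment: a method extracted from inside a class, so the
# very first line starts at column 4 rather than column 0.
SNIPPET = "    def method(self):\n        return 1\n"


def token_names(source):
    """Return the token type names produced for `source`, in order."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


names = token_names(SNIPPET)
```

Here the leading indentation is reported as an ordinary INDENT token at the start of the stream instead of being treated as an error, which is the behaviour a caller that starts mid-file relies on.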
msg25599	Author: Kurt B. Kaiser (kbk) *	Date: 2006-08-10 01:40
tokenize rev 39046 (21 Jun 2005) breaks tabnanny: tabnanny doesn't handle the IndentationError exception raised when tokenize detects an inconsistent dedent. I patched up ScriptBinding.py in IDLE. The IndentationError should probably pass the same parameters as TokenError, and tabnanny should catch it.
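A caller in tabnanny's position needs to handle both exception types, since IndentationError is not a subclass of tokenize.TokenError. The following Python 3 sketch shows one way to do that; `check_source` is a hypothetical helper, not tabnanny's or IDLE's actual code, and the message prefixes are made up for illustration.

```python
import io
import tokenize


def check_source(source):
    """Return None if `source` tokenizes cleanly, else a short description."""
    readline = io.StringIO(source).readline
    try:
        for _ in tokenize.generate_tokens(readline):
            pass
    except tokenize.TokenError as err:
        # TokenError carries its message as the first positional argument.
        return "token error: %s" % err.args[0]
    except IndentationError as err:
        # IndentationError is a SyntaxError, so the message is in .msg.
        return "indentation error: %s" % err.msg
    return None
```

Keeping the two handlers separate reflects that the exceptions expose their details differently, which is the mismatch the comment above points out.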
msg25600	Author: Georg Brandl (georg.brandl) *	Date: 2006-08-14 21:34
tabnanny's been taken care of in r51284.

History
Date	User	Action	Args
2022-04-11 14:56:11	admin	set	github: 42104
2005-06-21 06:10:00	dyoo	create