Issue 16224: tokenize.untokenize() misbehaves when moved to "compatibility mode"


Created on 2012-10-14 06:04 by eric.snow, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
untokenize_compat_force_iter.diff (uploaded by eric.snow, 2012-10-14 06:18)
Messages (4)
msg172851 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-10-14 06:04
When tokenize.untokenize() encounters a 2-tuple, it moves to compatibility mode, where only the token type and string are used from that point forward. There are two closely related problems:

* when the iterable is a sequence, the portion of the sequence prior to the 2-tuple is traversed a second time under compatibility mode.
* when the iterable is an iterator, the first 2-tuple encountered is essentially gobbled up (see ).

Either an explicit "iterable = iter(iterable)" or "iterable = list(iterable)" should happen at the very beginning of Untokenizer.untokenize(). If the former, Untokenizer.compat() should be fixed to not treat that first token differently. If the latter, self.tokens should be cleared at the beginning of Untokenizer.compat(). I'll put up a patch with the second option when I get a chance.
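The switch into compatibility mode can be demonstrated with a short script (on versions affected by this issue, the iterator case dropped the first 2-tuple; on fixed versions both calls agree):

```python
import io
import tokenize

source = "1 + 2\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Truncate every 5-tuple to (type, string); the first 2-tuple seen
# forces untokenize() into compatibility mode.
pairs = [tok[:2] for tok in tokens]

# Sequence input: on affected versions, tokens before the first 2-tuple
# were traversed a second time in compatibility mode.
from_list = tokenize.untokenize(pairs)

# Iterator input: on affected versions, the first 2-tuple was consumed
# while detecting the mode switch and never emitted.
from_iter = tokenize.untokenize(iter(pairs))

print(repr(from_list))
print(repr(from_iter))
```

Compatibility mode does not reproduce the exact original spacing, but the result should still be equivalent source.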
msg172853 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-10-14 06:18
Actually, here's a patch with the first option. It preserves iterators as iterators, rather than dumping them into a list. I've also rolled the tests from into this patch. Consequently, if the patch is suitable, that issue can be closed.
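The first option can be sketched generically (hypothetical process/fallback names, not the actual Untokenizer code): calling iter() once up front means a sequence is never re-traversed, and handing the same iterator to the fallback, together with the already-consumed item, means nothing is gobbled:

```python
def fallback(first, rest):
    # `first` was already pulled from the iterator by the caller,
    # so it must be handled explicitly here rather than re-fetched.
    yield first
    yield from rest

def process(iterable):
    it = iter(iterable)  # idempotent for iterators; prevents re-traversal for sequences
    out = []
    for item in it:
        if isinstance(item, tuple) and len(item) == 2:
            # Switch modes: pass the *same* iterator to the fallback
            # so no element is skipped or revisited.
            out.extend(fallback(item, it))
            break
        out.append(item)
    return out
```

With this shape, process([1, 2, (3, 'a'), (4, 'b')]) and process(iter([1, 2, (3, 'a'), (4, 'b')])) produce the same result, which is the property the patch restores.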
msg180587 - (view) Author: Thomas Kluyver (takluyver) * Date: 2013-01-25 14:20
I think this is a duplicate of #8478.
msg211469 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-02-18 01:49
While I am closing this as a duplicate, I will use some of your patch, including one test, and credit you as well.

Switching from 5-tuples to 2-tuples, as in one of your test cases, is not currently a supported use case. compat() currently re-iterates the entire token list, and that does not work if some tokens have already been processed. While iter(iterable) makes your toy example pass, switching still does not work because of the problem of initializing compat():

indents = []
This could only work with switching by making it an instance attribute which is also updated in the 5-tuple case. It is needed in tokenize also to support tab indents (#20383), but would only need to be an attribute instead of a local to support switching.

startline = token[0] in (NEWLINE, NL)  (my replacement for 3 lines)
This is odd, as the file starts at the start of a line whether or not the first token is \n. On the other hand, the initial value of startline is irrelevant as long as it has some value, because it is irrelevant until there has been an indent. It would also have to become an attribute to support switching, and then it would be relevant, since indents might not be initially empty. But I do not currently see the need for a tuple-length switching feature.

prevstring = False
This does not matter even if wrong, since it only means adding a space.
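The point about indents can be illustrated with a minimal, hypothetical class (not the real Untokenizer): only if the indentation state lives on the instance, and is updated by the full-tuple path, can a mid-stream switch to a compat-style path continue from the state accumulated so far instead of restarting from an empty list:

```python
class Sketch:
    def __init__(self):
        self.indents = []   # instance attribute: survives a mode switch
        self.tokens = []

    def full(self, indent_str):
        # Full-tuple path records indentation as it goes.
        self.indents.append(indent_str)
        self.tokens.append(indent_str)

    def compat(self, text):
        # Compat-style path reuses indentation accumulated before the
        # switch, instead of starting over with indents = [].
        prefix = self.indents[-1] if self.indents else ""
        self.tokens.append(prefix + text)

s = Sketch()
s.full("    ")       # indent seen while in full-tuple mode
s.compat("pass\n")   # switch: the compat path still knows the indent
```

Had indents been a local variable of compat(), the second call would have emitted "pass\n" with no indentation.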
History
Date User Action Args
2022-04-11 14:57:37 admin set github: 60428
2014-02-18 01:49:52 terry.reedy set status: open -> closed; assignee: eric.snow -> terry.reedy; versions: - Python 3.2; nosy: + terry.reedy; messages: +; resolution: duplicate
2013-01-25 14:20:37 takluyver set nosy: + takluyver; messages: +
2012-10-14 06:20:58 eric.snow link issue16221 superseder
2012-10-14 06:18:57 eric.snow set files: + untokenize_compat_force_iter.diff; keywords: + patch; messages: +; stage: test needed -> patch review
2012-10-14 06:04:53 eric.snow create