Issue 8478: tokenize.untokenize first token missing failure case

When tokens are altered and therefore no longer carry location information, tokenize.untokenize sometimes drops the first token. A failure case is shown below.

Expected output: 'import foo ,bar\n'
Actual output: 'foo ,bar\n'

$ python
Python 2.6.4 (r264:75706, Dec 7 2009, 18:43:55)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import StringIO, tokenize
>>>
>>> def strip(iterable):
...     for t_type, t_str, (srow, scol), (erow, ecol), line in iterable:
...         yield t_type, t_str
...
>>> source = StringIO.StringIO('import foo, bar\n')
>>> print repr(tokenize.untokenize(strip(tokenize.generate_tokens(source.readline))))
'foo ,bar \n'
>>> source.seek(0)
>>> print repr(tokenize.untokenize(tokenize.generate_tokens(source.readline)))
'import foo, bar\n'

I've looked into this in some more depth.

The problem is that Untokenizer.compat assumes the iterable can be restarted from the beginning, when Untokenizer.untokenize has already consumed the first element. So it works with a list, but not with a generator.

In particular, untokenize is broken for any input that is a generator whose tokens supply only the first two elements (token type and string), i.e. exactly the input that sends it down the compatibility path.
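To illustrate the diagnosis outside of tokenize, here is a minimal sketch of the same pattern (the names consume and helper are made up for this example, not real tokenize functions): the consumer pulls the first element off the iterable and then hands the iterable itself to a helper, which assumes it will see everything again from the beginning.

def helper(first, iterable):
    # `first` is inspected but never emitted; the helper relies on seeing
    # it again when it iterates `iterable` from the start.
    out = []
    for item in iterable:
        out.append(item)
    return out

def consume(iterable):
    for item in iterable:
        # hand off to the helper as soon as the first element is seen
        return helper(item, iterable)

print consume(['a', 'b', 'c'])        # ['a', 'b', 'c'] -- a list restarts
print consume(iter(['a', 'b', 'c']))  # ['b', 'c']      -- an iterator does not

This is exactly the list-versus-generator asymmetry described above.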

Workaround: never hand untokenize a generator. Expand generators to lists first instead.
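For completeness, a sketch of the workaround applied to the reproduction above (Python 2, same names as in the session; the commented output is the expected output from the report, not something re-verified here):

import StringIO, tokenize

def strip(iterable):
    # keep only (type, string), dropping the position information
    for tok_type, tok_str, start, end, line in iterable:
        yield tok_type, tok_str

source = StringIO.StringIO('import foo, bar\n')
# Expand the generator into a list before calling untokenize, so the
# compatibility path can re-iterate from the first token.
tokens = list(strip(tokenize.generate_tokens(source.readline)))
print repr(tokenize.untokenize(tokens))   # 'import foo ,bar \n'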