Hi All,


I was going through some of the open issues related to 'tokenize' and ran across 'issue2180'. The reproduction case for this issue is along the lines of:


>>> tokenize.tokenize(io.StringIO("if 1:\n \\\n #hey\n print 1").readline)


but with 'py3k' I get:

    >>> tokenize.tokenize(io.StringIO("if 1:\n \\\n #hey\n print 1").readline)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/minge/Code/python/py3k/Lib/tokenize.py", line 360, in tokenize
        encoding, consumed = detect_encoding(readline)
      File "/Users/minge/Code/python/py3k/Lib/tokenize.py", line 316, in detect_encoding
        if first.startswith(BOM_UTF8):
    TypeError: Can't convert 'bytes' object to str implicitly

which, as seen in the traceback, is because the 'detect_encoding' function in 'Lib/tokenize.py' checks whether 'first' (a 'str' object, the first line of the input being tokenized) starts with 'BOM_UTF8' (a 'bytes' object). It seems to me that strings should still be tokenizable, but maybe I am missing something.
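
To isolate it, the same TypeError can be produced directly ('BOM_UTF8' comes from 'codecs'; this is just a minimal sketch of the str/bytes mismatch, separate from tokenize itself):

    >>> from codecs import BOM_UTF8
    >>> "if 1:".startswith(BOM_UTF8)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: Can't convert 'bytes' object to str implicitly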

Is the implementation of 'detect_encoding' correct in how it attempts to determine an encoding, or should I open an issue for this?
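
For what it's worth, passing a bytes stream instead does seem to get past 'detect_encoding', which presumably needs raw bytes so it can sniff a BOM or coding cookie before any decoding happens. A minimal sketch (using a trivial hypothetical input, not the issue2180 reproduction case):

    >>> import io, tokenize
    >>> toks = list(tokenize.tokenize(io.BytesIO(b"x = 1\n").readline))

There is also an (apparently undocumented) 'tokenize.generate_tokens' that accepts a str-producing readline, so perhaps str input is meant to go through that instead.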

---
Meador