[Python-Dev] issue2180 and using 'tokenize' with Python 3 'str's
Meador Inge meadori at gmail.com
Tue Sep 28 05:15:48 CEST 2010
Hi All,
I was going through some of the open issues related to 'tokenize' and ran across 'issue2180'. The reproduction case for this issue is along the lines of:
tokenize.tokenize(io.StringIO("if 1:\n \\\n #hey\n print 1").readline)
but, with 'py3k' I get:
>>> tokenize.tokenize(io.StringIO("if 1:\n \\\n #hey\n print 1").readline)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/minge/Code/python/py3k/Lib/tokenize.py", line 360, in tokenize
    encoding, consumed = detect_encoding(readline)
  File "/Users/minge/Code/python/py3k/Lib/tokenize.py", line 316, in detect_encoding
    if first.startswith(BOM_UTF8):
TypeError: Can't convert 'bytes' object to str implicitly
which, as seen in the trace, is because the 'detect_encoding' function in 'Lib/tokenize.py' checks whether 'first' (the first line read from the input, here a 'str' object) starts with 'BOM_UTF8' (a 'bytes' object). It seems to me that strings should still be able to be tokenized, but maybe I am missing something.
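To make the mismatch concrete, here is a minimal sketch. It is not taken from 'Lib/tokenize.py'. It assumes a simpler source string than the issue2180 case (so that only the 'str' versus 'bytes' question is exercised), and it assumes that 'tokenize.generate_tokens', which accepts a 'readline' returning 'str', is an acceptable entry point for already-decoded text.

```python
import io
import tokenize
from codecs import BOM_UTF8

# Illustrative source only; deliberately simpler than the issue2180 input.
source = "if 1:\n    x = 1\n"

# The root of the failure: a str cannot be tested against a bytes prefix.
try:
    source.startswith(BOM_UTF8)
except TypeError as exc:
    mismatch = exc
else:
    mismatch = None

# Option 1: encode the text and hand tokenize() a readline that returns bytes.
# detect_encoding() then sees bytes, and the first token reports the encoding.
byte_tokens = list(
    tokenize.tokenize(io.BytesIO(source.encode("utf-8")).readline)
)

# Option 2: generate_tokens() takes a readline that returns str and skips
# encoding detection, so no ENCODING token is produced.
str_tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

print(type(mismatch).__name__)
print(tokenize.tok_name[byte_tokens[0].type], byte_tokens[0].string)
print(tokenize.tok_name[str_tokens[0].type], str_tokens[0].string)
```

Either option avoids the 'TypeError'; which one is appropriate depends on whether the caller still has the raw bytes (and so wants the encoding detected) or only has decoded text.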
Is the implementation of 'detect_encoding' correct in how it attempts to determine an encoding or should I open an issue for this?
Meador