Issue 30377: Unnecessary complexity in tokenize.py: comments and newlines (original) (raw)

Created on 2017-05-16 13:08 by Albert-Jan Nijburg, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 1607 merged python-dev,2017-05-16 13:21
Messages (9)
msg293760 - (view) Author: Albert-Jan Nijburg (Albert-Jan Nijburg) * Date: 2017-05-16 13:08
While porting tokenize.py to javascript I stumbled upon this. The bit of code that checks if it's a newline or a comment, checks for comment twice. These can be split up, this way the code is a bit more readable. https://github.com/python/cpython/blob/master/Lib/tokenize.py#L560 It's not broken, it's just a bit more complex then it has to be.
msg293770 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-16 16:08
The code is correct, it just can be made cleaner. No need to backport the change to other versions. The line "if line[pos] in '#\r\n':" looks a kind of optimization. In common case (not a newline and not a comment) there is only one check. The expression "(NL, COMMENT)[line[pos] == '#']" is redundant of course, it can be replaced by just "NL". Note that now the line that yields yield TokenInfo(NL, ...) is almost the same for comments and newlines. If rename nl_pos to pos the same line can be used in both cases.
msg293772 - (view) Author: Albert-Jan Nijburg (Albert-Jan Nijburg) * Date: 2017-05-16 16:27
Oh yes you're right! I've updated the code on github. Even cleaner this way :).
msg293841 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-17 06:04
This is simple change, and I would write the same code, but just for the case could you please sign CLA?
msg293844 - (view) Author: Albert-Jan Nijburg (Albert-Jan Nijburg) * Date: 2017-05-17 06:23
I did yesterday, should be coming through today right?
msg293845 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-17 06:35
Don't worry, CLA is accepted manually, and this takes some time.
msg293952 - (view) Author: Albert-Jan Nijburg (Albert-Jan Nijburg) * Date: 2017-05-19 08:14
Still no CLA, I checked my username on the pdf, and it's correct, hope someone looks at it soon :)
msg293973 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-05-19 20:17
This is pycon week, so staff person is busy with that.
msg294346 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-24 11:31
New changeset c471ca448cf336d7eb4a7cbe14d0012baf122d1f by Serhiy Storchaka (Albert-Jan Nijburg) in branch 'master': bpo-30377: Simplify handling of COMMENT and NL in tokenize.py (#1607) https://github.com/python/cpython/commit/c471ca448cf336d7eb4a7cbe14d0012baf122d1f
History
Date User Action Args
2022-04-11 14:58:46 admin set github: 74562
2017-05-24 11:54:15 serhiy.storchaka set status: open -> closedresolution: fixedstage: patch review -> resolved
2017-05-24 11:31:59 serhiy.storchaka set messages: +
2017-05-19 20:17:53 terry.reedy set nosy: + terry.reedymessages: + title: Unnecessary complexity in tokenize.py around handling of comments and newlines -> Unnecessary complexity in tokenize.py: comments and newlines
2017-05-19 08:14:19 Albert-Jan Nijburg set messages: +
2017-05-17 06:35:25 serhiy.storchaka set messages: +
2017-05-17 06:23:52 Albert-Jan Nijburg set messages: +
2017-05-17 06:04:20 serhiy.storchaka set messages: +
2017-05-16 16:27:06 Albert-Jan Nijburg set messages: +
2017-05-16 16:08:31 serhiy.storchaka set assignee: serhiy.storchakastage: patch review
2017-05-16 16:08:10 serhiy.storchaka set nosy: + serhiy.storchakamessages: + versions: - Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6
2017-05-16 13:21:15 python-dev set pull_requests: + <pull%5Frequest1697>
2017-05-16 13:08:09 Albert-Jan Nijburg create