Issue 33899: Tokenize module does not mirror "end-of-input" is newline behavior

process

Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ammar2 Nosy List: Anthony Sottile, ammar2, asmeurer, benjamin.peterson, brechtm, gregory.p.smith, meador.inge, miss-islington, ned.deily, taleinat, terry.reedy
Priority: normal Keywords: patch

Created on 2018-06-19 07:41 by ammar2, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 7891 merged ammar2,2018-06-24 12:34
PR 8132 merged ammar2,2018-07-06 07:58
PR 8133 merged ammar2,2018-07-06 08:08
PR 8134 merged ammar2,2018-07-06 08:20
PR 10072 merged taleinat,2018-10-24 06:05
PR 10073 merged taleinat,2018-10-24 06:56
PR 10074 merged taleinat,2018-10-24 07:24
PR 10075 merged taleinat,2018-10-24 07:27
Messages (27)
msg319934 - (view) Author: Ammar Askar (ammar2) * (Python committer) Date: 2018-06-19 07:41
As was pointed out in https://bugs.python.org/issue33766 there is an edge case in the tokenizer whereby it will implicitly treat the end of input as a newline. The tokenize module in stdlib does not mirror the C code's behavior in this case.

tokenizer.c:

    ~/cpython $ echo -n 'x' | ./python
    ----------
    NAME ("x")
    NEWLINE
    ENDMARKER

tokenize module:

    ~/cpython $ echo -n 'x' | ./python -m tokenize
    1,0-1,1:            NAME           'x'
    2,0-2,0:            ENDMARKER      ''

The instrumentation to have the C tokenizer dump out its tokens is mine; I can provide a diff to produce that output if needed.
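The tokenize half of this comparison can also be reproduced from Python itself. The following is a minimal sketch that assumes only the standard `io` and `tokenize` modules; the helper name `token_names` is made up for illustration. Whether a NEWLINE appears for the input without a trailing newline depends on whether the interpreter includes the fix discussed in this issue, so no output is shown.

```python
import io
import tokenize


def token_names(source):
    """Return the token type names that the tokenize module yields for source."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


if __name__ == "__main__":
    # Compare the same statement with and without a trailing newline.
    for src in ("x", "x\n"):
        print(repr(src), token_names(src))
```

If the two printed lists differ only by a missing NEWLINE for `'x'`, the interpreter predates the fix.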
msg321154 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-06 07:19
New changeset c4ef4896eac86a6759901c8546e26de4695a1389 by Tal Einat (Ammar Askar) in branch 'master': bpo-33899: Make tokenize module mirror end-of-file is end-of-line behavior (GH-7891) https://github.com/python/cpython/commit/c4ef4896eac86a6759901c8546e26de4695a1389
msg321162 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-06 10:21
New changeset ab75d9e4244ee24bc96ea9d52362899e3bf365a2 by Tal Einat (Ammar Askar) in branch '3.7': [3.7] bpo-33899: Make tokenize module mirror end-of-file is end-of-line behavior (GH-7891) (GH-8132) https://github.com/python/cpython/commit/ab75d9e4244ee24bc96ea9d52362899e3bf365a2
msg321163 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-06 10:22
New changeset 11c36a3e16f7fd4e937466014e8393ede4b61a25 by Tal Einat (Ammar Askar) in branch '3.6': [3.6] bpo-33899: Make tokenize module mirror end-of-file is end-of-line behavior (GH-7891) (GH-8134) https://github.com/python/cpython/commit/11c36a3e16f7fd4e937466014e8393ede4b61a25
msg321164 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-06 10:23
New changeset 7829bba45d0e2446f3a0ca240bfe46959f01071e by Tal Einat (Ammar Askar) in branch '2.7': [2.7] bpo-33899: Make tokenize module mirror end-of-file is end-of-line behavior (GH-7891) (#8133) https://github.com/python/cpython/commit/7829bba45d0e2446f3a0ca240bfe46959f01071e
msg321165 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-07-06 10:24
Thanks for all of your work on this, Ammar!
msg328220 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2018-10-21 16:10
This change in behaviour is breaking pycodestyle: https://github.com/PyCQA/pycodestyle/issues/786 Perhaps it shouldn't have been backported (especially all the way to python2.7?)
msg328222 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-10-21 18:05
This was backported since it was considered a bug, but you are right that it broke backwards compatibility, and perhaps shouldn't have been backported. Still, with 3.6.7 and 3.7.1 now released, that ship has sailed. We could perhaps revert this on the 2.7 branch, but I feel that reverting this change only on 2.7 would just cause even more confusion.
msg328226 - (view) Author: Anthony Sottile (Anthony Sottile) * Date: 2018-10-21 23:42
I'm surprised this was classified as a bug! Though that's subjective so I get that it's difficult to decide what is and what isn't ¯\_(ツ)_/¯
msg328227 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-10-22 00:45
Apparently this change also affected IPython. Perhaps we should add an entry to the whatsnew documents for 3.7.1 and 3.6.7: https://docs.python.org/3/whatsnew/3.7.html#notable-changes-in-python-3-7-1 https://docs.python.org/3.6/whatsnew/3.6.html#notable-changes-in-python-3-6-7
msg328238 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-10-22 06:34
I'm sorry to have caused this mess; it was bad judgement on my part. Adding a mention in What's New is a good idea, Ned, I'll do that.
msg328283 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-10-23 06:42
Ned, should this also be added to the 2.7 What's New? Or perhaps reverted on the 2.7 branch?
msg328318 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-10-23 14:14
I don't have a strong opinion about 2.7 here. Ultimately, it's Benjamin's call. But it might make sense to revert for 2.7 since it hasn't been released yet.
msg328324 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2018-10-23 15:59
Please revert in 2.7.
msg328353 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-10-24 06:33
See PR GH-10072 for reverting in 2.7.
msg328354 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2018-10-24 06:50
FYI, An example of other fallout from this change - patsy broke and needed this fix: https://github.com/pydata/patsy/commit/4f53bbaf58c0bf1a9bed73fc67c7c6d0aa7f4e20#diff-53c70e68c6dfd4fe9b08427792cb2bd6
msg328355 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-10-24 06:57
See PR GH-10073 adding mention in "What's New".
msg328356 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2018-10-24 07:17
some pylint fallout appears to be addressed in https://github.com/PyCQA/pylint/commit/2698cbe56b44df7974de1c3374db8700296c6fad
msg328357 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2018-10-24 07:20
New changeset dfba1f67e7f1381ceb7cec8fbcfa37337620a9b0 by Gregory P. Smith (Tal Einat) in branch 'master': bpo-33899: Mention tokenize behavior change in What's New (GH-10073) https://github.com/python/cpython/commit/dfba1f67e7f1381ceb7cec8fbcfa37337620a9b0
msg328358 - (view) Author: miss-islington (miss-islington) Date: 2018-10-24 07:32
New changeset 9a0476283393f9988d0946491052d7724a7f9d21 by Miss Islington (bot) (Tal Einat) in branch '3.6': [3.6] bpo-33899: Mention tokenize behavior change in What's New (GH-10073) (GH-10075) https://github.com/python/cpython/commit/9a0476283393f9988d0946491052d7724a7f9d21
msg328359 - (view) Author: miss-islington (miss-islington) Date: 2018-10-24 07:33
New changeset b4c9874f5c7f64e1d41cbc588e515b8851bbb90c by Miss Islington (bot) (Tal Einat) in branch '3.7': [3.7] bpo-33899: Mention tokenize behavior change in What's New (GH-10073) (GH-10074) https://github.com/python/cpython/commit/b4c9874f5c7f64e1d41cbc588e515b8851bbb90c
msg328360 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-10-24 07:40
Thanks for helping with the fallout from this, Gregory.
msg328369 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-10-24 14:53
#33766 was about documenting the C tokenizer change, some years ago, that made end-of-file (EOF) and end-of-string (EOS) generate the NEWLINE token required to properly terminate statements. "The end of input also serves as an implicit terminator for the final physical line." Although the tokenize module intentionally does not exactly mirror the C tokenizer (it adds COMMENT tokens), it plausibly seems like a bug that it was not changed along with the C tokenizer, as it has since been tokenizing valid code as grammatically invalid. But I agree that this fix is too disruptive for 2.7.
msg328383 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2018-10-24 17:32
New changeset a1f45ec73f0486b187633e7ebc0a4f559d29d7d9 by Benjamin Peterson (Tal Einat) in branch '2.7': bpo-33899: Revert tokenize module adding an implicit final NEWLINE (GH-10072) https://github.com/python/cpython/commit/a1f45ec73f0486b187633e7ebc0a4f559d29d7d9
msg328877 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2018-10-29 22:27
https://bugs.python.org/issue35107 filed to track further fallout from this API change.
msg330213 - (view) Author: Aaron Meurer (asmeurer) Date: 2018-11-21 19:21
Is it expected behavior that comments produce NEWLINE if they don't have a newline and don't produce NEWLINE if they do (that is, '# comment' produces NEWLINE but '# comment\n' does not)?
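The comment case described here can be checked directly. This is a small sketch, assuming only the standard `io` and `tokenize` modules; `has_newline_token` is a hypothetical helper name. The result for `'# comment'` without a trailing newline varies between Python versions, so the script prints what it finds rather than asserting a particular answer.

```python
import io
import tokenize


def has_newline_token(source):
    """Return True if tokenizing source yields at least one NEWLINE token."""
    readline = io.StringIO(source).readline
    return any(
        tok.type == tokenize.NEWLINE
        for tok in tokenize.generate_tokens(readline)
    )


if __name__ == "__main__":
    # A comment-only line ending in "\n" is terminated by NL, not NEWLINE.
    for src in ("# comment", "# comment\n"):
        print(repr(src), has_newline_token(src))
```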
msg338601 - (view) Author: Brecht Machiels (brechtm) Date: 2019-03-22 11:46
In order to adapt code to this change, can we assume that a NEWLINE token with an empty string only occurs right before the ENDMARKER?
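One way to explore this question, and to adapt code that expects the old token stream, is to filter out NEWLINE tokens whose string is empty. The sketch below assumes that the implicit NEWLINE is recognisable by its empty string, which is the premise of the question rather than a documented guarantee; `without_implicit_newline` is a hypothetical helper name. The demo prints the tail of the stream for input that ends inside an indented block, where DEDENT tokens may sit between the NEWLINE and the ENDMARKER.

```python
import io
import tokenize


def without_implicit_newline(tokens):
    """Yield tokens unchanged, skipping NEWLINE tokens whose string is empty.

    Assumption: an empty-string NEWLINE is the one implied by end of input.
    """
    for tok in tokens:
        if tok.type == tokenize.NEWLINE and tok.string == "":
            continue
        yield tok


if __name__ == "__main__":
    # No trailing newline, and the input ends inside an indented block.
    source = "if x:\n    y"
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    # Inspect the last few tokens to see what precedes ENDMARKER.
    for tok in tokens[-4:]:
        print(tokenize.tok_name[tok.type], repr(tok.string))
```

Filtering on the token string rather than on position avoids relying on the NEWLINE being immediately before the ENDMARKER.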
History
Date User Action Args
2022-04-11 14:59:01 admin set github: 78080
2019-03-22 11:46:47 brechtm set nosy: + brechtm; messages: +
2018-11-21 19:21:45 asmeurer set nosy: + asmeurer; messages: +
2018-10-29 22:27:02 gregory.p.smith set messages: +
2018-10-26 11:09:03 taleinat set versions: - Python 2.7
2018-10-25 15:57:03 terry.reedy set pull_requests: - <pull_request9424>
2018-10-25 13:35:51 Tim.Graham set pull_requests: + <pull_request9424>
2018-10-25 01:00:21 ned.deily set pull_requests: - <pull_request9411>
2018-10-25 00:59:53 ned.deily set pull_requests: - <pull_request9415>
2018-10-24 21:40:28 Tim.Graham set pull_requests: + <pull_request9415>
2018-10-24 17:32:27 benjamin.peterson set messages: +
2018-10-24 14:53:02 terry.reedy set nosy: + terry.reedy; messages: +
2018-10-24 14:39:02 corona10 set pull_requests: + <pull_request9411>
2018-10-24 07:40:28 taleinat set messages: +
2018-10-24 07:33:04 miss-islington set messages: +
2018-10-24 07:32:42 miss-islington set nosy: + miss-islington; messages: +
2018-10-24 07:27:36 taleinat set pull_requests: + <pull_request9409>
2018-10-24 07:24:56 taleinat set pull_requests: + <pull_request9408>
2018-10-24 07:20:14 gregory.p.smith set messages: +
2018-10-24 07:17:51 gregory.p.smith set messages: +
2018-10-24 06:57:35 taleinat set messages: +
2018-10-24 06:56:59 taleinat set pull_requests: + <pull_request9407>
2018-10-24 06:50:53 gregory.p.smith set nosy: + gregory.p.smith; messages: +
2018-10-24 06:33:29 taleinat set messages: +
2018-10-24 06:05:04 taleinat set pull_requests: + <pull_request9406>
2018-10-23 15:59:30 benjamin.peterson set messages: +
2018-10-23 14:14:58 ned.deily set nosy: + benjamin.peterson; messages: +
2018-10-23 06:42:04 taleinat set messages: +
2018-10-22 06:34:52 taleinat set messages: +
2018-10-22 00:45:22 ned.deily set nosy: + ned.deily; messages: +
2018-10-21 23:42:03 Anthony Sottile set messages: +
2018-10-21 18:05:38 taleinat set messages: +
2018-10-21 16:10:26 Anthony Sottile set nosy: + Anthony Sottile; messages: +
2018-07-06 10:24:50 taleinat set status: open -> closed; versions: + Python 2.7, Python 3.6, Python 3.7; messages: +; resolution: fixed; stage: patch review -> resolved
2018-07-06 10:23:15 taleinat set messages: +
2018-07-06 10:22:28 taleinat set messages: +
2018-07-06 10:21:08 taleinat set messages: +
2018-07-06 08:20:34 ammar2 set pull_requests: + <pull_request7712>
2018-07-06 08:08:06 ammar2 set pull_requests: + <pull_request7711>
2018-07-06 07:58:00 ammar2 set pull_requests: + <pull_request7710>
2018-07-06 07:19:11 taleinat set nosy: + taleinat; messages: +
2018-06-24 12:34:12 ammar2 set keywords: + patch; stage: patch review; pull_requests: + <pull_request7501>
2018-06-22 19:23:47 ned.deily set nosy: + meador.inge
2018-06-19 07:41:52 ammar2 create