bpo-33899: Make tokenize module mirror end-of-file is end-of-line behavior by ammaraskar · Pull Request #7891 · python/cpython (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation27 Commits8 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

Most of the change involves fixing up the test suite, which previously made the assumption that there wouldn't be a new line if the input didn't end in one.

https://bugs.python.org/issue33899

while True: # loop over lines in stream
try:
last_line = line

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't last_line only be set after StopIteration is caught? ISTM that in other cases we wouldn't want to be adding the newline at the end.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readline is one of the ancient APIs that existed before generators. There's two ways of stopping iteration, either raising StopIteration or returning the empty string, the latter gets caught all the way down here https://github.com/python/cpython/blob/master/Lib/tokenize.py#L528

How you're describing it is what I had initially but there's a few places in the loop where iteration can stop so I thought this would be simpler.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then perhaps add a comment to that end? It seems rather crucial to understanding that particular piece of the code.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, added.

while True: # loop over lines in stream
try:
# This loop has multiple points where it can break out so we
# pick up the value for the last_line here, at one unified
# point to keep things simple.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"at one unified point to keep things simple" should be removed IMO (unnecessary).

I suggest mentioning briefly why last_line is kept (to the best of my understanding: the last line is needed for the newline check after the loop, but line will be overridden at the last loop iteration).

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

You're right, on a second read the comment didn't really explain much. Reworded.

I have made the requested changes; please review again

Thanks for making the requested changes!

@taleinat: please review the changes made to this pull request.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a whitespace issue in one of the files. For details, run make patchcheck or take a look at the output on Travis.

Whoops, that's what I get for being lazy and using the github web editor.

I have made the requested changes; please review again

Thanks for making the requested changes!

@taleinat: please review the changes made to this pull request.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed the whole PR again and a have a few final remarks. After these are fixed it would be good to go IMO.

@@ -648,6 +655,9 @@ def _tokenize(readline, encoding):
(lnum, pos), (lnum, pos+1), line)
pos += 1

# Add an implicit NEWLINE if the input doesn't end in one
if len(last_line) > 0 and last_line[-1] not in '\r\n':

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should begin if last_line and ....

for type, token, start, end, line in tokenize(f.readline):
if type == ENDMARKER:
break
if s[-1] not in '\r\n' and type == NEWLINE and end[0] == num_lines:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is quite unreadable. Not simple to make more readable though. A short comment could help.

for type, token, start, end, line in generate_tokens(f.readline):
if type == ENDMARKER:
break
if s[-1] not in '\r\n' and type == NEWLINE and end[0] == num_lines:
continue

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check_tokenize() logic is repeated exactly, but it is now becoming rather involved. This should be a common function used by both classes.

Thanks for the reviews Tal, I really appreciate it! I refactored out the common logic, does this seem fine? I also added a comment and added a variable in the check to convey the meaning better.

I have made the requested changes; please review again

Thanks @ammaraskar for the PR, and @taleinat for merging it 🌮🎉.. I'm working now to backport this PR to: 2.7, 3.6, 3.7.
🐍🍒⛏🤖

Sorry, @ammaraskar and @taleinat, I could not cleanly backport this to 3.7 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker c4ef4896eac86a6759901c8546e26de4695a1389 3.7

Sorry, @ammaraskar and @taleinat, I could not cleanly backport this to 2.7 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker c4ef4896eac86a6759901c8546e26de4695a1389 2.7

Sorry, @ammaraskar and @taleinat, I could not cleanly backport this to 3.6 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker c4ef4896eac86a6759901c8546e26de4695a1389 3.6

Will backport by hand in a bit

@ammaraskar, ask me (via GitHub's "Reviewers" feature) to review the backport PRs when they're ready.

ammaraskar added a commit to ammaraskar/cpython that referenced this pull request

Jul 6, 2018

…ne behavior (pythonGH-7891)

Most of the change involves fixing up the test suite, which previously made the assumption that there wouldn't be a new line if the input didn't end in one.

Contributed by Ammar Askar.. (cherry picked from commit c4ef489)

Co-authored-by: Ammar Askar ammar_askar@hotmail.com

ammaraskar added a commit to ammaraskar/cpython that referenced this pull request

Jul 6, 2018

…ne behavior (pythonGH-7891)

Most of the change involves fixing up the test suite, which previously made the assumption that there wouldn't be a new line if the input didn't end in one.

Contributed by Ammar Askar.. (cherry picked from commit c4ef489)

Co-authored-by: Ammar Askar ammar_askar@hotmail.com

ammaraskar added a commit to ammaraskar/cpython that referenced this pull request

Jul 6, 2018

…ne behavior (pythonGH-7891)

Most of the change involves fixing up the test suite, which previously made the assumption that there wouldn't be a new line if the input didn't end in one.

Contributed by Ammar Askar.. (cherry picked from commit c4ef489)

Co-authored-by: Ammar Askar ammar_askar@hotmail.com

@taleinat
It seems like I can't touch the reviewers/labels etc, only people with access to the repo can.

2.7: #8133
3.6: #8134
3.7: #8132

taleinat pushed a commit that referenced this pull request

Jul 6, 2018

…ne behavior (GH-7891) (GH-8132)

Most of the change involves fixing up the test suite, which previously made the assumption that there wouldn't be a new line if the input didn't end in one.

Contributed by Ammar Askar.

(cherry picked from commit c4ef489)

taleinat pushed a commit that referenced this pull request

Jul 6, 2018

…ne behavior (GH-7891) (GH-8134)

Most of the change involves fixing up the test suite, which previously made the assumption that there wouldn't be a new line if the input didn't end in one.

Contributed by Ammar Askar.

(cherry picked from commit c4ef489)

taleinat pushed a commit that referenced this pull request

Jul 6, 2018

…ne behavior (GH-7891) (#8133)

Most of the change involves fixing up the test suite, which previously made the assumption that there wouldn't be a new line if the input didn't end in one.

Contributed by Ammar Askar.

(cherry picked from commit c4ef489)