bpo-32861: urllib.robotparser fix incomplete str methods. by michael-lazar · Pull Request #5711 · python/cpython (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation12 Commits3 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

michael-lazar

The RobotFileParser's string representation was incomplete and missing some valid rule lines.

serhiy-storchaka

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I have added just few suggestions. And the two unnecessary trailing newlines should be kept in maintained releases.

Please add a news entry.

@@ -190,7 +190,10 @@ def request_rate(self, useragent):
return self.default_entry.req_rate
def __str__(self):
return ''.join([str(entry) + "\n" for entry in self.entries])
ret = [str(entry) for entry in self.entries]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be faster:

entries = self.entries if self.default_entry is not None: entries = entries + [self.default_entry] return '\n\n'.join(map(str, entries))

@@ -222,10 +225,14 @@ def __init__(self):
def __str__(self):
ret = []
for agent in self.useragents:
ret.extend(["User-agent: ", agent, "\n"])
ret.append("User-agent: {0}".format(agent))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f-strings can be used in 3.6+.

for line in self.rulelines:
ret.extend([str(line), "\n"])
return ''.join(ret)
ret.append(str(line))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just

ret.extend(map(str, self.rulelines))

@michael-lazar

All suggestions have been implemented, I have no strong opinions on any of them. I also added a news entry.

Is backporting part of this PR, or do you merge this into 3.8 and then create separate issues for the other python branches? Is that something that I can help with?

serhiy-storchaka

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Just add your credits.

@@ -0,0 +1,3 @@
The urllib.robotparser's ``__str__`` representation now includes wildcard
entries and the "Crawl-delay" and "Request-rate" fields. Also removes extra
newlines that were being appended to the end of the string.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add "Patch by yourname." and add your name into Misc/ACKS.

@michael-lazar

@michael-lazar

Cool, I added my name to the news entry. I'm already in the ACKS file so all good there.

@miss-islington

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request

May 14, 2018

@michael-lazar @miss-islington

…GH-5711)

The urllib.robotparser's str representation now includes wildcard entries and the "Crawl-delay" and "Request-rate" fields. Also removes extra newlines that were being appended to the end of the string. (cherry picked from commit bd08a0a)

Co-authored-by: Michael Lazar lazar.michael22@gmail.com

@bedevere-bot

@miss-islington

@bedevere-bot

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request

May 14, 2018

@michael-lazar @miss-islington

…GH-5711)

The urllib.robotparser's str representation now includes wildcard entries and the "Crawl-delay" and "Request-rate" fields. Also removes extra newlines that were being appended to the end of the string. (cherry picked from commit bd08a0a)

Co-authored-by: Michael Lazar lazar.michael22@gmail.com

serhiy-storchaka pushed a commit that referenced this pull request

May 14, 2018

…H-5711) (GH-6795)

The urllib.robotparser's str representation now includes wildcard entries and the "Crawl-delay" and "Request-rate" fields. (cherry picked from commit bd08a0a)

Co-authored-by: Michael Lazar lazar.michael22@gmail.com

miss-islington added a commit to miss-islington/cpython that referenced this pull request

May 14, 2018

@miss-islington

…ythonGH-5711) (pythonGH-6795)

The urllib.robotparser's str representation now includes wildcard entries and the "Crawl-delay" and "Request-rate" fields. (cherry picked from commit bd08a0a)

Co-authored-by: Michael Lazar lazar.michael22@gmail.com (cherry picked from commit c3fa1f2)

Co-authored-by: Miss Islington (bot) 31488909+miss-islington@users.noreply.github.com

serhiy-storchaka pushed a commit to serhiy-storchaka/cpython that referenced this pull request

May 14, 2018

@miss-islington @serhiy-storchaka

…ythonGH-5711) (pythonGH-6795)

The robotparser's str representation now includes wildcard entries. (cherry picked from commit c3fa1f2)

Co-authored-by: Michael Lazar lazar.michael22@gmail.com.

miss-islington added a commit to miss-islington/cpython that referenced this pull request

May 14, 2018

@miss-islington

…ythonGH-5711) (pythonGH-6795)

The urllib.robotparser's str representation now includes wildcard entries and the "Crawl-delay" and "Request-rate" fields. (cherry picked from commit bd08a0a)

Co-authored-by: Michael Lazar lazar.michael22@gmail.com (cherry picked from commit c3fa1f2)

Co-authored-by: Miss Islington (bot) 31488909+miss-islington@users.noreply.github.com

serhiy-storchaka pushed a commit that referenced this pull request

May 14, 2018

…H-5711) (GH-6795) (GH-6818)

The urllib.robotparser's str representation now includes wildcard entries and the "Crawl-delay" and "Request-rate" fields. (cherry picked from commit c3fa1f2)

Co-authored-by: Michael Lazar lazar.michael22@gmail.com

serhiy-storchaka added a commit that referenced this pull request

May 14, 2018

@serhiy-storchaka

GH-6795) (GH-6817)

The robotparser's str representation now includes wildcard entries. (cherry picked from commit c3fa1f2)

Co-authored-by: Michael Lazar lazar.michael22@gmail.com.

Labels

type-bug

An unexpected behavior, bug, or error