Issue 32936: RobotFileParser.parse() should raise an exception when the robots.txt file is invalid


Created on 2018-02-24 10:53 by Guinness, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (3)
msg312711 - (view) Author: Oudin (Guinness) Date: 2018-02-24 10:53
When processing an ill-formed robots.txt file (like https://tiny.tobast.fr/robots-file ), the RobotFileParser.parse method does not populate the entries or default_entry attributes. In my opinion, the method should raise an exception when no valid User-agent entry is found in the robots.txt file (or when an invalid User-agent entry is present). Otherwise, the only way to detect the failure is to check whether default_entry is None, which is not covered in the documentation (https://docs.python.org/dev/library/urllib.robotparser.html). Depending on your opinion on this, I can implement what is necessary and create a PR on GitHub.
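Since the linked robots.txt file is no longer available, the behavior described above can be sketched with a hypothetical malformed file: a Disallow rule with no preceding User-agent line, which the parser's state machine silently discards. After parsing, entries is empty, default_entry is None, and can_fetch() falls back to allowing everything instead of signalling the failure.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical ill-formed robots.txt: the Disallow rule appears
# before any User-agent line, so the parser ignores it entirely.
bad_robots = [
    "Disallow: /private/",
]

rp = RobotFileParser()
rp.parse(bad_robots)

# No valid entry was built; neither attribute is populated.
print(rp.default_entry)  # None
print(rp.entries)        # []

# With no entries and no default entry, can_fetch() grants access,
# giving no indication that the file failed to parse.
print(rp.can_fetch("MyBot", "https://example.com/private/page"))  # True
```

Checking `rp.default_entry is None` after parse() is currently the only way to detect this situation, and that attribute is undocumented.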
msg406722 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-11-21 15:14
The link to the robots.txt file no longer works, so it's not clear how to reproduce the problem you are seeing. Can you post the complete information on this issue?
msg407072 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-11-26 16:53
Please reopen this or create a new issue if this is still a problem and you can provide the missing information.
History
Date User Action Args
2022-04-11 14:58:58 admin set github: 77117
2021-11-26 16:53:03 iritkatriel set status: pending -> closed; resolution: rejected; messages: +; stage: resolved
2021-11-21 15:14:16 iritkatriel set status: open -> pending; nosy: + iritkatriel; messages: +
2018-02-24 10:53:13 Guinness create