Issue 32936: RobotFileParser.parse() should raise an exception when the robots.txt file is invalid
Created on 2018-02-24 10:53 by Guinness, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Messages (3)
msg312711 - Author: Oudin (Guinness) | Date: 2018-02-24 10:53

When processing an ill-formed robots.txt file (such as https://tiny.tobast.fr/robots-file ), the RobotFileParser.parse method does not populate the entries or default_entry attributes. In my opinion, the method should raise an exception when no valid User-agent entry is found in the robots.txt file (or when an invalid User-agent entry is present). Otherwise, the only way to detect the problem is to check whether default_entry is None, which is not mentioned in the documentation (https://docs.python.org/dev/library/urllib.robotparser.html). Depending on your opinion on this, I can implement what is necessary and create a PR on GitHub.
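A minimal sketch of the situation described above. The original example file is no longer available, so the robots.txt content here is illustrative (a Disallow line with no preceding User-agent record); the checks rely on the undocumented entries and default_entry attributes mentioned in the report:

```python
from urllib.robotparser import RobotFileParser

# Illustrative ill-formed robots.txt: a Disallow line with no preceding
# User-agent record (the file linked in the report is no longer available).
bad_robots_txt = [
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(bad_robots_txt)

# No valid User-agent record was found, so the parser keeps its defaults.
# Both attributes below are undocumented; inspecting them is currently the
# only way to notice that nothing usable was parsed.
if parser.default_entry is None and not parser.entries:
    print("robots.txt contained no valid User-agent record")

# With no entries and no default entry, can_fetch() falls through to
# "access granted" instead of signalling the malformed input.
print(parser.can_fetch("MyBot", "https://example.com/private/page"))  # True
```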
msg406722 - Author: Irit Katriel (iritkatriel) | Date: 2021-11-21 15:14

The link to the robots.txt file no longer works, so it's not clear how to reproduce the problem you are seeing. Can you post the complete information on this issue?
msg407072 - Author: Irit Katriel (iritkatriel) | Date: 2021-11-26 16:53

Please reopen this or create a new issue if this is still a problem and you can provide the missing information.
History

Date | User | Action | Args
---|---|---|---
2022-04-11 14:58:58 | admin | set | github: 77117 |
2021-11-26 16:53:03 | iritkatriel | set | status: pending -> closed, resolution: rejected, messages: + msg407072, stage: resolved
2021-11-21 15:14:16 | iritkatriel | set | status: open -> pending, nosy: + iritkatriel, messages: + msg406722
2018-02-24 10:53:13 | Guinness | create |