I don't know too much about robots.txt, but how about:

Disallow: */rev/*
Disallow: */shortlog/*
Allow:

Are there any other directories we'd like to exclude?
Unfortunately, I don't think it will be that easy: the original robots.txt standard doesn't support wildcard paths like that (only some crawlers honor them as an extension). Possibly we should just whitelist a few important repositories.
Yes, I think we should whitelist rather than blacklist. The problem with letting engines index the repositories is the sheer resource cost when they fetch many heavy pages (such as annotate).
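For context, a minimal whitelist-style robots.txt along those lines might look like the following. The repository paths (/cpython/, /peps/) are hypothetical placeholders, and note that the Allow directive is a widely honored extension (Google, Bing) rather than part of the original standard:

User-agent: *
# Permit crawlers in a few whitelisted repositories only
# (repository names here are hypothetical examples)
Allow: /cpython/
Allow: /peps/
# Block everything else, including heavy pages such as annotate
Disallow: /

Crawlers that support Allow use most-specific-match rules, so the Allow lines take precedence over the blanket Disallow regardless of ordering.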
Two things: is it worth fixing this bug given the impending move to GitHub? Also, why is this reported here and not on the pydotorg tracker? https://github.com/python/pythondotorg/issues

Given that the last comment was from 2014, I'm going to go ahead and close this issue.
History

Date                 User   Action  Args
2022-04-11 14:57:26  admin  set     github: 58132
2016-09-12 00:17:26  barry  set     status: open -> closed; nosy: + barry; messages: +; resolution: wont fix