Issue 13924: Mercurial robots.txt should let robots crawl landing pages.

Created on 2012-02-01 22:29 by Ivaylo.Popov, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (6)
msg152446 - (view) Author: Ivaylo Popov (Ivaylo.Popov) Date: 2012-02-01 22:29
http://hg.python.org/robots.txt currently disallows all robots from all paths. This means that the site doesn't show up in Google search results seeking, for instance, browsing access to the python source:

https://www.google.com/search?ie=UTF-8&q=python+source+browse
https://www.google.com/search?ie=UTF-8&q=python+repo+browse
https://www.google.com/search?ie=UTF-8&q=hg+python+browse
etc...

Instead, robots.txt should allow access to the landing page, http://hg.python.org/, and the landing pages for hosted projects, e.g. http://hg.python.org/cpython/, while prohibiting access to the */rev/*, */shortlog/*, ..., directories.

This change would be very easy, cost virtually nothing, and let users find the mercurial repository viewer from search engines. Note that http://svn.python.org/ does show up in search results, as an illustration of how convenient this is.
msg152457 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-02-02 13:26
Can you propose a robots.txt file?
msg219976 - (view) Author: Emily Zhao (emily.zhao) * Date: 2014-06-07 21:12
I don't know too much about robots.txt but how about

Disallow: */rev/*
Disallow: */shortlog/*
Allow:

Are there any other directories we'd like to exclude?
msg220003 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2014-06-07 23:54
Unfortunately, I don't think it will be that easy because I don't think robots.txt supports wildcard paths like that. Possibly, we should just whitelist a few important repositories.
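
As a rough illustration of the wildcard concern, the sketch below feeds the proposed rules to Python's standard-library `urllib.robotparser`, which matches rule paths as plain prefixes. The `User-agent: *` line, the `ExampleBot` agent name, and the sample revision URL are assumptions added for the example; none of them come from the proposal. Some crawlers do honor `*` as an extension, so this only shows how a prefix-only parser reads the file.

```python
from urllib.robotparser import RobotFileParser

# Proposed rules, with an assumed "User-agent: *" header so the parser
# has a record to attach them to.
PROPOSED = """\
User-agent: *
Disallow: */rev/*
Disallow: */shortlog/*
Allow:
"""


def is_allowed(rules: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return whether `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical revision URL, used only to exercise the rules.
rev_url = "http://hg.python.org/cpython/rev/0123456789ab"
print(is_allowed(PROPOSED, rev_url))
```

With prefix matching, a rule path beginning with `*` never matches a URL path beginning with `/`, so this parser reports the revision page as crawlable, which is the opposite of what the rules intend.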
msg220109 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-06-09 19:06
Yes, I think we should whitelist rather than blacklist. The problem with letting engines index the repositories is the sheer resource cost when they fetch many heavy pages (such as annotate, etc.).
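
A minimal sketch of what a whitelist could look like, checked with Python's `urllib.robotparser`. The choice of `cpython` as the only whitelisted repository, the list of excluded views, the `ExampleBot` agent name, and the sample paths are all assumptions made for illustration; this is not a file anyone in the thread proposed. The more specific `Disallow` lines come first so that first-match parsers and longest-match crawlers should reach the same verdict.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical whitelist: open up one repository, keep its heavy views
# (revisions, shortlog, annotate) blocked, and block everything else.
WHITELIST = """\
User-agent: *
Disallow: /cpython/rev/
Disallow: /cpython/shortlog/
Disallow: /cpython/annotate/
Allow: /cpython/
Disallow: /
"""


def crawlable(path: str, agent: str = "ExampleBot") -> bool:
    """Return whether `agent` may fetch `path` on hg.python.org."""
    parser = RobotFileParser()
    parser.parse(WHITELIST.splitlines())
    return parser.can_fetch(agent, "http://hg.python.org" + path)


# Sample paths (hypothetical) covering each rule.
for path in ("/cpython/", "/cpython/rev/0123456789ab", "/jython/", "/"):
    print(path, crawlable(path))
```

In this sketch the site root stays blocked: under prefix matching, an `Allow: /` rule would match every path, so opening only the top-level landing page would need a crawler-specific extension.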
msg275898 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2016-09-12 00:17
Two things: is it worth fixing this bug given the impending move to github? Also, why is this reported here and not the pydotorg tracker?

https://github.com/python/pythondotorg/issues

Given that the last comment was 2014, I'm going to go ahead and close this issue.
History
Date User Action Args
2022-04-11 14:57:26 admin set github: 58132
2016-09-12 00:17:26 barry set status: open -> closed; nosy: + barry; messages: + ; resolution: wont fix
2014-06-09 19:06:07 pitrou set messages: +
2014-06-07 23:54:14 benjamin.peterson set nosy: + benjamin.peterson; messages: +
2014-06-07 21:12:46 emily.zhao set nosy: + emily.zhao; messages: +
2013-08-17 14:53:04 ezio.melotti set keywords: + easy; stage: needs patch
2012-02-02 14:42:52 ezio.melotti set nosy: + ezio.melotti
2012-02-02 13:26:24 pitrou set nosy: + georg.brandl, pitrou; messages: +
2012-02-01 22:29:55 Ivaylo.Popov create