Message 171711 - Python tracker (original) (raw)

Robotparser doesn't support two quite important optional parameters from the robots.txt file. I have implemented those in the following way: (Robotparser should be initialized in the usual way: rp = robotparser.RobotFileParser() rp.set_url(..) rp.read )

crawl_delay(useragent) - Returns time in seconds that you need to wait for crawling if none is specified, or doesn't apply to this user agent, returns -1 request_rate(useragent) - Returns a list in the form [request,seconds]. if none is specified, or doesn't apply to this user agent, returns -1