[Python-Dev] I'm not getting email from SF when assigned a bug/patch

Brett Cannon brett at python.org
Mon Apr 3 01:54:51 CEST 2006


On 4/2/06, Fredrik Lundh <fredrik at pythonware.com> wrote:

> > Fredrik, if you would like to help move this all forward, great; I
> > would appreciate the help. You can write a page scraper to get the
> > data out of SF
>
> challenge accepted ;-)

Woohoo!

> http://effbot.python-hosting.com/browser/stuff/sandbox/sourceforge/
>
> contains three basic tools; getindex to grab index information from a
> python tracker, getpages to get "raw" xhtml versions of the item pages,
> and getfiles to get attached files.
>
> I'm currently downloading a tracker snapshot that could be useful for
> testing; it'll take a few more hours before all data are downloaded
> (provided that SF doesn't ban me, and I don't stumble upon more
> cases where a certain rhettinger has pasted binary gunk into an
> iso-8859-1 form ;-).

> alright, it took my poor computer nearly eight hours to grab all the
> data, and some tracker items needed special treatment to work around
> some interesting SF bugs, but I've finally managed to download all
> items available via the SF tracker index, and all data files available
> via the item pages:
>
>     tracker-105470 (bugs)              6682 items  6682 pages (100%)  1912 files
>     tracker-305470 (patches)           3610 items  3610 pages (100%)  4663 files
>     tracker-355470 (feature requests)   430 items   430 pages (100%)    80 files
>
> the complete data set is about 300 megabytes uncompressed, and ~85
> megabytes zipped.
>
> the scripts are designed to make it easy to update the dataset; adding
> new items and files only takes a couple of minutes; refreshing the item
> information may take a few hours.
>
>     :::
>
> I've also added a basic "extract" module which parses the XHTML pages
> and the data files. this module can be used by import scripts, or be
> used to convert the dataset into other formats (e.g. a single XML file)
> for further processing.
>
> the source code is available via the above link; I'll post the ZIP file
> somewhere tomorrow (drop me a line if you want the URL).
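
The conversion step described above (parsing the XHTML item pages and emitting a single XML file) could look roughly like the following. This is only an illustrative sketch, not the actual "extract" module: the function names, the <tracker>/<item> output layout, and the choice of the page <title> as the only extracted field are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Standard XHTML namespace; well-formed XHTML pages declare it on <html>.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def extract_title(xhtml_text):
    """Return the text of the page's <title> element, or "" if absent."""
    root = ET.fromstring(xhtml_text)
    title = root.find(f".//{XHTML_NS}title")
    if title is None or title.text is None:
        return ""
    return title.text.strip()


def combine_items(pages):
    """Build one <tracker> element from a mapping of item id -> XHTML text.

    The <tracker>/<item>/<title> layout is hypothetical, chosen only to
    show the "many pages in, one XML document out" idea.
    """
    tracker = ET.Element("tracker")
    for item_id in sorted(pages):
        item = ET.SubElement(tracker, "item", id=str(item_id))
        ET.SubElement(item, "title").text = extract_title(pages[item_id])
    return tracker


if __name__ == "__main__":
    # Made-up sample pages standing in for downloaded item pages.
    sample = {
        "1": '<html xmlns="http://www.w3.org/1999/xhtml">'
             "<head><title>first item</title></head><body/></html>",
        "2": '<html xmlns="http://www.w3.org/1999/xhtml">'
             "<head><title>second item</title></head><body/></html>",
    }
    print(ET.tostring(combine_items(sample), encoding="unicode"))
```

Working from an in-memory mapping rather than a directory layout keeps the sketch independent of how the downloaded pages are actually stored on disk.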

Wonderful, Fredrik! Thank you for doing this! When the data is available I will arrange to get it put on python.org somewhere and then start drafting the tracker announcement with where the data is and how to get at it.

-Brett


