[Python-Dev] I'm not getting email from SF when assigned a bug/patch
Fredrik Lundh fredrik at pythonware.com
Mon Apr 3 00:28:29 CEST 2006
> Fredrik, if you would like to help move this all forward, great; I
> would appreciate the help. You can write a page scraper to get the
> data out of SF
challenge accepted ;-)

http://effbot.python-hosting.com/browser/stuff/sandbox/sourceforge/

contains three basic tools: getindex to grab index information from a python tracker, getpages to get "raw" xhtml versions of the item pages, and getfiles to get attached files.

I'm currently downloading a tracker snapshot that could be useful for testing; it'll take a few more hours before all data are downloaded (provided that SF doesn't ban me, and I don't stumble upon more cases where a certain rhettinger has pasted binary gunk into an iso-8859-1 form ;-).
alright, it took my poor computer nearly eight hours to grab all the data, and some tracker items needed special treatment to work around some interesting SF bugs, but I've finally managed to download all items available via the SF tracker index, and all data files available via the item pages:
    tracker-105470 (bugs)
        6682 items
        6682 pages (100%)
        1912 files

    tracker-305470 (patches)
        3610 items
        3610 pages (100%)
        4663 files

    tracker-355470 (feature requests)
        430 items
        430 pages (100%)
        80 files
the complete data set is about 300 megabytes uncompressed, and ~85 megabytes zipped.
the scripts are designed to make it easy to update the dataset; adding new items and files only takes a couple of minutes; refreshing the item information may take a few hours.
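one way to picture the cheap "add new items" step is as a set difference between the tracker index and what is already on disk. the following is only a minimal sketch under assumed conditions: the one-file-per-item layout ("<item id>.xml") and the names missing_items and pages_dir are hypothetical and are not taken from the actual scripts.

```python
import os

def missing_items(index_ids, pages_dir):
    """Return the item ids from the index that have no page on disk yet."""
    # hypothetical layout: one "<item id>.xml" file per downloaded page
    have = set()
    for name in os.listdir(pages_dir):
        stem, ext = os.path.splitext(name)
        if ext == ".xml":
            have.add(stem)
    # keep the index order so new items are fetched in a predictable order
    return [item for item in index_ids if str(item) not in have]
```

only the ids returned here would need to be fetched, which would fit with new items taking minutes while a full refresh of every item page takes hours.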
:::

I've also added a basic "extract" module which parses the XHTML pages and the data files. this module can be used by import scripts, or be used to convert the dataset into other formats (e.g. a single XML file) for further processing.
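as an illustration of the "single XML file" idea, here is a minimal sketch that uses only the standard library. it is not the extract module itself: the function names, the <tracker>/<item> output layout, and the choice to pull out just the page title are all assumptions made for the example, and real item pages carry many more fields.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace once the page is parsed as XML
XHTML = "{http://www.w3.org/1999/xhtml}"

def extract_title(page_source):
    """Pull the <title> text out of one well-formed XHTML page."""
    root = ET.fromstring(page_source)
    title = root.find(f"{XHTML}head/{XHTML}title")
    if title is None or not title.text:
        return ""
    return title.text.strip()

def build_dataset(pages):
    """Combine {item id: xhtml source} into a single <tracker> element."""
    tracker = ET.Element("tracker")
    # sort by item id so the output is stable between runs
    for item_id, source in sorted(pages.items()):
        item = ET.SubElement(tracker, "item", id=str(item_id))
        ET.SubElement(item, "title").text = extract_title(source)
    return tracker
```

calling ET.tostring(build_dataset(pages), encoding="unicode") on a dictionary of page sources then yields one XML string that an import script could consume.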
the source code is available via the above link; I'll post the ZIP file somewhere tomorrow (drop me a line if you want the URL).