[Python-Dev] I'm not getting email from SF when assigned a bug/patch

Brett Cannon brett at python.org
Mon Apr 3 01:54:51 CEST 2006


On 4/2/06, Fredrik Lundh <fredrik at pythonware.com> wrote:
> > > Fredrik, if you would like to help move this all forward, great; I
> > > would appreciate the help.  You can write a page scraper to get the
> > > data out of SF
> >
> > challenge accepted ;-)
> >

Woohoo!

> > http://effbot.python-hosting.com/browser/stuff/sandbox/sourceforge/
> >
> > contains three basic tools; getindex to grab index information from a
> > python tracker, getpages to get "raw" xhtml versions of the item pages,
> > and getfiles to get attached files.
> >
> > I'm currently downloading a tracker snapshot that could be useful for
> > testing; it'll take a few more hours before all data are downloaded
> > (provided that SF doesn't ban me, and I don't stumble upon more
> > cases where a certain rhettinger has pasted binary gunk into an
> > iso-8859-1 form ;-).
>
> alright, it took my poor computer nearly eight hours to grab all the
> data, and some tracker items needed special treatment to work around
> some interesting SF bugs, but I've finally managed to download *all*
> items available via the SF tracker index, and *all* data files available
> via the item pages:
>
>     tracker-105470 (bugs)
>         6682 items
>         6682 pages (100%)
>         1912 files
>     tracker-305470 (patches)
>         3610 items
>         3610 pages (100%)
>         4663 files
>     tracker-355470 (feature requests)
>         430 items
>         430 pages (100%)
>         80 files
>
> the complete data set is about 300 megabytes uncompressed, and ~85
> megabytes zipped.
>
> the scripts are designed to make it easy to update the dataset; adding
> new items and files only takes a couple of minutes; refreshing the item
> information may take a few hours.
>
> :::
>
> I've also added a basic "extract" module which parses the XHTML
> pages and the data files.  this module can be used by import scripts,
> or be used to convert the dataset into other formats (e.g. a single
> XML file) for further processing.
>
> the source code is available via the above link; I'll post the ZIP file
> somewhere tomorrow (drop me a line if you want the URL).
>
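[As a rough illustration of what an "extract"-style module does, here is a minimal sketch that pulls a few fields out of a simplified XHTML tracker-item fragment. The markup, class names, and fields below are invented for illustration; the real SourceForge pages were far messier than this.]

```python
# Hypothetical sketch of an "extract"-style parser: given one XHTML
# tracker-item page, pull out a few fields and return them as a dict.
import xml.etree.ElementTree as ET

# Invented sample markup -- stands in for a downloaded item page.
SAMPLE = """<div>
  <h2 class="summary">Example bug report</h2>
  <span class="status">Open</span>
  <span class="submitter">rhettinger</span>
</div>"""

def extract_item(xhtml):
    """Parse one tracker-item page into a dict of its fields."""
    root = ET.fromstring(xhtml)

    def text_of(cls):
        # ElementTree supports simple [@attr='value'] XPath predicates.
        node = root.find(".//*[@class='%s']" % cls)
        return node.text.strip() if node is not None else None

    return {
        "summary": text_of("summary"),
        "status": text_of("status"),
        "submitter": text_of("submitter"),
    }

item = extract_item(SAMPLE)
print(item["summary"])  # -> Example bug report
```

[A dict like this is easy to re-serialize into other formats, e.g. the single XML file mentioned above, or to feed to an import script.]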

Wonderful, Fredrik!  Thank you for doing this!  When the data is
available I will arrange to get it put on python.org somewhere and
then start drafting the tracker announcement with where the data is
and how to get at it.

-Brett


More information about the Python-Dev mailing list