[Python-ideas] A service to crawl +1s and URLs out of mailman archives

Steven D'Aprano steve at pearwood.info
Mon Dec 1 19:51:58 CET 2014


On Mon, Dec 01, 2014 at 09:52:41AM -0600, Wes Turner wrote:

> In context to building a PEP or similar, I don't know how many times I've
> trawled looking for:
> 
> * Docs links
> * Source links
> * Patch links
> * THREAD POST LINKS
> * Consensus
> 
> A tool to crawl structured and natural language data from the forums could
> be very useful for preparing PEPs.

Yes it would be. Do you have any idea how to write such a tool?

Do you think such a tool would be of enough interest to enough people 
that it should be distributed in the Python standard library?

I think that this would make a great project on PyPI, especially since 
it may take a long, long time for it to develop enough intelligence to 
be able to do the job you're suggesting. Finding links to documentation 
and source code is fairly straightforward, but building in the 
intelligence to find "consensus" is a non-trivial application of natural 
language processing and an impressive feat of artificial intelligence. 
It certainly doesn't sound like something that somebody could write over 
a weekend and add to the 3.5 standard library; it's more like an 
on-going project that will see continual development for many years.
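
To illustrate the "fairly straightforward" half of the problem, here is a 
minimal sketch using only the standard library. It assumes the message 
body is already available as plain text; the names extract_links and 
count_votes are hypothetical, not part of mailman or any existing tool. 
The URL pattern is deliberately crude (it stops at whitespace, angle 
brackets, parentheses and square brackets), and votes are only counted 
when they start an unquoted line. Judging consensus is not attempted.

```python
import re
from collections import Counter

# Crude URL pattern: stops at whitespace, <>, () and [].
# An assumption for illustration; real URLs may contain parentheses.
URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")

# A vote is +1, -1, +0 or -0 at the start of a line (after optional spaces).
VOTE_RE = re.compile(r"\s*([+-][01])\b")


def extract_links(text):
    """Return URLs in order of first appearance, without duplicates."""
    seen = []
    for match in URL_RE.finditer(text):
        # Drop sentence punctuation that trails the URL.
        url = match.group(0).rstrip(".,;:!?")
        if url not in seen:
            seen.append(url)
    return seen


def count_votes(text):
    """Count votes on unquoted lines; lines starting with '>' are skipped."""
    counts = Counter()
    for line in text.splitlines():
        if line.lstrip().startswith(">"):
            continue  # quoted text belongs to an earlier message
        match = VOTE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Skipping quoted lines avoids counting the same vote once per reply, but 
it does nothing about a +1 buried mid-sentence, sarcasm, or a vote that 
is later withdrawn, which is where the hard part begins.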



-- 
Steven

