[Python-ideas] A service to crawl +1s and URLs out of mailman archives

Wes Turner wes.turner at gmail.com
Thu Dec 18 17:52:46 CET 2014


cc'd here from
https://westurner.github.io/wiki/ideas#open-source-mailing-list-extractor


Open Source Mailing List Extractor
----------------------------------
Use cases:

* https://mail.python.org/pipermail/python-ideas/2014-December/030228.html
* incentivization of actionable crossreferences
* PEP research
* "is this actionable?"
* "are we voting?"


* Crawl/parse/extract links and +1 from given thread(s)

  * Detect a few standard link types:

    * Issue
    * Src
    * Doc
    * Ref
    * x-link

  * +1s with expandable snippets? (like ``grep -C``)

* There could be configurable per-list link heuristics:

  * http[s]
  * Issue: https://bugs.python.org/issue(\d+)
  * Src: https://hg.python.org/<repo>/<path>
  * Src: https://github.com/<org>/<project>/<path>
  * Src: https://bitbucket.org/<org>/<project>/<path>
  * Patch/Attachment: http[s]://bugs.python.org/(file[\d]+)/<filename(.diff)>
  * Doc: https://docs.python.org/<ver>/<path>
  * Wiki: https://wiki.python.org/moin/<path>
  * Homepage: https://www.python.org/<path>
  * PyPI pkg: https://pypi.python.org/pypi/<path>
  * Warehouse pkg: https://warehouse.python.org/project/<path>
  * Wikipedia: https://[lang].wikipedia.org/wiki/<page> --> (dbpedia:<page>)
  * Build: http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%203.4/builds/771
  * ... JSON-LD RDF
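
A rough sketch of how the per-list heuristics above might look as a table of
regexes. The names (``LINK_PATTERNS``, ``classify_link``, ``extract_links``)
are hypothetical, only some of the link types are covered, and mapping
anything unrecognized to "x-link" is an assumption rather than a settled
design:

```python
import re

# Hypothetical per-list heuristics: (label, pattern). The first match wins,
# so the more specific bugs.python.org patterns come first.
LINK_PATTERNS = [
    ("Patch/Attachment", re.compile(r"https?://bugs\.python\.org/file\d+/\S+")),
    ("Issue", re.compile(r"https?://bugs\.python\.org/issue\d+")),
    ("Src", re.compile(r"https?://(?:hg\.python\.org|github\.com|bitbucket\.org)/\S+")),
    ("Doc", re.compile(r"https?://docs\.python\.org/\S+")),
    ("Wiki", re.compile(r"https?://wiki\.python\.org/moin/\S+")),
    ("PyPI pkg", re.compile(r"https?://pypi\.python\.org/pypi/\S+")),
    ("Build", re.compile(r"https?://buildbot\.python\.org/\S+")),
]

# Deliberately loose URL matcher; real message bodies will need more care.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def classify_link(url):
    """Return the label of the first matching heuristic, else "x-link"."""
    for label, pattern in LINK_PATTERNS:
        if pattern.match(url):
            return label
    return "x-link"  # assumption: everything unrecognized is a generic x-link


def extract_links(text):
    """Return (label, url) pairs for every URL found in a message body."""
    links = []
    for url in URL_RE.findall(text):
        url = url.rstrip(".,;:)")  # drop trailing sentence punctuation
        links.append((classify_link(url), url))
    return links
```

Something like ``extract_links(message_body)`` could then be run over each
message in a thread; a per-list config would just swap in a different
``LINK_PATTERNS``.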


* This could, most efficiently, be added to Mailman
  (e.g. in Postorius and/or HyperKitty)

  * http://mailman-bundler.readthedocs.org/en/latest/
  * http://pythonhosted.org//mailman/
  * https://mail.python.org/mailman/listinfo/mailman-developers

... looking forward to Mailman 3.
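
For the "+1s with expandable snippets" item above, a minimal sketch of a
``grep -C``-style pass over one message body. ``find_plus_ones`` and its
``context`` parameter are hypothetical names, and both the "+1" regex and the
choice to skip quoted lines are assumptions about what should count as a vote:

```python
import re

# Assumption: a vote is a standalone "+1", not "+10" and not "x+1".
PLUS_ONE_RE = re.compile(r"(?<![\w+])\+1(?!\d)")


def find_plus_ones(body, context=1):
    """Return +1 hits with `context` lines of surrounding text each."""
    lines = body.splitlines()
    hits = []
    for i, line in enumerate(lines):
        # Skip quoted lines so a quoted +1 is not counted again.
        if line.lstrip().startswith(">"):
            continue
        if PLUS_ONE_RE.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append({"line": i + 1, "snippet": lines[start:end]})
    return hits
```

Each hit keeps a 1-based line number and the surrounding lines, so a UI could
show the +1 collapsed and expand the snippet on demand.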

On Thu, Dec 18, 2014 at 10:47 AM, Wes Turner <wes.turner at gmail.com> wrote:
>
> >>Is there any way to recognize a Lightweight Markup Language doctype
> >>declaration, in email?
>
> > Maybe a Content-Type?  But can you elaborate on what you're thinking
> > about?
>
> ```restructuredtext
> Often, mailing list and issue text gets *worked into* docs.
> ```
>
> ReST in email may be a bit noisy and ultimately an unproductive
> feature.
>
>
> On Tue, Dec 2, 2014 at 1:32 PM, Barry Warsaw <barry at python.org> wrote:
>
>> On Dec 02, 2014, at 04:53 AM, Wes Turner wrote:
>>
>> >Are the Python mailman instances upgraded to Mailman 3, with the Django
>> >GUI?
>>
>> No, but we know there are people using it in production.  I expect we'll get
>> another beta before the end of the year and then I think it's time to start
>> planning an experimental deployment on pdo.  We have a few experimental lists
>> on mpo that we can convert and start playing with.
>>
>> >Is there any way to recognize a Lightweight Markup Language doctype
>> >declaration, in email?
>>
>> Maybe a Content-Type?  But can you elaborate on what you're thinking
>> about?
>>
>> Cheers,
>> -Barry