[Python-ideas] stdlib crowdsourcing

anatoly techtonik techtonik at gmail.com
Tue May 29 07:05:27 CEST 2012


The problem with the stdlib is that it is all damn subjective. There is
no process for adding functions and modules unless you are well-behaved,
skilled in public debate, and have a lot of time to champion your
module/function. In other words, it is hard (if not impossible) for 80%
of the Python Earth population. So many people and projects decide to
opt out. Take a look at Twisted: a lot of useful stuff, but none of it
in the Python stdlib. So..

Provide a way for people to opt out of core stuff, but still let them
share changes and update code if necessary.

This will require:
- a local stdlib Python path convention
- snippet normalization function and AST hash dumper
- web site with stats
- source code crawler

How it works:
1. Every project maintains its own stdlib directory with functions
that it feels are good to have in the standard library
2. Functions are placed so that they are imported as if from the
standard library, but this time with a stdlib prefix
3. The license for this directory is public domain to remove all legal
barriers (credits are welcome, but optional)
4. A crawler (probably PyPI) scans this stdlib dir, finds functions,
normalizes them, calculates hashes and submits them to the web site
  4.1 Normalization is required to find the same function
      copy/pasted across different projects with different
      indentation levels, docstrings, parameter/variable names, etc.
  4.2 The hash is calculated from the AST. There are at least three
      hashes for each entry:
       4.2.1 Full hash - all docstrings and variable names are
             preserved, whitespace normalized
       4.2.2 Stripped hash - docstrings are stripped, variable names
             are normalized
       4.2.3 Signature hash - a mark placed in a comment above the
             function name, either calculated from the function
             signature or generated randomly, used for manual
             tracking of copy/paste, e.g. pd:ac546df6b8340a92
5. The web site maintains usage and popularity stats, and accepts votes
on inclusion of snippets
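
A rough sketch of how the full and stripped hashes from 4.2.1 and 4.2.2
might be computed with the ast module. The helper names (full_hash,
stripped_hash), the use of SHA-1 truncated to 16 hex digits, and the
v0, v1, ... renaming scheme are all assumptions made for illustration;
nothing here is an existing tool. The function's own name is left
untouched, which is also just one possible choice.

```python
import ast
import hashlib
import textwrap


def _parse_function(source):
    # dedent() lets the same function be pasted at any indentation level.
    tree = ast.parse(textwrap.dedent(source))
    func = tree.body[0]
    if not isinstance(func, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    return func


def _digest(node):
    # ast.dump() leaves out line/column info by default, so comments and
    # whitespace do not affect the result. The dump format can change
    # between Python versions, so hashes are only comparable per version.
    return hashlib.sha1(ast.dump(node).encode("utf-8")).hexdigest()[:16]


def full_hash(source):
    """Docstrings and names preserved, whitespace normalized (4.2.1)."""
    return _digest(_parse_function(source))


class _Renamer(ast.NodeTransformer):
    """Apply a fixed old-name -> new-name mapping to args and names."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node


def stripped_hash(source):
    """Docstring stripped, parameter/local names normalized (4.2.2)."""
    func = _parse_function(source)
    if ast.get_docstring(func) is not None:
        func.body = func.body[1:] or [ast.Pass()]
    # Only parameters and names assigned inside the function are renamed,
    # so builtins and globals such as max() keep their identity.
    mapping = {}
    for node in ast.walk(func):
        if isinstance(node, ast.arg):
            name = node.arg
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            name = node.id
        else:
            continue
        mapping.setdefault(name, "v%d" % len(mapping))
    _Renamer(mapping).visit(func)
    return _digest(func)
```

With this sketch, two copies of a function that differ only in
indentation, comments, docstring and parameter/local names get the same
stripped hash but different full hashes.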


User stories:
1. "I want to find if there is a better/updated version of my function
available"
   1.1  I enter the hash into the web site search form
   1.2  The site gives me a link to my snippet
   1.3  I can see what people proposed to replace this function with
   1.4  I can choose the function with the most votes
   1.5  I can flag functions I find irrelevant, or
   1.6  I can tag functions that go in a different direction than I
need, so that I can filter them out

2. "I want to reuse code snippets without additional dependencies on
3rd party projects"
   2.1  Just place them into my own stdlib directory

3. "I want to update code snippets when there is an update for them"
   3.1  I run the scanner; it extracts signature hashes and stripped
hashes, and checks whether the web-site version for each signature
still matches the normalized hash

4. "I want to see what people want to include in the next Python version"
   4.1  A call for proposals is made
   4.2  People place wannabes into their stdlib dirs
   4.3  The crawl adds the new functions to the web site
   4.4  Functions are categorized
   4.5  Each is optionally included / declined with a short one-line
reason why
   4.6  Optionally, more detailed info on why is provided

--- feature creep cut ---
5. "I want to see what functions are popular in other languages"
   5.1  A separate crawler for the Ruby, PHP etc. stdlibs converts
their AST into a compatible format where possible
   5.2  It submits the results to the site stats

6. "I want to download the function in Ruby format"
   6.1  An AST converter tries to do the job automatically where possible
   6.2  If it fails, you are encouraged to fix the converter rules or
write the replacement for this signature manually
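
For the scanner in user story 3, a minimal sketch of the local side
could look like the following. It assumes the mark is a comment of the
form "# pd:" plus 16 hex digits on the line directly above the def
(guessed from the single example pd:ac546df6b8340a92), and that the
site's answer is already available as a plain dict; find_marks and
outdated are hypothetical names, and no real web API is implied.

```python
import re

# Assumed mark format, inferred from the one example in this proposal.
MARK_RE = re.compile(r"^\s*#\s*(pd:[0-9a-f]{16})\s*$")
DEF_RE = re.compile(r"^\s*def\s+([A-Za-z_]\w*)\s*\(")


def find_marks(source):
    """Return {function_name: mark} for marks directly above a def."""
    marks = {}
    pending = None
    for line in source.splitlines():
        mark = MARK_RE.match(line)
        if mark:
            pending = mark.group(1)
            continue
        func = DEF_RE.match(line)
        if func and pending:
            marks[func.group(1)] = pending
        # Anything between the mark and the def (a decorator, a blank
        # line) drops the mark; a real scanner may want to be more lenient.
        pending = None
    return marks


def outdated(local, published):
    """Return the marks whose published hash differs from the local one.

    Both arguments map a signature mark to a stripped hash. `published`
    stands in for whatever the web site would return.
    """
    return sorted(
        mark
        for mark, digest in local.items()
        if mark in published and published[mark] != digest
    )
```

Any mark returned by outdated() points at a snippet whose site version
has moved on from the local copy and is worth a look.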


Just an idea.
--
anatoly t.


