[SciPy-User] Central File Exchange for Scipy

Pauli Virtanen pav at iki.fi
Thu Apr 21 08:37:01 EDT 2011


Thu, 21 Apr 2011 03:07:29 -0500, Jason Grout wrote:
[clip]
>>> So when you say "Hosted software", are you thinking of a PyPi type of
>>> site, where the release tarball might be hosted on the site, rather
>>> than the development repository?
>>
>> Precisely so. The aim would be to make it less hassle to use than PyPi
>> for a relative Python newbie. (Although uploading packages to PyPi is
>> not hugely hassle-ful at the moment, as it is possible to do it using
>> only the web interface.) So there would be a bit of an overlap with
>> PyPi; one could however add some recommendations etc. to push people to
>> use PyPi, if they are willing to jump through some extra hoops.
> 
> What extra hoops?

Mainly, for small pieces of code, you might not even want to create a 
named Python package. So some form of code hosting seems to be useful --- 
whether it is called "snippets" (multi-file) or "hosted projects". (The 
wiki-style Cookbook content is then the third category of items that 
could be useful to have.)


Re: PyPi usability

You cannot just upload a .py file onto PyPi. PyPi checks that the 
uploaded file (i) is a tarball, zip, or egg, and (ii) is named in a 
specific way. Also, (iii) you need to go to a different site.

So there are user experience issues with the web upload. Sure, the upload 
is manageable once you practice a bit, and many of the issues are 
probably fixable.
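
To make checks (i) and (ii) concrete, here is a minimal sketch of that kind of
filename validation. The suffix list, the name-version pattern, and the
function name looks_uploadable are all assumptions made for illustration; the
rules PyPi actually applies may well differ.

```python
import re

# Hypothetical rules, for illustration only; not PyPi's real checks.
ALLOWED_SUFFIXES = (".tar.gz", ".tgz", ".zip", ".egg")
# Assumed naming scheme: "<name>-<version>", with the version starting with a digit.
NAME_VERSION = re.compile(r"^[A-Za-z0-9_.]+-\d[A-Za-z0-9_.\-]*$")


def looks_uploadable(filename):
    """Return True if the filename passes both assumed checks."""
    for suffix in ALLOWED_SUFFIXES:
        if filename.endswith(suffix):
            # Check (i) passed: it is an archive type we accept.
            stem = filename[: -len(suffix)]
            # Check (ii): the rest must look like "<name>-<version>".
            return bool(NAME_VERSION.match(stem))
    return False


print(looks_uploadable("mysnippet-0.1.tar.gz"))
print(looks_uploadable("mysnippet.py"))
```

A bare script such as mysnippet.py fails the first check already, which is
the newbie-unfriendly part.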

[clip]
>> At the moment, one thing seems clear:
>>
>>    - Pointers to externally hosted projects (&  semi-automatic
>>      import from PyPi)
>
> I just read up more on PyPi, and in particular, read up on their recent
> discussion which led to the disabling of the rating system [1].  I'm not
> convinced that this point is clear.  How is our pointing to packages on
> PyPi and elsewhere improving on PyPi?  Are there a number of other
> packages out there that are not cataloged on PyPi and which should not
> be cataloged there?

By the "clear" thing, I mean pointing to packages on PyPi (= almost all 
externally hosted projects), augmented with community tags etc. Pointing 
to packages outside PyPi is not essential --- but would be easy to add.

One argument for allowing "external" links is that packages can only be 
added to PyPi by their authors. There is currently a small number of 
relevant packages usable from Python that are not on PyPi (although they 
should be); for example PyTrilinos. But I guess it should be possible to 
browbeat their authors into adding a PyPi entry.

> I could see us adding value by having a better tagging system that was
> customized more for scientific software.  On the other hand, maybe we
> could just improve the PyPi entries for such software so that a keyword
> search would pull up the packages.

It seems clear to me that this feature would be useful. Also, combining 
the PyPi data with smaller code snippets would create a one-stop-shop.

As you can surmise from the discussion you linked to, there is resistance 
to adding new community-oriented features to PyPi itself, as some people 
feel that such features are out-of-scope for it. Doing it externally also 
makes sense from the usability and branding point of view --- a site 
called "Python in science" with filtered package selection can be more 
convincing and convenient to navigate than browsing "Topic :: Scientific/
Engineering" on PyPi.

The discussion on rating systems there is a useful read --- it's why I 
left out any star-based rating systems so far. Just adding an "I use this" 
popularity measure probably works around most issues. (The PyPi download 
data is not a very reliable measure, as many of the bigger packages host 
their files externally.)
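
As a rough illustration of the "I use this" idea, the sketch below counts
distinct users per package, so that repeated clicks by the same person do not
inflate the number. The class name UsageCounter, its methods, and the package
and user names are hypothetical; nothing like this has been implemented yet.

```python
from collections import defaultdict


class UsageCounter:
    """Counts distinct users per package (hypothetical sketch)."""

    def __init__(self):
        # package name -> set of user ids who clicked "I use this"
        self._users = defaultdict(set)

    def mark_used(self, package, user):
        # Adding to a set is idempotent, so repeat clicks are harmless.
        self._users[package].add(user)

    def count(self, package):
        return len(self._users.get(package, ()))

    def ranking(self):
        # Most-used first; ties broken alphabetically by package name.
        return sorted(self._users, key=lambda p: (-len(self._users[p]), p))


counter = UsageCounter()
counter.mark_used("pkg_a", "alice")
counter.mark_used("pkg_a", "alice")  # duplicate click, ignored
counter.mark_used("pkg_a", "bob")
counter.mark_used("pkg_b", "carol")
print(counter.ranking())
```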

    ***

On improving the entries on PyPi -- the keywords etc. there are editable 
only by the original submitter, and this will probably not change, so I 
don't think that will be a possible way to go.

The PyPi package classifiers as they are now are assigned solely by the 
package authors, more or less at random and from a limited selection, and 
are not very reliable. There are several packages in the "Topic :: 
Scientific/Engineering" category that don't actually have much focus on 
either science or engineering. So, some filtering of PyPi entries would 
already be useful.
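
A minimal sketch of such filtering, assuming the package metadata has already
been fetched into plain dicts: keep an entry only if it claims the classifier
and the community has also tagged it. The function name filter_scientific,
the data layout, and the example packages and tags are made up for
illustration.

```python
SCI_PREFIX = "Topic :: Scientific/Engineering"


def filter_scientific(packages, community_tags):
    """Return names of packages that claim the classifier AND have community tags."""
    selected = []
    for pkg in packages:
        # Author-assigned classifier, which alone is not very reliable.
        claims = any(c.startswith(SCI_PREFIX) for c in pkg["classifiers"])
        # Community confirmation: at least one tag assigned by site users.
        confirmed = bool(community_tags.get(pkg["name"]))
        if claims and confirmed:
            selected.append(pkg["name"])
    return selected


# Hypothetical example data.
packages = [
    {"name": "example-ode-solver",
     "classifiers": ["Topic :: Scientific/Engineering :: Mathematics"]},
    {"name": "example-web-forum",
     "classifiers": ["Topic :: Scientific/Engineering"]},
    {"name": "example-plot-tool",
     "classifiers": ["Topic :: Multimedia :: Graphics"]},
]
community_tags = {
    "example-ode-solver": ["ode", "integration"],
    "example-plot-tool": ["plotting"],
}

print(filter_scientific(packages, community_tags))
```

Requiring both conditions is just one possible policy; community tags alone
could also be used to pick up packages that never set the classifier.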

>> But the following are not so clear:
>>
>>    - Hosted projects -- how much to overlap with PyPi?
> 
> If it's easy to host on PyPi, it seems like we should point people over
> there.  We have far fewer users (and infrastructure maintainers) than
> PyPi, and PyPi itself already has the authoritative blessing of Python.

For small contributions, you might not want to use PyPi. Aside from that, 
the hosted projects are not really required, provided PyPi is easy enough 
to use (which is not true for the setup.py way, but may be true for the 
web interface).

>>    - Snippets -- the Wiki or the Knol? Or both? How much overlap with
>>      hosted projects?
> 
> The python snippet repository is:
> http://code.activestate.com/recipes/langs/python/
[clip]

The activestate snippet library seems not to be very actively used for 
Scipy et al. at the moment -- there are only ~15 recipes tagged with 
"scipy", "numpy", "scientific", "sage", or "science".

Also:

- Since it's not a focused site, relevant code snippets are mixed 
  with non-relevant ones, including ones written in languages other 
  than Python.

- As tags are specified by the users, and free-form, stuff will
  be lost in the midst of non-relevant content.

- The tagging feature could perhaps be improved --- it appears they
  can only be assigned by the author of the snippet.

- There are some usability problems: e.g. clicking the "Tags" link on 
  the top takes you away from Python-specific content.

- The search feature is not especially good: it's just Google's site:
  search, so it does not explicitly know about tags or metadata.

Other than that, it seems to do a reasonable job.

[clip]
> So: thoughts on the scope of this new project, and how it differentiates
> enough from the existing sites to be useful enough to build and
> maintain?

Focusing on "scientific" content is already a big enough differentiation, 
IMHO. It's mostly a social question of creating a hub for exchanging this 
type of content; and also a question of branding. Technically, sure, 
there is not so much new under the sun. The first point would just be to 
make the implementation slick and useful enough to attract people to a 
single place. The second point would be to provide a one-stop-shop for 
whatever you need related to Python in science --- which would have the 
extra benefit of showcasing that it is doing well, and is a credible tool 
for many purposes.

Based on earlier discussions on this list, it seems that at least the 
people who chimed in would prefer such a central hub over what is 
currently available.

The current situation is, if I want to share something science-related 
written in Python, it is not obvious where I should put it so that there 
would be some audience. For small contributions, on generic snippet sites 
your stuff easily gets lost in the middle of non-relevant content --- 
also, I'm not convinced many people (e.g. those on this mailing list) 
follow those. The scipy.org/Cookbook is also not very usable, as it's a 
generic wiki. For larger contributions, PyPi works (although it's mildly 
clumsy to use), but it does not offer much visibility. If I name my 
package as scikits.* it goes to scikits.appspot.com, but I guess that's 
not very widely used either.

So that's the motivation. The snippet/hosted-projects part alone would 
address a part of what is missing, but I think one might as well go and 
make a one-stop-shop out of it.

Best,
Pauli



