[Distutils] Proposal: Restrict the characters in a project name

James Carpenter nawkboy at gmail.com
Wed May 15 18:08:18 CEST 2013


While you're at it, you might consider not allowing variation in case or
in dash vs. underscore when specifying a dependency. A project should have
only one concrete name, without fuzziness. A fuzzy match should result in
a match failure. Fuzzy matching for a manual search is a different thing.
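
A minimal sketch of that distinction, assuming the index can be modelled as a plain set of registered names. The function names resolve_strict and fuzzy_candidates are hypothetical and do not come from any existing tool.

```python
def resolve_strict(requested, registered):
    """Resolve a dependency name; only an exact, case-sensitive match counts."""
    if requested in registered:
        return requested
    raise LookupError("no project named {!r}".format(requested))


def fuzzy_candidates(requested, registered):
    """Suggestions for a manual search: ignore case and dash vs. underscore."""
    def fold(name):
        return name.lower().replace("-", "_")

    wanted = fold(requested)
    return sorted(name for name in registered if fold(name) == wanted)


# Hypothetical sample of registered names.
registered = {"Twisted", "zope.interface", "python-dateutil"}
```

With this split, resolve_strict("twisted", registered) fails, while fuzzy_candidates("twisted", registered) can still offer "Twisted" to a person searching by hand.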


On Wed, May 15, 2013 at 9:31 AM, Daniel Holth <dholth at gmail.com> wrote:

> How to avoid confusables.
>
> These scripts are recommended for use in identifiers:
> http://www.unicode.org/reports/tr31/#Table_Recommended_Scripts
>
> This report details a confusables detection algorithm:
> http://www.unicode.org/reports/tr39/#Confusable_Detection
>
> And ICU implements it:
> http://www.icu-project.org/apiref/icu4c/uspoof_8h.html (see also
> PyICU).
>
> The package index would enforce uniqueness of the "skeleton" of each
> registered package, which is just an internal normalization based on
> confusability. If skeleton(identifier1) == skeleton(identifier2), then
> identifier1 and identifier2 are confusable.
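
The skeleton check can be illustrated with a toy version. This is only a sketch: TOY_CONFUSABLES is a tiny hand-written table assumed for the example, not the confusables data published with UTS #39, and toy_skeleton and ToyIndex are hypothetical names. A real index would rely on ICU or the full data file.

```python
import unicodedata

# Assumption: a tiny illustrative mapping, NOT the UTS #39 data file.
TOY_CONFUSABLES = {
    "0": "O",        # digit zero vs. capital O
    "1": "l",        # digit one vs. lowercase L
    "I": "l",        # capital I vs. lowercase L
    "\u0430": "a",   # Cyrillic small a vs. Latin small a
}


def toy_skeleton(identifier):
    """Decompose the identifier, then map each character to its prototype."""
    decomposed = unicodedata.normalize("NFD", identifier)
    return "".join(TOY_CONFUSABLES.get(ch, ch) for ch in decomposed)


def confusable(id1, id2):
    """Two identifiers are confusable when their skeletons are equal."""
    return toy_skeleton(id1) == toy_skeleton(id2)


class ToyIndex:
    """Registry that rejects a new name whose skeleton is already taken."""

    def __init__(self):
        self._by_skeleton = {}

    def register(self, name):
        key = toy_skeleton(name)
        existing = self._by_skeleton.get(key)
        if existing is not None and existing != name:
            raise ValueError(
                "{!r} is confusable with {!r}".format(name, existing))
        self._by_skeleton[key] = name
```

Registering "FOO" and then "F00" in a ToyIndex raises ValueError, because both names share one skeleton under this toy table.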
>
> The tooling could get away with a simpler rule like
> re.sub(r"[^\w\d.]+", "_", distribution, flags=re.UNICODE)
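
A runnable sketch of this simpler rule. The flag is given to re.compile here because the fourth positional parameter of re.sub is count, not flags. The helper name safe_name is hypothetical.

```python
import re

# One or more characters that are neither word characters nor "." form a run.
_UNSAFE_RUN = re.compile(r"[^\w\d.]+", re.UNICODE)


def safe_name(distribution):
    """Collapse every run of disallowed characters into one underscore."""
    return _UNSAFE_RUN.sub("_", distribution)
```

Under this rule, safe_name("Twisted Flow") and safe_name("Twisted-Flow") both give "Twisted_Flow", so tools would treat the two spellings as the same distribution.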
>
> As a bonus to including the world, this should be able to prevent
> people from exchanging zeroes for capital O.
>
> On Wed, May 15, 2013 at 7:17 AM, Eric V. Smith <eric at trueblade.com> wrote:
> > On 05/15/2013 07:10 AM, Donald Stufft wrote:
> >>>>> Anyone want to run a scan over the PyPI package set to see
> >>>>> how many packages would cause problems for a "[a-zA-Z0-9_.-]"
> >>>>> only filter?
> >>>>
> >>>> See my previous email where I did queries against my local DB.
> >>>> It's 225 total projects that wouldn't be allowed.
> >>>
> >>> Can you send the list of those projects?
> >>>
> >>> Eric.
> >>>
> >>
> >> Here you go: https://gist.github.com/dstufft/5583225 (I used a Python
> >> one-liner and the PyPI API so others can reproduce it easily if they
> >> wish).
> >
> > Perfect. Thanks.
> >
> > It looks like space causes most of the issues. I'm not sure how
> > "Twisted Flow >= 1.0" would be expected to parse.
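
A small sketch of both points, assuming the proposed "[a-zA-Z0-9_.-]" filter. The helpers is_allowed, rejected, and split_requirement are hypothetical, and split_requirement is a deliberately naive parser written for this illustration, not the one any packaging tool uses.

```python
import re

# The proposed filter: a name may contain only these characters.
ALLOWED_NAME = re.compile(r"[a-zA-Z0-9_.-]+")

# Naive requirement shape: a name, optional whitespace, then the rest.
REQUIREMENT = re.compile(r"\s*([a-zA-Z0-9_.-]+)\s*(.*)")


def is_allowed(name):
    """True when the whole name passes the proposed filter."""
    return ALLOWED_NAME.fullmatch(name) is not None


def rejected(names):
    """Names that the filter would refuse, in their original order."""
    return [name for name in names if not is_allowed(name)]


def split_requirement(requirement):
    """Split a requirement string into (name, remainder)."""
    match = REQUIREMENT.match(requirement)
    if match is None:
        raise ValueError("cannot parse {!r}".format(requirement))
    return match.group(1), match.group(2).strip()
```

With a parser of this shape, split_requirement("Twisted Flow >= 1.0") yields the name "Twisted" and the remainder "Flow >= 1.0", which is not a valid version specifier. That is the ambiguity a space in a project name introduces.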
> >
> > Eric.
> >
> >
> > _______________________________________________
> > Distutils-SIG maillist  -  Distutils-SIG at python.org
> > http://mail.python.org/mailman/listinfo/distutils-sig