Flexible string representation, unicode, typography, ...

Ian Kelly ian.g.kelly at gmail.com
Sun Sep 2 05:06:16 EDT 2012


On Sun, Sep 2, 2012 at 1:36 AM,  <wxjmfauth at gmail.com> wrote:
> I still remember my thoughts when I read the PEP 393
> discussion: "this is not logical", "they do no understand
> typography", "atomic character ???", ...

That would indicate one of two possibilities.  Either:

1) Everybody in the PEP 393 discussion except for you is clueless
about how to implement a Unicode type; or

2) You are clueless about how to implement a Unicode type.

Taking into account Occam's razor, and also that you seem to be unable
or unwilling to offer a solid rationale for those thoughts, I have to
say that I'm currently leaning toward the second possibility.


> Real world exemples.
>
>>>> import libfrancais
>>>> li = ['noël', 'noir', 'nœud', 'noduleux', \
> ...     'noétique', 'noèse', 'noirâtre']
>>>> r = libfrancais.sortfr(li)
>>>> r
> ['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir',
> 'noirâtre']

libfrancais does not appear to be publicly available.  It's not listed
in PyPI, and googling for "python libfrancais" turns up nothing
relevant.

Rewriting the example to use locale.strcoll instead:

>>> li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'French_France')
'French_France.1252'
>>> import functools
>>> sorted(li, key=functools.cmp_to_key(locale.strcoll))
['noduleux', 'noël', 'noèse', 'noétique', 'nœud', 'noir', 'noirâtre']

# Python 3.2
>>> import timeit
>>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
[0.5544277025009592, 0.5370117249557325, 0.5551836677925053]

# Python 3.3
>>> import timeit
>>> timeit.repeat("sorted(li, key=functools.cmp_to_key(locale.strcoll))", "import functools; import locale; li = ['noël', 'noir', 'nœud', 'noduleux', 'noétique', 'noèse', 'noirâtre']", number=10000)
[0.1421166788364303, 0.12389078130001963, 0.13184190553613462]

As you can see, Python 3.3 is about 77% faster than Python 3.2 on this
example.  If this was intended to show that the Python 3.3 Unicode
representation is a regression over the Python 3.2 implementation,
then it's a complete failure as an example.



More information about the Python-list mailing list