Py 3.3, unicode / upper()
wxjmfauth at gmail.com
Wed Dec 19 16:18:05 EST 2012
On Wednesday, 19 December 2012 19:27:38 UTC+1, Ian wrote:
> On Wed, Dec 19, 2012 at 8:40 AM, Chris Angelico <rosuav at gmail.com> wrote:
> > You may not be familiar with jmf. He's one of our resident trolls, and
> > he has a bee in his bonnet about PEP 393 strings, on the basis that
> > they take up more space in memory than a narrow build of Python 3.2
> > would, for a string with lots of BMP characters and one non-BMP. In
> > 3.2 narrow builds, strings were stored in UTF-16, with *surrogate
> > pairs* for non-BMP characters. This means that len() counts them
> > twice, as does string indexing/slicing. That's a major bug, especially
> > as your Python code will do different things on different platforms -
> > most Linux builds of 3.2 are "wide" builds, storing characters in four
> > bytes each.
>
> From what I've been able to discern, his actual complaint about PEP
> 393 stems from misguided moral concerns. With PEP-393, strings that
> can be fully represented in Latin-1 can be stored in half the space
> (ignoring fixed overhead) compared to strings containing at least one
> non-Latin-1 character. jmf thinks this optimization is unfair to
> non-English users and immoral; he wants Latin-1 strings to be treated
> exactly like non-Latin-1 strings (I don't think he actually cares
> about non-BMP strings at all; if narrow-build Unicode is good enough
> for him, then it must be good enough for everybody). Unfortunately
> for him, the Latin-1 optimization is rather trivial in the wider
> context of PEP-393, and simply removing that part alone clearly
> wouldn't be doing anybody any favors. So for him to get what he
> wants, the entire PEP has to go.
>
> It's rather like trying to solve the problem of wealth disparity by
> forcing everyone to dump their excess wealth into the ocean.
----
Latin-1 (ISO-8859-1)? Are you sure?
>>> import sys
>>> sys.getsizeof('a')
26
>>> sys.getsizeof('ab')
27
>>> sys.getsizeof('aé')
39
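The absolute numbers above mix the fixed object header with the character payload, and they vary by platform and build. A rough way to separate the two, assuming CPython 3.3 or later (this is an implementation detail, not a language guarantee), is to compare two strings of the same character at different lengths so that the header cancels out. The helper name bytes_per_char is made up for this sketch:

```python
import sys

def bytes_per_char(ch, n=100):
    """Estimate storage per character by differencing two string sizes.

    Hypothetical helper: subtracting the size of an n-character string
    from that of a 2n-character string cancels the fixed object header,
    leaving only the growth of the character payload.
    """
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n

# One sample character from each range discussed in the thread.
samples = [
    ("ASCII", "a"),
    ("Latin-1", "\xe9"),        # é
    ("BMP", "\u20ac"),          # euro sign
    ("non-BMP", "\U0001F600"),  # outside the Basic Multilingual Plane
]

for label, ch in samples:
    print(label, bytes_per_char(ch))
```

On a PEP 393 build I would expect this to report 1, 1, 2 and 4 bytes per character. If so, the jump from 27 to 39 above would come from a larger header for non-ASCII strings, not from a wider per-character encoding.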
Time to go to bed. More complete answer tomorrow.
jmf