Mistake or Troll (was Re: 'Straße' ('Strasse') and Python 2)

Terry Reedy tjreedy at udel.edu
Mon Jan 13 18:05:04 EST 2014


On 1/13/2014 4:54 AM, wxjmfauth at gmail.com wrote:

> I'm afraid I'm understanding Python (on this
> aspect very well).

Really?

> Do you belong to this group of people who are naively
> writing wrong Python code (usually not properly working)
> during more than a decade?

To me, the important question is whether this and previous similar posts
are intentional trolls, designed to stir up the flurry of responses they
get, or are 'innocently' misleading or even erroneous. If your claim of
understanding Python and Unicode is true, then this must be a troll
post. Either way, please desist, or your access to python-list from
google-groups may be removed.

> 'ß' is the fourth character in that text "Straße"
> (base index 0).

As others have said, in the *unicode* text "Straße", 'ß' is the fifth
character, at character index 4, ...

> This assertions are correct (byte string and unicode).

whereas, when the text is encoded into bytes, the byte index depends on 
the encoding and the assertion that it is always 4 is incorrect. Did you 
know this or were you truly ignorant?
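
To make the encoding dependence concrete, here is a minimal sketch,
assuming Python 3. The helper names are mine and purely illustrative.
It encodes the text several ways and checks whether the single byte at
index 4 is the whole encoded 'ß', and at which byte offset 'ß' starts.

```python
# Minimal sketch, assuming Python 3; helper names are illustrative only.
text = u'Stra\u00dfe'    # the unicode text 'Straße'
eszett = u'\u00df'       # the character 'ß'

def byte_4_is_eszett(encoding):
    """Return True if the one byte at index 4 is the complete encoded 'ß'."""
    encoded = text.encode(encoding)
    return encoded[4:5] == eszett.encode(encoding)

def eszett_byte_offset(encoding):
    """Return the byte offset at which the encoded 'ß' starts."""
    return text.encode(encoding).index(eszett.encode(encoding))

for enc in ('latin-1', 'cp1252', 'utf-8', 'utf-16-le'):
    print(enc, byte_4_is_eszett(enc), eszett_byte_offset(enc))
```

The one-byte encodings (latin-1, cp1252) pass the check. UTF-8 fails
it because 'ß' takes two bytes there, so the byte at index 4 is only
half of it. UTF-16-LE fails it because 'ß' starts at byte offset 8.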

> >>> sys.version
> '2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)]'
> >>> assert 'Straße'[4] == 'ß'

Sometimes true, sometimes not.

> >>> assert u'Straße'[4] == u'ß'

> PS Nothing to do with Py2/Py3.

This issue has everything to do with Py2, where 'Straße' is encoded
bytes, versus Py3, where 'Straße' is unicode text in which each character
of that word takes one code unit, whether a code unit is 2 bytes (narrow
build) or 4 bytes (wide build).
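
As a rough illustration, assuming Python 3 and using UTF-16 and UTF-32
as stand-ins for 2-byte and 4-byte code units (the variable names are
mine), every character of this word counts as exactly one code unit
either way:

```python
# Sketch, assuming Python 3. UTF-16/UTF-32 stand in for 2- and 4-byte units.
word = 'Stra\u00dfe'     # 'Straße'; every character is in the BMP

units_2_byte = len(word.encode('utf-16-le')) // 2   # 2-byte code units
units_4_byte = len(word.encode('utf-32-le')) // 4   # 4-byte code units

print(len(word), units_2_byte, units_4_byte)
print(word[4] == '\u00df')
```

All three counts agree, so character index 4 is 'ß' regardless of the
code unit size.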

If you replace 'ß' with any astral (non-BMP) character, this issue
appears even for unicode text in 3.2 and earlier, where an astral
character requires 2, not 1, code units on narrow builds, thereby
screwing up indexing, just as can happen for encoded bytes. In 3.3+, all
characters use 1 code unit and indexing (and slicing) always works
properly. This is another unicode issue that you appear not to
understand, but you might just be trolling.
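
A small sketch of the astral case, assuming Python 3.3+. I picked
U+1D11E (MUSICAL SYMBOL G CLEF) arbitrarily, and the UTF-16 code unit
list only approximates what a narrow build would have indexed; it is
not how any build stores strings.

```python
# Sketch, assuming Python 3.3+; utf16_units() is an illustrative helper.
import struct

clef = '\U0001D11E'            # MUSICAL SYMBOL G CLEF, a non-BMP character
text = 'Stra' + clef + 'e'

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of integers."""
    data = s.encode('utf-16-le')
    return list(struct.unpack('<%dH' % (len(data) // 2), data))

units = utf16_units(text)

print(len(text), text[4] == clef)                 # indexing by character
print(len(units), hex(units[4]), hex(units[5]))   # indexing by code unit
```

On 3.3+ the string has 6 characters and index 4 is the whole clef. The
code unit list has 7 entries, and index 4 there is only the high
surrogate 0xd834, with the low surrogate 0xdd1e at index 5.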

-- 
Terry Jan Reedy





