'Straße' ('Strasse') and Python 2

Michael Torrie torriem at gmail.com
Mon Jan 13 10:58:50 EST 2014


On 01/13/2014 02:54 AM, wxjmfauth at gmail.com wrote:
> Not at all. I'm afraid I'm understanding Python (on this
> aspect very well).

Are you sure about that?  Seems to me you're still confused as to the
difference between Unicode and encodings.

> 
> Do you belong to this group of people who are naively
> writing wrong Python code (usually not properly working)
> during more than a decade?
> 
> 'ß' is the fourth character in that text "Straße"
> (base index 0).
> 
> These assertions are correct (byte string and unicode).

How can they be?  The byte-string one is only true for the default encoding
and character set you are using, which happens to store 'ß' as a single
byte.  Hence the byte-string half of your little Python 2.7 snippet is not
using Unicode at all, in any form.  It's using a non-Unicode character set.
There are methods (str.decode() and unicode.encode()) which can decode your
character set to Unicode and encode from Unicode.  But let's be clear.
Your byte streams are not Unicode!
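
To make the decode/encode point concrete, here is a minimal sketch.  It
assumes your Windows code page is cp1252 (a guess on my part; your post
doesn't say), and it writes 'ß' as the escape \xdf so the result doesn't
depend on how the source file or terminal is encoded.  The variable names
are just for illustration.

```python
# u'Straße' as a Unicode string; \xdf is the code point U+00DF (ß).
text = u'Stra\xdfe'

# Encoding turns Unicode into bytes.  cp1252 is an assumed code page here.
legacy_bytes = text.encode('cp1252')    # b'Stra\xdfe'

# Decoding turns those bytes back into Unicode, but only if you name
# the same character set that produced them.
round_trip = legacy_bytes.decode('cp1252')

print(repr(legacy_bytes))
print(round_trip == text)               # True
```

The bytes only mean 'Straße' because both sides agree on cp1252.  The
bytes themselves carry no such information.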

If the default byte encoding is UTF-8, which uses a variable number of
bytes per character, your byte-string assertion is completely wrong: 'ß'
takes two bytes there (0xC3 0x9F), so 'Straße'[4] is only the first of
them.  Maybe it's time you stopped programming in Windows and used OS X or
Linux, which throw out the random single-byte character sets and instead
provide a UTF-8 terminal environment to support non-Latin characters.
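
A rough sketch of what happens to index 4 under UTF-8, as opposed to a
single-byte character set.  Again cp1252 is only my assumption for the
single-byte case, and I slice ([4:5]) rather than index so the snippet
behaves the same on Python 2 and Python 3.

```python
text = u'Stra\xdfe'                     # u'Straße'

utf8_bytes = text.encode('utf-8')       # b'Stra\xc3\x9fe', 7 bytes
legacy_bytes = text.encode('cp1252')    # b'Stra\xdfe', 6 bytes (assumed code page)

# In the Unicode string, index 4 is the whole character.
char_at_4 = text[4]                     # u'\xdf', i.e. ß

# In the single-byte encoding, index 4 happens to be the whole character too.
legacy_at_4 = legacy_bytes[4:5]         # b'\xdf'

# In UTF-8, index 4 is only the first of the two bytes that make up ß.
utf8_at_4 = utf8_bytes[4:5]             # b'\xc3'
utf8_full = utf8_bytes[4:6]             # b'\xc3\x9f', decodes to ß

print(len(text), len(legacy_bytes), len(utf8_bytes))   # 6 6 7
print(utf8_full.decode('utf-8') == char_at_4)          # True
```

So the byte-string assertion passes or fails depending on the encoding,
while the Unicode one does not care.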

> 
>>>> sys.version
> '2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)]'
>>>> assert 'Straße'[4] == 'ß'
>>>> assert u'Straße'[4] == u'ß'
>>>>
> 
> jmf
> 
> PS Nothing to do with Py2/Py3.


