ConfigParser and Unicode

thehaas at binary.net thehaas at binary.net
Thu Mar 18 14:10:08 EST 2004


"Martin v. Löwis" <martin at v.loewis.de> wrote:
> thehaas at binary.net wrote:
> > Obviously, 'Grüß'!='Gr\xfc\xdf' .  

> It is not at all obvious that they are different. In fact, they
> are the same, assuming the second string is encoded in Latin-1.

> > Any ideas on how I can get the correct value?

> Pray tell: what is the correct value?

The correct value is 'Grüß', or at least something that compares equal to it.

Maybe I should back up -- I'm interfacing with a Windows API.  In that API, I see 'Grüß' as:
  >>> plist[-1].Reference
  u'Gr\xfc\xdf'

My value in goodProcList is:
  >>> goodProcRef[18]
  'Gr\xfc\xdf'

(yeah, goodProcList isn't in Unicode -- that's probably the cause of all this)

When I test their equality:

  >>> goodProcRef[18] == plist[-1].Reference
  Traceback (most recent call last):
    File "<stdin>", line 1, in ?
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 2: ordinal not in range(128)
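
A minimal sketch of what seems to be going on in that comparison, assuming the byte string really is Latin-1 encoded (as suggested earlier in the thread). The names raw_ref and api_ref are hypothetical stand-ins for goodProcRef[18] and plist[-1].Reference. Comparing a byte string with a Unicode string makes Python decode the bytes with the default 'ascii' codec first, and that is the step that fails; decoding explicitly with the assumed encoding avoids it.

```python
# Hypothetical stand-ins for the two values shown in the transcript above.
raw_ref = b'Gr\xfc\xdf'   # byte string, like goodProcRef[18]
api_ref = u'Gr\xfc\xdf'   # Unicode string, like plist[-1].Reference

# Decoding the bytes as ASCII fails on 0xfc, mirroring the traceback.
try:
    raw_ref.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Assumption: the bytes are Latin-1. Decoding them explicitly yields a
# Unicode string that can be compared with the value from the API.
decoded_ref = raw_ref.decode('latin-1')
matches = (decoded_ref == api_ref)
```

If the data turns out to be in a different encoding (a Windows code page, say), the codec name passed to decode() would need to change accordingly.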

If I try to manually encode goodProcRef[18], I get the same thing:

    >>> goodProcRef[18].encode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xfc in position 2: ordinal not in range(128)
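
A possible explanation for the identical error, offered as a sketch rather than a confirmed diagnosis: calling .encode() on a byte string makes Python 2 first decode it implicitly with the 'ascii' codec, so it trips over the same 0xfc byte. Assuming again that the bytes are Latin-1, the conversion to UTF-8 can be spelled out in two explicit steps. The name raw_value is a hypothetical stand-in for goodProcRef[18].

```python
# Hypothetical stand-in for goodProcRef[18].
raw_value = b'Gr\xfc\xdf'

# Step 1: bytes -> Unicode, using the assumed source encoding (Latin-1).
as_unicode = raw_value.decode('latin-1')

# Step 2: Unicode -> bytes in the target encoding (UTF-8).
as_utf8 = as_unicode.encode('utf-8')

# Decoding the UTF-8 bytes again recovers the same Unicode string.
round_trip = as_utf8.decode('utf-8')
```

Here as_utf8 is four bytes longer than two characters would suggest, because each of the two non-ASCII characters takes two bytes in UTF-8.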

-- 
Mike Hostetler          
thehaas at binary.net 
http://www.binary.net/thehaas 


