[IronPython] Problems with 8-bit strings

Patrick Dubroy pdubroy at gmail.com
Wed Nov 21 21:18:30 CET 2007


Hi,

In the latest version of IronPython (2.0A6), I've noticed some weird
behaviour with 8-bit strings:

    IronPython console: IronPython 2.0A6 (2.0.11102.00) on .NET 2.0.50727.1378
    Copyright (c) Microsoft Corporation. All rights reserved.
    >>> str("\x7e")
    '~'
    >>> str("\x7f")
    u'\x7f'
    >>> str("\x80")
    u'\x80'
    >>> str("\x81")
    Traceback (most recent call last):
      File , line 0, in ##23
      File mscorlib, line unknown, in GetString
      File mscorlib, line unknown, in GetChars
      File mscorlib, line unknown, in Fallback
      File mscorlib, line unknown, in Throw
    UnicodeDecodeError: Unable to translate bytes [81] at index 0 from specified code page to Unicode.

The first problem is that if the string contains character 127 (0x7F)
or 128 (0x80), str() returns a Unicode string rather than an 8-bit
string. CPython returns a standard 8-bit string in both of those
cases. The second problem is that if the string contains any byte
greater than 128 (0x81 in the session above), str() throws a
UnicodeDecodeError, whereas CPython is happy to have bytes up to 0xFF
in an 8-bit string.
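
To check the whole range rather than just the four values above, something
like the following might help. It is only a rough sketch: probe() and
summarize() are names made up for this example, and it builds each
character with chr() instead of a "\x.." literal, so it is an assumption
that IronPython takes the same code path for both. It records, for each
value from 0x00 to 0xFF, the name of the type that str() returns, or the
name of the exception it raises.

```python
import sys


def probe(values=range(256)):
    """Map each value to the type name str() returns, or the exception name raised."""
    results = {}
    for value in values:
        try:
            results[value] = type(str(chr(value))).__name__
        except Exception:
            # sys.exc_info() keeps this working on old interpreters
            # that lack the "except ... as ..." syntax.
            results[value] = sys.exc_info()[0].__name__
    return results


def summarize(results):
    """Collapse consecutive values with the same outcome into (low, high, outcome) runs."""
    runs = []
    for value in sorted(results):
        outcome = results[value]
        if runs and runs[-1][2] == outcome and runs[-1][1] == value - 1:
            runs[-1][1] = value
        else:
            runs.append([value, value, outcome])
    return [tuple(run) for run in runs]


if __name__ == "__main__":
    for low, high, outcome in summarize(probe()):
        print("0x%02X-0x%02X: %s" % (low, high, outcome))
```

On CPython this should report a single run covering 0x00-0xFF with type
str. Any additional runs on IronPython would show exactly where the
behaviour changes.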

Is this a known issue? Should I open a bug?

Pat


