Using more than 7-bit ASCII on Windows

Sun Oct 29 16:26:24 EST 2000

On Sun, 29 Oct 2000 01:23:15 +0200, "Syver Enstad"
<syver.enstad at sensewave.com> wrote:

>"Mark Hammond" <MarkH at ActiveState.com> wrote in message
>news:39FA90D9.7070403 at ActiveState.com...
>>  > Is there anyone who uses Pythonwin with a different character set on
>>  > Windows? -- Japanese, Chinese, European and so on.
>
>> A few people - and they all have trouble :-(
>
>I read on the ActiveState page that extended characters should work in the
>interactive window, but I can't seem to get it to work. Here are some
>examples from my interactive window (using Python 2.0 with win32all 135 and
>the update you mentioned).
>
>(I keep most of my code under the folder e:/våre dokumenter/kode/. The 5th
>character should be an a with a ring over it, in case it doesn't display
>correctly on your screen.)
>
>>>> os.getcwd()
>'E:\\V\345re dokumenter'
>>>> os.chdir('/')
>>>> os.getcwd()
>'E:\\'
>>>> os.chdir('våre dokumenter')
>Traceback (innermost last):
>  File "<interactive input>", line 1, in ?
>OSError: [Errno 2] No such file or directory: 'v\303\245re dokumenter'
>
>The line above looks very strange to me: the a with a ring over it seems to
>be represented by two characters here, whereas it was represented by only
>one when calling getcwd.
>
>>>> os.chdir('v\345re dokumenter')
>>>> os.getcwd()
>'E:\\v\345re dokumenter'
>
>The above is the workaround that gets me where I want.

Yes, the whole setup for non-ASCII characters seems to be very odd, if
not broken.
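
A guess at the two-character form in the traceback quoted above: \303\245
are exactly the two bytes that UTF-8 uses for the a with a ring, while \345
is its single Latin-1 byte. So it looks as if the interactive window is
passing the name along as UTF-8. The sketch below only checks the byte
arithmetic. It assumes a current Python 3 interpreter rather than the 2.0
setup quoted above, and the helper name octal_escapes is made up for
illustration.

```python
def octal_escapes(data):
    """Render bytes with octal escapes for anything outside printable ASCII."""
    return ''.join(chr(b) if 32 <= b < 127 else '\\%03o' % b for b in data)

a_ring = '\u00e5'  # LATIN SMALL LETTER A WITH RING ABOVE
name = 'v' + a_ring + 're'

# One byte per character in Latin-1, two bytes for the a-ring in UTF-8.
latin1_form = octal_escapes(name.encode('latin-1'))
utf8_form = octal_escapes(name.encode('utf-8'))

print(latin1_form)  # v\345re
print(utf8_form)    # v\303\245re
```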

In my case, I am using the Latin-1 character set. If I have a
directory called '10£' (not a reasonable name, but OK as an example)
and I try to use it, all sorts of odd things happen:

    >>> s = '10£'
    >>> s
    '10\234'
    # Huh? OK, I see that £ isn't ASCII, and it looks like repr() is
    # returning an encoding-neutral form. Fair enough...
    >>> os.chdir('10£')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    OSError: [Errno 2] No such file or directory: '10\234'
    # Well, yes - but that's what Python called it. I called it '10£',
    # and that *does* exist. And anyway, '10\234' is '10£', based
    # on the s example above.
    # Try Unicode
    >>> os.chdir(u'10£')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeError: ASCII encoding error: ordinal not in range(128)
    # Urk. So what should I have put to get a Unicode string
    # with the characters '1' '0' '£'?
    # I'm not planning on manually UTF-8 encoding this! :-(
    # Let's assume we need to use codecs - it's a lot of work...
    >>> import codecs
    >>> enc, dec, sr, sw  = codecs.lookup('latin1')
    >>> dec('10£')
    (u'10\234', 3)
    # That didn't get us very far...
    >>> e2, d2, r2, w2 = codecs.lookup('utf8')
    >>> e2('10£')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeError: ASCII decoding error: ordinal not in range(128)
    >>> d2('10£')
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    UnicodeError: UTF-8 decoding error: unexpected code byte
    # Oh, stuff this. I give up :-(
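
Looking at the byte values, one possible explanation (a guess, not something
confirmed in this thread): \234 is octal for 0x9C, which is where the DOS
code pages 437 and 850 put the pound sign, whereas Latin-1 puts it at 0xA3
(\243). If the console is handing Python code page 850 bytes while the file
system compares names in a Latin-1-style code page, then '10\234' really is a
different name. A sketch of the byte values, assuming a current Python 3
interpreter where bytes and text are separate types:

```python
pound_name = '10\u00a3'  # '1', '0', POUND SIGN

# Assumption: the console uses code page 850; the thread does not say so.
console_bytes = pound_name.encode('cp850')
latin1_bytes = pound_name.encode('latin-1')

print(console_bytes)  # b'10\x9c'
print(latin1_bytes)   # b'10\xa3'

# Decoding the console bytes as Latin-1 keeps the byte value 0x9C,
# so the result is not the pound sign (compare the dec() call above).
misread = console_bytes.decode('latin-1')
print(misread == pound_name)    # False

# Decoding with the codec that matches the bytes recovers the text.
recovered = console_bytes.decode('cp850')
print(recovered == pound_name)  # True

# 0x9C cannot start a UTF-8 sequence, so a UTF-8 decode fails outright.
utf8_error = None
try:
    console_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    utf8_error = exc
print(type(utf8_error).__name__)  # UnicodeDecodeError
```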

So come on then. How should I use Latin-1 characters over 127 in my
code? As far as I can see, Unicode has made all of this *harder*, not
easier. Looks like the net result is that Latin-1 and the like are now
as hard as the multi-byte character sets, rather than making the
multi-byte stuff as easy as Latin-1.
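
One approach that sidesteps the console altogether, offered as a sketch
rather than a definitive answer: spell the character with an escape, so the
source stays pure ASCII, and encode explicitly wherever bytes are needed.
This again assumes a current Python 3 interpreter, and the variable names
are only for illustration.

```python
import unicodedata

# Three ASCII-only spellings of the same three-character string.
by_hex = '10\xa3'
by_name = '10\N{POUND SIGN}'
by_codepoint = '10' + chr(0xA3)

print(by_hex == by_name == by_codepoint)  # True
print(unicodedata.name(by_hex[-1]))       # POUND SIGN

# The byte form depends only on the codec named here, not on the console.
print(by_hex.encode('latin-1'))  # b'10\xa3'
print(by_hex.encode('utf-8'))    # b'10\xc2\xa3'
```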

Someone please tell me I'm wrong, and explain how I should have done
this. You'll need to convince me that the fact that

    >>> os.chdir('10£')

doesn't work is not a bug, first...

Paul.