Using more than 7-bit ASCII on Windows.
Paul Moore
gustav at morpheus.demon.co.uk
Sun Oct 29 16:26:24 EST 2000
On Sun, 29 Oct 2000 01:23:15 +0200, "Syver Enstad"
<syver.enstad at sensewave.com> wrote:
>"Mark Hammond" <MarkH at ActiveState.com> wrote in message
>news:39FA90D9.7070403 at ActiveState.com...
>> > Is there anyone who uses Pythonwin with a different character set on
>> > Windows? -- Japanese, Chinese, European and so on.
>
>> A few people - and they all have trouble :-(
>
>I read on the ActiveState page that extended characters should work in the
>interactive window, but I can't seem to get it to work. Here are some
>examples from my interactive window (using Python 2.0 with win32all 135 and
>the update you mentioned).
>
>(I keep most of my code under the folder e:/våre dokumenter/kode/. The 5th
>character should be an a with a ring over it, in case it doesn't display
>correctly on your screen.)
>
>>>> os.getcwd()
>'E:\\V\345re dokumenter'
>>>> os.chdir('/')
>>>> os.getcwd()
>'E:\\'
>>>> os.chdir('våre dokumenter')
>Traceback (innermost last):
> File "<interactive input>", line 1, in ?
>OSError: [Errno 2] No such file or directory: 'v\303\245re dokumenter'
>
>The line above looks very strange to me: the a with a ring over it seems to
>be represented by two characters here, but by only one in the result of
>getcwd.
>
>>>> os.chdir('v\345re dokumenter')
>>>> os.getcwd()
>'E:\\v\345re dokumenter'
>
>The above is the workaround that gets me where I want.
Yes, the whole setup for non-ASCII characters seems to be very odd, if
not broken.
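The two escape sequences in the quoted session are consistent with the same character being encoded in two different ways. A minimal sketch, written for a Python 3 interpreter (an assumption; the quoted session is Python 2.0), and assuming the interactive window passed UTF-8 bytes while the file system reported a single Latin-1 byte. The helper name `octal_escapes` is made up for illustration:

```python
# Assumes Python 3, where str is Unicode and bytes are explicit.
ring_a = "\u00e5"  # LATIN SMALL LETTER A WITH RING ABOVE

def octal_escapes(data):
    """Render bytes as backslash-octal escapes, as old Python reprs did."""
    return "".join("\\%03o" % b for b in data)

utf8_form = octal_escapes(ring_a.encode("utf-8"))      # two bytes
latin1_form = octal_escapes(ring_a.encode("latin-1"))  # one byte

print(utf8_form)    # \303\245
print(latin1_form)  # \345
```

If that assumption holds, the string typed into chdir and the string returned by getcwd name the same character but are different byte sequences, which would explain why the first lookup fails.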
In my case, I am using the Latin-1 character set. If I have a
directory called '10£' (not a reasonable name, but OK as an example),
and I try to use it, all sorts of odd things happen:
>>> s = '10£'
>>> s
'10\234'
# Huh? OK, I see that £ isn't ASCII, and it looks like repr() is
# returning an encoding-neutral form. Fair enough...
>>> os.chdir('10£')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OSError: [Errno 2] No such file or directory: '10\234'
# Well, yes - but that's what Python called it. I called it '10£',
# and that *does* exist. And anyway, '10\234' is '10£', based
# on the s example above.
# Try Unicode
>>> os.chdir(u'10£')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
# Urk. So what should I have put to get a Unicode string
# with the characters '1' '0' '£'?
# I'm not planning on manually UTF-8 encoding this! :-(
# Let's assume we need to use codecs - it's a lot of work...
>>> import codecs
>>> enc, dec, sr, sw = codecs.lookup('latin1')
>>> dec('10£')
(u'10\234', 3)
# That didn't get us very far...
>>> e2, d2, r2, w2 = codecs.lookup('utf8')
>>> e2('10£')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII decoding error: ordinal not in range(128)
>>> d2('10£')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: unexpected code byte
# Oh, stuff this. I give up :-(
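One possible reading of the session above, offered as an assumption rather than a diagnosis: '\234' is octal for byte 0x9C, which is where code pages 437 and 850 put the pound sign, whereas Latin-1 puts it at 0xA3 ('\243'). If the console was using one of the DOS code pages, the byte typed at the prompt would not match the name the file system knows. A sketch in Python 3 syntax (an assumption; the session above is Python 2.0):

```python
pound = "\u00a3"  # POUND SIGN

# The same character maps to different bytes in different code pages.
as_cp850 = pound.encode("cp850")     # DOS/OEM code page
as_latin1 = pound.encode("latin-1")  # ISO 8859-1

print(oct(as_cp850[0]), oct(as_latin1[0]))  # 0o234 0o243

# Bytes as they may have arrived from the console in the session above.
raw = b"10\x9c"

# Decoding with the code page that produced the bytes recovers the name.
name = raw.decode("cp850")

# Decoding the same bytes as UTF-8 fails, since 0x9C is a continuation
# byte with no lead byte in front of it.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

The point of the sketch is only that a byte string carries no record of which code page produced it, so the decoding step has to name the codec explicitly.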
So come on then. How should I use Latin-1 characters over 127 in my
code? As far as I can see, Unicode has made all of this *harder*, not
easier. Looks like the net result is that Latin-1 and the like are now
as hard as the multi-byte character sets, rather than making the
multi-byte stuff as easy as Latin-1.
Someone please tell me I'm wrong, and explain how I should have done
this. You'll need to convince me that the fact that
>>> os.chdir('10£')
doesn't work is not a bug, first...
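For comparison, here is how the same operation might look on a later interpreter. This is a sketch assuming Python 3 and a file system encoding that can represent '£' (both are assumptions, not something taken from the session above). `chdir_roundtrip` is a hypothetical helper name, and the directory is created in a temporary location rather than assumed to exist:

```python
import os
import tempfile

def chdir_roundtrip(name):
    """Create a directory called `name`, chdir into it, and return the
    last component of the resulting working directory."""
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, name)
        os.mkdir(target)
        try:
            os.chdir(target)
            return os.path.basename(os.getcwd())
        finally:
            # Restore the original directory before the temp dir is removed.
            os.chdir(start)

result = chdir_roundtrip("10\u00a3")
print(result == "10\u00a3")
```

Under those assumptions the name is passed as a Unicode string and no manual encoding step is needed.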
Paul.