Using more than 7 bit ASCII on windows.
Alex Martelli
aleaxit at yahoo.com
Mon Oct 30 08:54:24 EST 2000
"Paul Moore" <paul.moore at uk.origin-it.com> wrote in message
news:JmL9OY9FJWKYSmZT5wV46STMeZkC at 4ax.com...
> On Mon, 30 Oct 2000 12:03:34 +0100, Paul Moore
> <paul.moore at uk.origin-it.com> wrote:
> >I can accept this. But I still don't know how to enter a literal
> >string containing a "£" character into Python. More explicitly, Python
> >accepts the line
> >
> > >>> s = "£"
> >
> >But what does this *mean* (ie, what should I expect the semantics of s
> >to be?)
>
> Further investigation, based on some other posts I read, shows:
>
> >>> unicode('£', 'latin-1')
> u'\234'
The interactive interpreter calls 'repr' on the objects
returned from expressions entered at the >>> prompt. So,
this is what repr says of this Unicode object.
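The repr/str split can be sketched as below. This is only an illustration and assumes a current Python 3 interpreter, not the 2.0 one in the transcript, so strings are already Unicode and the exact escapes differ from the u'\234' shown above. The variable names are made up for the example.

```python
# Python 3 sketch: the >>> prompt echoes repr(obj); print writes str(obj).
s = "\u00a3"  # POUND SIGN, written as an escape to avoid source-encoding issues

shown_at_prompt = repr(s)  # what the interactive prompt would echo (quoted)
shown_by_print = str(s)    # what print would hand to the output stream

# ascii() is repr() with every non-ASCII character escaped, which is the
# closest modern analogue to the escaped form the 2.0 prompt displays.
escaped = ascii(s)
```

Here shown_at_prompt keeps the quotes, shown_by_print is the bare character, and escaped contains only ASCII characters.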
> >>> print unicode('£', 'latin-1')
Similarly, print calls 'str', so the error that follows comes
from str, not from print itself:
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> UnicodeError: ASCII encoding error: ordinal not in range(128)
> >>> os.chdir(unicode('£', 'latin-1'))
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> I think this implies that I can get latin-1 data *into* Python, but
> that print doesn't like it...
print couldn't care less -- it delegates the transform-into-string
to str. str applied to a Unicode object, in turn, wants _ASCII_
encoding, as the message on the UnicodeError is telling you.
os.chdir is apparently also "calling str" (the C API equivalent
thereof, no doubt, in both cases:-), with similar results.
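A rough way to reproduce the same complaint today, assuming Python 3 (where the implicit ASCII conversion is gone and the encode step has to be spelled out explicitly):

```python
# Python 3 sketch: forcing a Unicode string through the ASCII codec fails
# for any character above 127, which is the failure reported in the
# tracebacks above.
pound = "\u00a3"  # POUND SIGN

error_message = None
try:
    pound.encode("ascii")
except UnicodeEncodeError as exc:
    error_message = str(exc)

# Naming a codec that actually covers the character works fine.
latin1_bytes = pound.encode("latin-1")
```

The message text still ends with "ordinal not in range(128)", while the latin-1 encoding yields the single byte 0xA3.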
It sure would be nice to have a way to control the default
encoding -- not have ASCII hard-wired. There is a function
(not in the 2.0 docs -- doc error?), sys.getdefaultencoding(),
that _tells_ you what the default encoding is (and guess
which one...:-), but I do not know of a way to *change* it.
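For reference, querying the default is a one-liner. The sketch assumes Python 3, where the reported value is 'utf-8' rather than the 'ascii' that 2.0 reports:

```python
import sys

# Ask the interpreter which encoding it uses for implicit conversions.
default_encoding = sys.getdefaultencoding()
```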
> Hmm. Codecs.
>
> >>> e,d,sr,sw = codecs.lookup('latin-1')
> >>> o = sw(sys.stdout)
> >>> o.write(unicode('£','latin-1'))
> £>>> print >> o, 'a'
> a
> >>> print >> o, unicode('£','latin-1')
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> Huh? So o.write() works, but print >> o doesn't? When o is a stream
> writer specifically built to write in Latin-1?
The write method of file objects uses the "buffer interface" of
the objects it's writing out, I believe. It most definitely
does NOT call 'str' (or the equivalent thereof), as print
does (wherever you're redirecting output, print always calls
str to prepare the bytes it wants to put out).
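The write path can be sketched like this, assuming Python 3 and an in-memory byte buffer standing in for sys.stdout (the names raw and out are just for the example). It shows only that the stream writer itself does the Latin-1 encoding when handed a Unicode string directly; it does not reproduce the 2.0 behaviour of print.

```python
import codecs
import io

# codecs.lookup still unpacks into the same four items used in the
# transcript: encoder, decoder, stream reader class, stream writer class.
encode, decode, stream_reader, stream_writer = codecs.lookup("latin-1")

raw = io.BytesIO()        # stand-in for the real output stream
out = stream_writer(raw)  # writer that encodes to Latin-1 on the way out

out.write("\u00a3")       # the writer encodes the Unicode string itself
written = raw.getvalue()  # bytes that reached the underlying stream
```

No str conversion is involved on this path, so the pound sign arrives as the single Latin-1 byte 0xA3.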
> >>> print >> o, '£'
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> File "c:\applications\python20\lib\codecs.py", line 134, in write
> data, consumed = self.encode(object,self.errors)
> UnicodeError: ASCII decoding error: ordinal not in range(128)
>
> Probably not relevant, '£' is somewhat undefined here, I guess.
>
> This is looking more and more bizarre...
Not sure what's going on in this last case, but the other parts
don't seem bizarre at all to me -- I'm not 100% sure I am looking
at them the right way, but they sure do make sense the
way I look at them.
Alex