Using more than 7-bit ASCII on Windows.

Mon Oct 30 08:54:24 EST 2000

"Paul Moore" <paul.moore at uk.origin-it.com> wrote in message
news:JmL9OY9FJWKYSmZT5wV46STMeZkC at 4ax.com...
> On Mon, 30 Oct 2000 12:03:34 +0100, Paul Moore
> <paul.moore at uk.origin-it.com> wrote:
> >I can accept this. But I still don't know how to enter a literal
> >string containing a "£" character into Python. More explicitly, Python
> >accepts the line
> >
> >    >>> s = "£"
> >
> >But what does this *mean* (ie, what should I expect the semantics of s
> >to be?)
>
> Further investigation, based on some other posts I read, shows:
>
> >>> unicode('£', 'latin-1')
> u'\234'

The interactive interpreter calls 'repr' on the objects
returned from expressions entered at the >>> prompt.  So
u'\234' is what repr says about this Unicode object.
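A minimal Python 3 sketch of what that output suggests. Python 3 folded the old unicode type into str, so the spellings differ from 2.0. It assumes, without confirmation from the post, that the console was using DOS code page 850 (or 437), where the pound sign is the single byte 0x9C. Octal 234 is 0x9C, whereas the pound sign in latin-1 is 0xA3 (octal 243), so decoding that byte as latin-1 would give a control character rather than '£'.

```python
# Python 3 sketch; ASSUMES the console used DOS code page 850 (or 437),
# in which the pound sign is the single byte 0x9C.
raw = b"\x9c"

as_latin1 = raw.decode("latin-1")  # U+009C, a C1 control character
as_cp850 = raw.decode("cp850")     # U+00A3, the pound sign

# Same code point as the u'\234' in the post (octal 234 == 0x9C).
assert as_latin1 == "\x9c"
assert as_cp850 == "\u00a3"

# The >>> prompt echoes repr(); repr() escapes the unprintable
# control character instead of showing it.
print(repr(as_latin1))  # prints '\x9c'
```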

> >>> print unicode('£', 'latin-1')

Similarly, print calls 'str'.  So the following error
comes from str, not from print itself:

> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeError: ASCII encoding error: ordinal not in range(128)
> >>> os.chdir(unicode('£', 'latin-1'))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> I think this implies that I can get latin-1 data *into* Python, but
> that print doesn't like it...

print couldn't care less -- it delegates the conversion into
a string to str.  str applied to a Unicode object, in turn,
wants _ASCII_ encoding, as the UnicodeError message tells you.
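A small Python 3 sketch of the same effect, made explicit. Treating 2.0's str(u) as roughly u.encode(sys.getdefaultencoding()), i.e. an ASCII encode, is an analogy, not a claim about the 2.0 C code. Naming a codec that covers the character succeeds where ASCII fails.

```python
# Python 3 sketch: 2.0's str(u) behaved roughly like u.encode("ascii").
pound = "\u00a3"  # the pound sign

try:
    pound.encode("ascii")                # the conversion str() attempted
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)   # ordinal not in range(128)

latin1_bytes = pound.encode("latin-1")   # naming the codec explicitly works
print(latin1_bytes)                      # b'\xa3'
```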

os.chdir is apparently also "calling str" (no doubt the C API
equivalent thereof, in both cases :-), with similar results.

It sure would be nice to have a way to control the default
encoding -- not have ASCII hard-wired.  There is a function
(not in the 2.0 docs -- doc error?), sys.getdefaultencoding(),
that _tells_ you what the default encoding is (and guess
which one...:-), but I do not know of a way to *change* it.
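A short Python 3 sketch of querying the default. In Python 3 the function reports 'utf-8'; the 2.0 interpreter discussed here reported ASCII, as the error messages show. The explicit-codec lines illustrate one way to avoid depending on the global default at all; that is a suggestion, not something the post proposes.

```python
import sys

# Python 3 reports 'utf-8' here; the 2.0 interpreter reported 'ascii'.
print(sys.getdefaultencoding())

# Naming the codec at each conversion sidesteps the global default.
text = b"\xa3".decode("latin-1")   # bytes -> text (the pound sign)
data = text.encode("latin-1")      # text -> bytes
print(data)                        # b'\xa3'
```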

> Hmm. Codecs.
>
>     >>> e,d,sr,sw = codecs.lookup('latin-1')
>     >>> o = sw(sys.stdout)
>     >>> o.write(unicode('£','latin-1'))
>     £>>> print >> o, 'a'
>     a
>     >>> print >> o, unicode('£','latin-1')
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>     UnicodeError: ASCII encoding error: ordinal not in range(128)
>
> Huh? So o.write() works, but print >> o doesn't? When o is a stream
> writer specifically built to write in Latin-1?

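The write method of the stream writer encodes the object it is
given with its own codec (latin-1 here) before passing the bytes
on to the underlying file; the traceback below shows codecs.py
calling self.encode inside write.  Plain file objects, I believe,
use the "buffer interface" of the objects they write out.  Neither
path calls 'str' (or the equivalent thereof), as print does
(wherever you're redirecting output, print always calls str to
prepare the bytes it wants to put out).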
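A Python 3 sketch of the stream-writer behaviour. It uses codecs.getwriter() instead of unpacking codecs.lookup() as in the post, and an in-memory io.BytesIO stands in for sys.stdout so the written bytes can be inspected. Note that Python 3's print() also calls str() first, but str() of a text object is just the text, so the writer's own latin-1 encoding still applies.

```python
import codecs
import io

buf = io.BytesIO()                         # stands in for sys.stdout
writer = codecs.getwriter("latin-1")(buf)  # a latin-1 StreamWriter

writer.write("\u00a3")   # StreamWriter.write encodes with latin-1 itself
print(buf.getvalue())    # b'\xa3'

# Python 3's print() hands str(obj) to writer.write, so this also works;
# 2.0's "print >> o" converted with an ASCII str() first and failed.
print("\u00a3", file=writer)
print(buf.getvalue())    # b'\xa3\xa3\n'
```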

>     >>> print >> o, '£'
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in ?
>       File "c:\applications\python20\lib\codecs.py", line 134, in write
>         data, consumed = self.encode(object,self.errors)
>     UnicodeError: ASCII decoding error: ordinal not in range(128)
>
> Probably not relevant; '£' is somewhat undefined here, I guess.
>
> This is looking more and more bizarre...

Not sure what's going on here, but the other parts don't
seem bizarre at all to me -- I'm not 100% sure I am looking
at them the right way, but they sure do make sense the
way I look at them.
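One plausible reading of the '£' case, offered as an assumption rather than a confirmed diagnosis: print >> o passes the 8-bit string unchanged to o.write. Asking Python 2 to encode an 8-bit string to latin-1 first coerces it to Unicode with the default ASCII codec, which fails on any byte above 127. That would explain why the message says "decoding" rather than "encoding". Python 3 dropped that implicit step, so the sketch below spells it out with a hypothetical helper, py2_style_encode. The cp850 decode again assumes the console's code page.

```python
def py2_style_encode(data: bytes, encoding: str) -> bytes:
    """Hypothetical helper mimicking Python 2's implicit str->unicode step."""
    text = data.decode("ascii")   # implicit coercion; fails for bytes >= 128
    return text.encode(encoding)

try:
    py2_style_encode(b"\x9c", "latin-1")  # '£' as typed on a cp850 console
except UnicodeDecodeError as exc:
    print("decoding error:", exc.reason)  # ordinal not in range(128)

# Decoding with the codec the bytes actually use (ASSUMED cp850) works.
fixed = b"\x9c".decode("cp850").encode("latin-1")
print(fixed)  # b'\xa3'
```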

Alex