Jython: How to import escaped Unicode and export utf-8?

Martin von Loewis loewis at informatik.hu-berlin.de
Mon Apr 30 10:22:02 EDT 2001


Maurice Bauhahn <bauhahnm at clara.net> writes:

> It appears that the problem is that programmers have killed a
> substantial part of the Unicode side of Jython. The README.txt file
> which accompanies Jython 2.1.a1

I can't see why anything has been killed here.

>     - Text files will pass data read and written through the default
>       codecs for the JVM. Binary files will write only the lower eight
>       bits of each unicode character.

Sure, this is almost the same as CPython: writing a Unicode object to
a file will encode it with the default encoding. Writing a byte string
to a file will write the bytes.

Since Jython uses Java strings both for Unicode and byte strings, it
has an "extra" byte in each element of a byte string, which is not
written to the file.
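
As a rough illustration of what "only the lower eight bits" means, here
is a sketch in present-day Python 3 (an assumption on my part; this is
not Jython's actual file code, and the helper name low_eight_bits is
made up for the example):

```python
def low_eight_bits(s):
    """Keep only the low byte of each character's code point (hypothetical helper)."""
    return bytes(ord(ch) & 0xFF for ch in s)

# Code points below 256 survive unchanged.
plain = low_eight_bits("abc")

# U+0141 has a non-zero high byte (0x01); dropping it leaves 0x41.
lossy = low_eight_bits("\u0141")
```

Anything that fits in eight bits passes through intact; anything above
that is silently truncated, which is why real Unicode data should not
be pushed through a binary file without encoding it first.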

In any case, you have to encode Unicode data with an explicit encoding
before writing it to a file.
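
A minimal sketch of that approach, assuming a present-day Python 3
interpreter rather than Jython 2.1 (where the string types differ). The
sample text and the output path are made up for the example:

```python
import os
import tempfile

# ASCII input that carries a \uXXXX escape, as in "escaped Unicode".
escaped = b"caf\\u00e9"

# Turn the escape sequence into a real Unicode string.
text = escaped.decode("unicode_escape")

# Encode explicitly instead of relying on any default codec.
utf8_bytes = text.encode("utf-8")

# Hypothetical output file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "wb") as f:   # binary mode: the bytes are written as given
    f.write(utf8_bytes)

# Read the file back and decode it with the same encoding.
with open(path, "rb") as f:
    round_trip = f.read().decode("utf-8")
```

Because the encoding is named on both the write and the read side, the
result does not depend on whatever default the platform or JVM picks.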

>     - The \x escape has changed, now it will eat two hex characters
>       but never more. The behaviour matches CPython2.0
> 
> Presumably the first item is only referring to the default 'ASCII'...which
> can be changed. The second is, however, disastrous, if I understand it
> properly.

I think you don't understand it properly. Why is it disastrous?
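
For what it's worth, here is a small sketch of the two-digit rule,
written for a present-day Python 3 interpreter (an assumption; the
README entry is about Jython 2.1 and CPython 2.0). The variable names
are only for illustration:

```python
# \x consumes exactly two hex digits: \x41 is "A", then "bc" follows.
two_digits = "\x41bc"

# Here \x00 is one character, and "4" and "1" are two ordinary characters.
three_chars = "\x0041"

# Code points above 0xFF are written with \uXXXX instead of \x.
wide = "\u0141"
```

So the change only limits how many digits \x consumes; characters
outside the eight-bit range remain reachable through \u.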

Regards,
Martin


