Jython: How to import escaped Unicode and export utf-8?

Maurice Bauhahn bauhahnm at clara.net
Sun Apr 29 03:39:07 EDT 2001


I took the second issue to heart and tried to accommodate this by importing from a
text file u'\u17\u80'. The following is what resulted.

H:\jy>jython
Jython 2.1a1 on java1.2.2 (JIT: symcjit)
Type "copyright", "credits" or "license" for more information.
>>> execfile('h:\\jy\\teste.py')
Traceback (innermost last):
  File "<console>", line 1, in ?
UnicodeError: unicode escape decoding error: truncated \uXXXX

Hence it appears not only that it is impossible to import \uXXXX, but also that
it is impossible to handle any Unicode escape above the first 256
characters...effectively ignoring Unicode altogether??? Is there something I am
missing?
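
A note on the error above: a \u escape must be followed by exactly four hex
digits, so \u17\u80 is read as two truncated escapes rather than as the single
character \u1780. The following is a minimal sketch in present-day Python
syntax, not a Jython 2.1a1 session. The helper name decode_escapes is made up
for illustration, and the standard unicode_escape codec is assumed to stand in
for the way escapes in a u'...' literal are interpreted.

```python
def decode_escapes(raw):
    """Decode backslash escapes held in a byte string (hypothetical helper)."""
    return raw.decode("unicode_escape")

# Four hex digits after \u: one character, KHMER LETTER KA (U+1780).
ok = decode_escapes(b"\\u1780")
assert ok == u"\u1780" and len(ok) == 1

# Two hex digits after \u: the escape is truncated and decoding fails.
try:
    decode_escapes(b"\\u17\\u80")
    truncated = False
except UnicodeError:
    truncated = True

# \x always takes exactly two hex digits, so this is two separate
# characters (U+0017 and U+0080), not U+1780.
two_chars = decode_escapes(b"\\x17\\x80")
assert len(two_chars) == 2
```

If this reading is right, the traceback comes from the spelling \u17\u80 and
not from a limit on escapes above 255.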

Cheers,

Maurice

Maurice Bauhahn wrote:

> It appears that the problem is that programmers have killed a substantial part
> of the Unicode side of Jython. The README.txt file which accompanies Jython
> 2.1a1 says:
>
>     - Text files will pass data read and written through the default
>       codecs for the JVM. Binary files will write only the lower eight
>       bits of each unicode character.
>
>     - The \x escape have changed, now it will eat two hex characters
>       but never more. The behaviour matches CPython2.0
>
> Presumably the first item is only referring to the default 'ASCII'...which
> can be changed. The second is, however, disastrous, if I understand it
> properly.
>
> Cheers,
>
> Maurice
>
> Martin von Loewis wrote:
>
> > Maurice Bauhahn <bauhahnm at clara.net> writes:
> >
> > > My imports of escaped Unicode (u'\u1780' or '\u1780') end up in my lists
> > > as:
> > >
> > > ["u'\\u1780'"]
> >
> > I very much doubt this. This looks more like the repr of a list,
> > instead of like the list itself. That could be an incompatibility of
> > repr for Unicode objects in Python, but I assume that the list is
> > still built correctly.
> >
> > > and .write as u'\u1780'.
> >
> > In CPython, that would give an exception. You cannot write a Unicode
> > object onto a stream without encoding it first.
> >
> > > From the command line I can get something useful by writing:
> > >
> > > u'\u1780'.encode('utf-8')
> > >
> > > but it does not appear to work within my jython script.
> >
> > That should work. How does it fail?
> >
> > Regards,
> > Martin
>
> --
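
For the question in the subject line, the steps discussed in the thread (read
the escaped text, turn the escapes into real characters, encode as UTF-8
before writing) might be sketched as follows. This is an illustration in
present-day Python, not something tested on Jython 2.1a1. The function name
escaped_to_utf8 and the file names are hypothetical, and the input file is
assumed to contain only ASCII characters.

```python
import io
import os
import tempfile

def escaped_to_utf8(escaped_text):
    """Turn ASCII text containing \\uXXXX escapes into UTF-8 bytes."""
    # Escapes read from a file are ordinary characters (a backslash, 'u',
    # four hex digits); the unicode_escape codec makes them real characters.
    text = escaped_text.encode("ascii").decode("unicode_escape")
    return text.encode("utf-8")

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "escaped.txt")  # hypothetical input file
dst = os.path.join(workdir, "utf8.txt")     # hypothetical output file

# Create the input: six ASCII characters, not one Khmer letter.
with io.open(src, "w", encoding="ascii") as f:
    f.write(u"\\u1780")

with io.open(src, "r", encoding="ascii") as f:
    raw = f.read().strip()

# Write the encoded bytes in binary mode so nothing re-encodes them.
with io.open(dst, "wb") as f:
    f.write(escaped_to_utf8(raw))

with io.open(dst, "rb") as f:
    result = f.read()

assert result == b"\xe1\x9e\x80"  # UTF-8 for U+1780
```

The encode("ascii") step raises an error if the input holds characters
outside ASCII, which is preferable to decoding them incorrectly.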