unicode 3 digit decimal conversion

"Martin v. Löwis" martin at v.loewis.de
Sun Sep 28 16:03:45 EDT 2003


Rune Hansen wrote:

> Stalker told me to send the letter "ø" as \248 or as xf8 (notice the
> missing "\"). At this point I'm sending
> quoteattr(unicode('string',"iso-8859-1").encode("utf-8")) which is
> neither of the above.(..?).

Correct: UTF-8 works differently. I find it surprising that anybody
would actually propose sending non-ASCII characters as xHH, since that
character sequence may coincidentally occur in plain ASCII text as well.
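
To make the difference concrete, here is a small sketch in current
Python 3 syntax (the original call used the Python 2 unicode() builtin).
It compares the two escape forms Stalker reportedly suggested with what
UTF-8 actually produces for "ø". The variable names and the sample
string are made up for illustration; nothing here is taken from
Stalker's documentation.

```python
from xml.sax.saxutils import quoteattr

text = "ø"  # U+00F8, LATIN SMALL LETTER O WITH STROKE

# The same character in the two encodings under discussion.
latin1_bytes = text.encode("iso-8859-1")  # a single byte, 0xF8 (decimal 248)
utf8_bytes = text.encode("utf-8")         # two bytes, 0xC3 0xB8

# The two escape forms mentioned in the quoted message, built from the
# code point. Both are plain ASCII character sequences, not raw bytes.
code_point = ord(text)
decimal_escape = "\\%d" % code_point   # backslash followed by 248
hex_escape = "x%02x" % code_point      # xf8, without a backslash

# Why the xHH form is ambiguous: ordinary ASCII text can contain it.
sample_ascii = "the mask is 0xf8"      # hypothetical payload
is_ambiguous = hex_escape in sample_ascii

# Roughly what the quoted Python 2 expression ends up sending:
# an XML-quoted attribute value, encoded as UTF-8.
attr_bytes = quoteattr(text).encode("utf-8")

print(latin1_bytes, utf8_bytes, decimal_escape, hex_escape, is_ambiguous)
print(attr_bytes)
```

Neither escape form appears anywhere in the UTF-8 output, which is why
what you are sending is "neither of the above".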

> Anyway, the server is still happy, and the data views correctly in the
> web interface.

It is relatively easy to recognize UTF-8 in the input; it is unlikely
that "real" data in some other encoding looks like UTF-8 by coincidence
(unlike \-escaping or x-escaping). So it might be that the server
inspects the input to guess the encoding. This is bad style, of
course - the protocol should be clear about encodings (this protocol
couldn't be published in an IETF RFC).
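
As an illustration of that kind of guessing, here is a minimal sketch
in Python 3. It is only an assumption about what a server could do,
not a description of what Stalker's server actually does; the function
name guess_decode and the ISO-8859-1 fallback are my own choices.

```python
def guess_decode(data):
    """Decode bytes as UTF-8 if they are valid UTF-8, else as ISO-8859-1.

    Returns a (text, encoding_name) pair. This is a heuristic: pure ASCII
    input is valid in both encodings and is reported as UTF-8.
    """
    try:
        # Strict decoding fails on byte sequences that are not well-formed
        # UTF-8, which is what makes UTF-8 easy to recognize.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Every byte value is defined in ISO-8859-1, so this cannot fail.
        return data.decode("iso-8859-1"), "iso-8859-1"


print(guess_decode("rød".encode("utf-8")))       # valid UTF-8 input
print(guess_decode("rød".encode("iso-8859-1")))  # lone 0xF8 is not valid UTF-8
```

A lone 0xF8 byte can never start a valid UTF-8 sequence, so ISO-8859-1
input containing "ø" fails the first attempt and falls through to the
fallback. The guess can still be wrong for unusual input, which is
exactly why the protocol should state the encoding instead.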

> Stalker provides a perl and java API for the telnet server.  I don't 
> read perl code very well, and the java API is distributed as .class 
> files(nothing new there, it's java after all) so I really don't know how 
> Stalker is handling it.

Even then, you could only find out what the perl and java clients do -
you couldn't tell, from that, what other options the server might support.

Regards,
Martin




