unicode strings and such

Martin von Loewis loewis at informatik.hu-berlin.de
Thu Sep 13 06:08:21 EDT 2001


Garth Grimm <garth_grimm at hp.com> writes:

> <!--$#-*-mode:python;tab-width:8;py-indent-offset:4;indent-tabs-mode:nil-*-

What kind of programming language is this? It is not Python, I can
tell that much. It looks like the language supports embedding Python,
though.

> a) Use UTF-8 encoding on the data file and use u'^ã??ã? ã?.ã?"$'
> notation in it.  This would create two-element tuples of unicode
> strings.

Since I don't know the programming language you are using, it is hard
to understand why putting UTF-8 in the first line might have any
effect. However, if the embedded Python text is passed to a Python
interpreter, I can tell you that the Unicode literal does *not* have
the desired effect: its bytes are interpreted as Latin-1, one
character per byte, rather than decoded as UTF-8. If this really is
UTF-8 for some Japanese text (which I cannot tell just by looking at
the bytes), you'd need to write

   unicode('^ã??ã? ã?.ã?"$', 'utf-8')
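
To illustrate the difference, here is a minimal sketch. It is written
in present-day Python 3 syntax rather than the Python of this thread,
and the sample string is an assumption, since the original text cannot
be recovered from the bytes above. It encodes a short Japanese string
as UTF-8 and then decodes the same bytes both ways:

```python
# Hypothetical sample text (an assumption, not the poster's data):
# three Japanese characters, written as escapes to stay ASCII-safe.
text = "\u65e5\u672c\u8a9e"

# The bytes as they would sit in a UTF-8 encoded data file.
raw = text.encode("utf-8")

# Latin-1 maps every byte to exactly one character, so the result
# has as many characters as there are bytes and is not the original.
as_latin1 = raw.decode("latin-1")

# Decoding as UTF-8 recovers the original characters.
as_utf8 = raw.decode("utf-8")

print(len(raw), len(as_latin1), len(as_utf8))
print(as_utf8 == text, as_latin1 == text)
```

In the Python discussed here, unicode(raw, 'utf-8') plays the role of
raw.decode('utf-8') in the sketch.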

It's not clear to me why the str() call is needed; what happens if you
leave it out?

Regards,
Martin


