unicode strings and such
Martin von Loewis
loewis at informatik.hu-berlin.de
Thu Sep 13 06:08:21 EDT 2001
Garth Grimm <garth_grimm at hp.com> writes:
> <!--$#-*-mode:python;tab-width:8;py-indent-offset:4;indent-tabs-mode:nil-*-
What kind of programming language is this? It is not Python, I can
tell that much. It looks like the language supports embedding Python,
though.
> a) Use UTF-8 encoding on the data file and use u'^ã??ã? ã?.ã?"$'
> notation in it. This would create two-element tuples of unicode
> strings.
Since I don't know the programming language you are using, it is hard
to understand why putting UTF-8 in the first line might have any
effect. However, if the embedded Python text is passed to a Python
interpreter, I can tell you that the Unicode literal does *not* have
the desired effect: each byte in it is interpreted as a Latin-1
character. If this is really UTF-8 for some Japanese text (which I
cannot tell just by looking at the bytes), you'd need to write
    unicode('^ã??ã? ã?.ã?"$', 'utf-8')
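
To illustrate the difference, here is a minimal sketch in Python 3 terms,
where bytes.decode() plays the role of the unicode(..., 'utf-8') call. The
sample text is an assumption chosen for illustration; it is not the pattern
from the original data file, which cannot be recovered from the bytes shown.

```python
# Hypothetical sample: three Japanese characters, written as escapes so the
# example does not depend on the encoding of this file.
text = "\u65e5\u672c\u8a9e"

# The bytes as they would be stored in a UTF-8 encoded data file.
raw = text.encode("utf-8")

# Interpreting each byte as one Latin-1 character yields garbage that is
# as long as the byte string itself.
as_latin1 = raw.decode("latin-1")

# Decoding the same bytes as UTF-8 recovers the original characters.
as_utf8 = raw.decode("utf-8")

print(len(raw), len(as_latin1), len(as_utf8))
print(as_utf8 == text, as_latin1 == text)
```

Only the explicit UTF-8 decode gives back the three-character string; the
Latin-1 reading produces one character per byte.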
It's not clear to me why the str() call is needed; what happens if you
leave it out?
Regards,
Martin