A Unicode problem -HELP
Tim Roberts
timr at probo.com
Wed May 17 02:12:29 EDT 2006
"manstey" <manstey at csu.edu.au> wrote:
>
>I have done more reading on unicode and then tried my code in IDLE
>rather than WING IDE, and discovered that it works fine in IDLE, so I
>think WING has a problem with unicode.
Rather, its output defaults to ASCII.
>So, assuming I now work in IDLE, all I want help with is how to read in
>an ascii string and convert its letters to various unicode values and
>save the resulting 'string' to a utf-8 text file. Is this clear?
>
>so in pseudo code
>1. F is converted to \u0254, $ is converted to \u0283, C is converted
>to \u02A6\u02C1, etc.
>(i want to do this using a dictionary TRANSLATE={'F':u'\u0254', etc})
>2. I read in a file with lines like:
>F$
>FCF$
>$$C$ etc
>3. I convert this to
>\u0254\u0283
>\u0254\u02A6\u02C1\u0254 etc
>4. i save the results in a new file
>
>when i read the new file in a unicode editor (EmEditor), i don't see
>\u0254\u02A6\u02C1\u0254, but I see the actual characters (open o, esh,
>ts digraph, modified letter reversed glottal stop, etc.)
Of course. Isn't that exactly what you wanted? The Python string
u"\u0254" contains one character (LATIN SMALL LETTER OPEN O). It does NOT
contain 6 characters. If you write that to a file as UTF-8, the file will
contain 1 character, encoded as 2 bytes.
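A quick way to check the character and byte counts for yourself. This is only a sketch, written in Python 3 syntax, where every string literal is Unicode; in the Python 2 used in this thread the literal would be spelled u"\u0254". The variable names are made up for illustration.

```python
# One character in the string, two bytes once encoded as UTF-8.
s = "\u0254"                  # LATIN SMALL LETTER OPEN O
encoded = s.encode("utf-8")   # the bytes that would end up in the file

print(len(s))        # 1
print(len(encoded))  # 2
print(encoded)       # b'\xc9\x94'
```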
If you actually want the 6-character string \u0254 written to a file, then
you need to escape the backslash: u"\\u0254". However, I don't see
what good that would do you. The \u escape is only Python source code
notation for a character; the file itself holds the character.
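For comparison, here is a small sketch of the escaped form next to the real character, again in Python 3 syntax. The "unicode_escape" codec is a standard Python codec; the variable names are illustrative only.

```python
# Doubling the backslash gives six ordinary ASCII characters.
escaped = "\\u0254"
real = "\u0254"

print(len(escaped))  # 6
print(len(real))     # 1
print(escaped)       # \u0254

# The standard "unicode_escape" codec converts between the two forms.
as_escape = real.encode("unicode_escape")
print(as_escape)     # b'\\u0254'
```

Writing the escaped form only makes sense if whatever reads the file is expected to interpret \u sequences itself.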
>I'm sure this is straightforward but I can't get it to work.
I think it is working exactly as you want.
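For what it's worth, the four steps in the question could be sketched roughly as below. This is not the poster's code: it is Python 3 syntax, the function names and file names are invented for the example, the table covers only the three mappings given in the post, and it assumes the "\02C1" in the post was meant to be "\u02C1".

```python
import os
import tempfile

# Mapping from the question. The second character for "C" assumes the
# post's "\02C1" was a typo for "\u02C1".
TRANSLATE = {
    "F": "\u0254",
    "$": "\u0283",
    "C": "\u02A6\u02C1",
}

def translate_line(line):
    """Replace each mapped character; leave anything else unchanged."""
    return "".join(TRANSLATE.get(ch, ch) for ch in line)

def translate_file(src_path, dst_path):
    """Read an ASCII file and write the translated text as UTF-8."""
    with open(src_path, encoding="ascii") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(translate_line(line))

# Demo with throwaway files (hypothetical names).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "input.txt")
dst_path = os.path.join(workdir, "output.txt")
with open(src_path, "w", encoding="ascii") as f:
    f.write("F$\nFCF$\n$$C$\n")

translate_file(src_path, dst_path)

with open(dst_path, encoding="utf-8") as f:
    result = f.read()

# ascii() shows the code points even on a console that is limited to ASCII.
print(ascii(result))
```

Opened in a Unicode-aware editor, output.txt shows the actual characters rather than backslash sequences, which is the behaviour described above.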
--
- Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.