editing in Unicode
Bertilo Wennergren
bertilow at hem.passagen.se
Thu Sep 7 06:49:10 EDT 2000
Neil Hodgson:
(Thanks for the snappy answer!)
> Bertilo Wennergren wrote:
> > What if I want to edit my Python code directly in a Unicode text
> > editor that can display all characters I want to use, and that can
> > save the code in utf-8 or utf-16? How do I write my text strings
> > so the compiler gets it right?
> First get a hold of a Unicode capable editor.
That I already have.
> [...]
> You can now write
> x = "@"
> If you now look at contents of x it should be '\320\271', the UTF-8
> representation of the mentioned character. Python strings are really
> byte-buffers - there is no encoding value associated with each string
> although they will most commonly contain ASCII strings. To convert this
> to a Unicode string use the unicode built in function:
> y = unicode(x,"UTF8")
Is there no way of avoiding this additional step, getting Python to always
automatically treat all strings as UTF-8 encoded Unicode strings? If I need
a lot of Unicode text strings it's a big bother to always have to explicitly
convert each and every one of them. A possible source of bugs, I'd say...
> The second argument is the encoding that the first argument is in.
> "UTF8" is supposed to be the default for the second argument so it
> should be
> possible to omit it but that appears to not work in the version
> (ActivePython based on 1.6 beta) I am using.
I'll try this in a newer version.
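
For comparison, here is a minimal sketch of the same conversion step in Python 3, which is an assumption on my part and not the 1.6 beta discussed above. There, bytes plays the role of the byte-buffer string and str the role of the Unicode string. The variable names are only illustrative, and the two bytes are the '\320\271' quoted above.

```python
# Python 3 sketch: bytes is the byte buffer, str is the Unicode string.
raw = b"\320\271"  # the two UTF-8 bytes quoted above, written as octal escapes

# Explicit encoding: the counterpart of unicode(x, "UTF8").
text = raw.decode("utf-8")

# In Python 3, bytes.decode() uses UTF-8 when the argument is omitted.
same_text = raw.decode()

print(len(raw), len(text))  # two bytes, one character
print(hex(ord(text)))       # code point of the decoded character
```

The point of the sketch is only that the encoding belongs to the conversion call and not to the byte buffer itself.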
> If doing this in real code, it's
> more likely you'd collapse the code down to:
> msg=unicode("@#&", "UTF8")
If I get this right the following simpler version ought to work:
msg=unicode("@#&")
Right? That I could live with.
What about using this:
msg = u'@#&'
?
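
As a hedged illustration of the u'...' spelling: the sketch below assumes Python 3.3 or later, where every ordinary string literal is already a Unicode string and the u prefix is accepted but changes nothing. It says nothing about how 1.6 treats non-ASCII bytes inside a u'...' literal. The character is written as a \u escape so the example stays pure ASCII; U+0439 is simply what the bytes '\320\271' decode to.

```python
# Python 3.3+ sketch: both spellings give the same Unicode string type.
plain = "\u0439"      # ordinary literal, already a Unicode string
prefixed = u"\u0439"  # u prefix is allowed and has no effect here

print(type(plain) is type(prefixed))
print(plain == prefixed)
```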
> Unfortunately most Python libraries do not yet accept Unicode strings,
> even the win32* modules which should be enabled for wide strings.
:-(
> I'm thinking of writing an editing mode for PythonWin and SciTE that
> maps \u escape sequences to/from the correct glyphs so you would be
> able to see
> and write
>
> msg=L"@"
>
> which would directly create a Unicode string. This would be converted
> from \u sequences on input and back to \u sequences on output. A
> benefit of this is that the resulting files would be sensibly editable
> with ASCII only editors.
Great idea. I think my Unicode editor (UniRed) can already do this (or can
be made to do it with minimal fiddling).
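
A rough sketch of what such a \u mapping could look like, written in Python 3. The function names escape_non_ascii and unescape_to_glyphs are hypothetical, and this is not how PythonWin, SciTE or UniRed actually do it. It also assumes the text contains no backslash-escaped backslashes in front of a "u", which a real editing mode would have to handle.

```python
import re


def escape_non_ascii(text):
    """Replace every non-ASCII character with a \\uXXXX or \\UXXXXXXXX escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                # plain ASCII is kept as-is
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)    # four hex digits
        else:
            out.append("\\U%08x" % cp)    # eight hex digits
    return "".join(out)


_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})")


def unescape_to_glyphs(text):
    """Turn \\uXXXX and \\UXXXXXXXX escapes back into the characters they name."""
    def repl(match):
        return chr(int(match.group(1) or match.group(2), 16))
    return _ESCAPE.sub(repl, text)


line = 'msg = u"\u0439"'            # what the editor would display
saved = escape_non_ascii(line)      # what would be written to disk
print(saved)
print(unescape_to_glyphs(saved) == line)
```

The saved form contains only ASCII, so it stays editable in an ASCII only editor, and loading it back restores the glyphs.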
--
#####################################################################
Bertilo Wennergren
<http://purl.oclc.org/net/bertilo>
<bertilow at hem.passagen.se>
#####################################################################