editing in Unicode
Roland Mas
mas at echo.fr
Thu Sep 7 12:50:33 EDT 2000
Bertilo Wennergren (2000-09-07 17:19:02 +0200) :
> "Roland Mas":
>
> > Hmm. Not sure, since you would have to explicitly state somewhere
> > that the "argument" to the u'' construct has to be considered as
> > encoded in UTF-8.
>
> Yes, that would be nice. Is there a way to state that somewhere?
Not that I know of.
> > Why UTF-8 and not -16, or Latin-1 or something else?
>
> Well, Latin-1 wouldn't do if I want to enter lots of characters that
> are not present in Latin 1.
From the user's point of view, I entirely agree: I would prefer typing
my code in UTF-8 too. My remark was purely from the interpreter's point
of view: how does it know what charset to expect in a u'' construct
if you don't specify it (and for now you cannot)?
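
To make the ambiguity concrete, here is a rough sketch. It assumes a
present-day Python 3 interpreter rather than the 1.6/2.0 ones discussed
here, and the variable names are only illustrative. The same two bytes
give three different strings depending on which charset is assumed:

```python
# Two bytes as an editor saving in UTF-8 would write them for U+00E9
# (e with acute accent). The escape keeps this file pure ASCII.
raw = "\u00e9".encode("utf-8")  # b'\xc3\xa9'

# The interpreter sees only the bytes; the charset is an assumption.
as_utf8 = raw.decode("utf-8")        # one character, U+00E9
as_latin1 = raw.decode("latin-1")    # two characters, U+00C3 U+00A9
as_utf16 = raw.decode("utf-16-le")   # one character, U+A9C3

# ascii() prints escapes, so this works on any terminal encoding.
for label, text in [("utf-8", as_utf8),
                    ("latin-1", as_latin1),
                    ("utf-16-le", as_utf16)]:
    print(label, len(text), ascii(text))
```

None of the three decodings fails, which is the problem: without a
declared charset there is no way to tell which one the author meant.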
> UTF-16 would do great, but would make the code harder to deal with
> in non-Unicode editors. UTF-8 is better since it is backwards
> compatible with ASCII.
Sure. Unfortunately neither is compatible with Latin-*, so there is
likely to be breakage anyway. Unless we¹ do add that way to specify
the u'' encoding.
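
A small sketch of both halves of that claim, again assuming a
present-day Python 3 and using a made-up helper name
(decodes_as_utf8 is not part of any library):

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


ascii_text = "abc"

# UTF-8 leaves pure-ASCII text byte-for-byte unchanged.
same_as_ascii = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# UTF-16 does not: every character takes two bytes, one of them NUL here.
utf16_bytes = ascii_text.encode("utf-16-le")

# A Latin-1 file with an accented letter is not valid UTF-8:
# 0xE9 announces a multi-byte sequence that never arrives.
latin1_bytes = "\u00e9".encode("latin-1")
latin1_ok = decodes_as_utf8(latin1_bytes)

print(same_as_ascii, utf16_bytes, latin1_ok)
```

So existing ASCII-only sources would survive a switch to UTF-8, while
existing Latin-* sources containing accented characters would not.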
An idea: u'' is UTF-8, U'' is UTF-16. Or u''8 and u''16, with one
of them equivalent to u''. But then again we have to keep backwards
compatibility with previously used Latin-* charsets.
Or not, since the u'' construct is anyway only used in
non-official-released-stable-production Pythons (except perhaps 1.6,
and I'm not sure that many people have actually used it intensively, or
will do so). Maybe we¹ could have that u'' thingy stable and
officially taking UTF-8 and/or -16 in Python 2.0?
Please, Python Lords?
Roland.
¹ Not including myself: I develop *in* Python, I don't develop *Python*.
--
Roland Mas
Sauvez un arbre, tuez un castor.