editing in Unicode

Roland Mas mas at echo.fr
Thu Sep 7 12:50:33 EDT 2000


Bertilo Wennergren (2000-09-07 17:19:02 +0200) :

> "Roland Mas":
> 
> >   Hmm.  Not sure, since you would have to explicitly state somewhere
> > that the "argument" to the u'' construct has to be considered as
> > encoded in UTF-8.
> 
> Yes, that would be nice. Is there a way to state that somewhere?

Not that I know of.
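(Editor's sketch, written with hindsight: Python did later gain such a mechanism. PEP 263, implemented from Python 2.3, defines a per-file "coding:" comment on the first or second line, and PEP 3120 made UTF-8 the default source encoding in Python 3. The minimal sketch below assumes modern Python 3 and uses the standard library's tokenize.detect_encoding to show how that declaration is read. The helper name declared_encoding and the sample sources are made up for illustration.)

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the source encoding Python 3 would use for these bytes."""
    # detect_encoding looks at no more than the first two lines for a
    # UTF-8 BOM or a "coding:" comment, as described in PEP 263.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# No declaration at all: Python 3 falls back to UTF-8 (PEP 3120).
plain = "s = 'caf\u00e9'\n".encode("utf-8")

# Explicit declaration on the first line; the name is normalized.
latin = b"# -*- coding: latin-1 -*-\ns = 'caf\xe9'\n"

print(declared_encoding(plain))  # utf-8
print(declared_encoding(latin))  # iso-8859-1
```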

> >  Why UTF-8 and not -16, or Latin-1 or something else?
> 
> Well, Latin-1 wouldn't do if I want to enter lots of characters that 
> are not present in Latin 1.

  From the user's point of view, I entirely agree: I would prefer typing
my code in UTF-8 too.  My remark was purely from the interpreter's point
of view: how does it know what charset to expect in a u'' construct
if you don't specify it (and for now you cannot)?
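(Editor's sketch: to make the interpreter's dilemma concrete, the bytes between the quotes do not carry their encoding with them. This minimal example uses modern Python 3, not the 1.6/2.0 interpreter discussed here, and decodes one byte string two ways. Both decodings succeed, but they produce different text. The Esperanto letter U+0109 is just a sample character.)

```python
# Raw bytes as they might sit between the quotes of a u'' literal in a
# file saved by an editor that writes UTF-8.  U+0109 is c-circumflex.
raw = "\u0109u".encode("utf-8")    # b'\xc4\x89u'

as_utf8 = raw.decode("utf-8")      # two bytes form one character
as_latin1 = raw.decode("latin-1")  # every byte becomes its own character

print(len(as_utf8), len(as_latin1))  # 2 3
print(as_utf8 == as_latin1)          # False
```

Without being told which of the two was meant, the interpreter has no way to choose.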

> UTF-16 would do great, but would make the code harder to deal with
> in non-Unicode editors. UTF-8 is better since it is backwards
> compatible with ASCII.
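(Editor's sketch of the ASCII-compatibility point, assuming Python 3. Pure ASCII text has exactly the same bytes in UTF-8, while UTF-16 interleaves NUL bytes that a non-Unicode editor would display as garbage. The sample line of code is arbitrary.)

```python
text = "spam = 'eggs'"  # pure ASCII source text, 13 characters

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte-order mark

print(utf8_bytes == ascii_bytes)           # True: identical on disk
print(len(ascii_bytes), len(utf16_bytes))  # 13 26
print(utf16_bytes[:4])                     # b's\x00p\x00'
```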

  Sure.  Unfortunately neither is compatible with Latin-*, so there is
likely to be breakage anyway, unless we¹ add a way to specify
the u'' encoding.
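(Editor's sketch of the Latin-1 breakage, assuming Python 3. Any non-ASCII character is stored differently in Latin-1 and UTF-8, so existing Latin-1 sources read as UTF-8 either fail to decode or come out wrong. The word used is only an example.)

```python
word = "caf\u00e9"  # 'café'

latin1 = word.encode("latin-1")  # b'caf\xe9'
utf8 = word.encode("utf-8")      # b'caf\xc3\xa9'
print(latin1 == utf8)            # False

# Existing Latin-1 source read as if it were UTF-8: a lone 0xE9 byte is
# not a complete UTF-8 sequence, so decoding fails outright.
try:
    latin1.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True
print(failed)  # True
```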

  An idea: u'' is UTF-8, U'' is UTF-16.  Or u''8 and u''16, with one
of them equivalent to u''.  But then again we have to keep backwards
compatibility with previously used Latin-* charsets.
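(Editor's sketch: u''8 and u''16 are proposed syntax that Python never adopted, so this Python 3 example only models the idea as an ordinary function. The names LITERAL_CODECS, DEFAULT_TAG and decode_literal are hypothetical, as is the assumption that a plain, untagged u'' would mean UTF-8.)

```python
from typing import Optional

# Hypothetical mapping from the proposed suffix (u''8, u''16) to a codec.
LITERAL_CODECS = {"8": "utf-8", "16": "utf-16"}
DEFAULT_TAG = "8"  # assumption: a plain u'' would be UTF-8


def decode_literal(raw: bytes, tag: Optional[str] = None) -> str:
    """Decode the raw bytes of a u'' literal according to its hypothetical tag.

    An unknown tag raises KeyError, standing in for a syntax error.
    """
    codec = LITERAL_CODECS[tag if tag is not None else DEFAULT_TAG]
    return raw.decode(codec)


# The same text written both ways decodes to the same string.
print(decode_literal(b"\xc4\x89u") == "\u0109u")                          # True
print(decode_literal("\u0109u".encode("utf-16"), "16") == "\u0109u")      # True
```

Keeping backwards compatibility with Latin-* sources would mean adding another tag (or a different default), which is exactly the tension described above.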

  Or not, since anyway the u'' construct is only used in
non-official-released-stable-production Pythons (except perhaps 1.6,
and I'm not sure many people have actually used it intensively, or
will do so).  Maybe we¹ could make that u'' thingy stable, officially
accepting UTF-8 and/or UTF-16, in Python 2.0?

  Please, Python Lords?

Roland.

¹ Not including myself: I develop *in* Python, I don't develop *Python*.
-- 
Roland Mas

Save a tree, kill a beaver.
