[I18n-sig] Literal strings
Paul Prescod
paul@prescod.net
Sat, 03 Jun 2000 14:24:34 -0500
Peter Funk wrote:
>
> > I would like it to magically work with Unicode. Guido's proposal allows
> > it to magically work with Unicode-encoded ASCII, but not with the full
> > range of Unicode characters. I'm not entirely happy that my code will
> > crash and burn the first time someone pops in a cedilla.
>
> A cedilla (ç) is a normal 8-bit character in ISO-Latin-1, so this may
> be a bad example.
Guido's proposal only auto-coerces 7-bit (ASCII) data, so an 8-bit
Latin-1 cedilla still breaks.
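
A minimal sketch of that failure mode, written against a modern Python
where bytes and text are separate types. The helper name coerce_7bit is
hypothetical and only emulates the implicit coercion with an explicit
ASCII decode; it is not the interpreter's actual code path.

```python
def coerce_7bit(data):
    """Hypothetical stand-in for the implicit coercion: ASCII bytes only."""
    return data.decode("ascii")


def greet(name):
    # Mixing a Unicode literal with 8-bit data triggers the coercion.
    return u"Bonjour, " + coerce_7bit(name)


print(greet(b"Francois"))          # pure 7-bit input is accepted

try:
    greet(b"Fran\xe7ois")          # 0xE7 is c-cedilla in ISO-Latin-1
except UnicodeDecodeError as exc:
    print("coercion failed:", exc)
```

The second call fails only at run time, and only once non-ASCII data
actually shows up.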
> We use such literals a lot and it didn't break anything.
> Even with Guido's proposal it will only break things if you coerce
> such a literal into Unicode without an explicit conversion.
My code example showed an implicit coercion.
> There already was a long discussion about interpreter pragmas
> on python-dev. I still prefer David Scherer's brilliant idea to
> (ab)use the 'global' statement at module level, if we ever introduce
> pragmas into the 1.x series of Python. Please review the discussion
> (April 2000) in the python-dev archives.
I wasn't so concerned about the syntax so I didn't bother to look that
up.
> > Now I could go through my code and change all of the literals to Unicode
> > literals by hand, but
> >
> > a) that's really ugly, syntactically
>
> As always this is simply a matter of taste. And after a while you get
> used to it.
They say that about Perl too. :) I don't believe them.
> > b) I feel like I'll end up switching them all back when we just make
> > literal strings "wide" by default
>
> I don't believe that this will happen in the 1.x series. This would break
> just too many things and the memory penalty is just too harsh for small
> systems.
We will see about the former. The latter is just not true because a
Unicode object could be internally implemented as an 8-bit string as
long as it implements the same external interface. We have often
discussed these "tagged Unicode objects" and have just not implemented
them yet.
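
A rough sketch of the idea, not an existing implementation: the class
name TaggedText, the tag values, and the choice of Latin-1 and UTF-32
as the two internal layouts are all assumptions made for illustration.
The point is only that callers see one interface while pure 8-bit text
costs one byte per character.

```python
class TaggedText:
    """Hypothetical text object: one interface, two internal widths."""

    def __init__(self, text):
        if all(ord(ch) < 256 for ch in text):
            self.tag = "narrow"                    # 1 byte per character
            self._data = text.encode("latin-1")
        else:
            self.tag = "wide"                      # 4 bytes per character
            self._data = text.encode("utf-32-le")

    @property
    def nbytes(self):
        # Size of the internal buffer, to compare the two layouts.
        return len(self._data)

    def __len__(self):
        width = 1 if self.tag == "narrow" else 4
        return len(self._data) // width

    def __str__(self):
        codec = "latin-1" if self.tag == "narrow" else "utf-32-le"
        return self._data.decode(codec)

    def __eq__(self, other):
        return str(self) == str(other)

    def __add__(self, other):
        # The result picks its own layout; callers never see the tag.
        return TaggedText(str(self) + str(other))


narrow = TaggedText("fa\xe7ade")
wide = TaggedText("\u65e5\u672c\u8a9e")
print(narrow.tag, len(narrow), narrow.nbytes)
print(wide.tag, len(wide), wide.nbytes)
print((narrow + wide).tag)
```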
> > c) I feel like I'm being penalized for making my program
> > internationalized
>
> As long as your i18n effort doesn't hit Asian languages (for example
> Chinese, Japanese) you can get away with narrow strings.
I work with XML so I don't know what language the input is in.
> Unicode only comes
> into play, if you have to deal with several different languages at
> the same time.
Or if you are dealing with XML, or Tkinter, or WebDAV, or communicating
with Java or ...
> > d) I have a lot of code, as we all do.
>
> If code can be modified automatically (and what you proposed can
> be done with an only slightly more elaborate operation than a simple
> 's/"/u"/g' replacement) this is IMO no argument.
Actually, I haven't had any experience with source-to-source Python
transforms myself. Wouldn't it mess up other things like comments and
tabbing unless you went to a great deal of work?
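
One way such a transform might avoid that damage, sketched with the
standard tokenize module as it exists in current Python: locate each
string literal token and insert a "u" at its starting column, leaving
every other character alone. The function name add_u_prefix is
hypothetical, and the sketch skips literals that already carry a
prefix.

```python
import io
import tokenize


def add_u_prefix(source):
    """Return source with "u" put before each unprefixed string literal."""
    lines = io.StringIO(source).readlines()
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    # Record where each plain (unprefixed) string literal starts.
    starts = [tok.start for tok in tokens
              if tok.type == tokenize.STRING and tok.string[0] in "'\""]
    # Edit from the end so earlier columns on the same line stay valid.
    for row, col in reversed(starts):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + "u" + line[col:]
    return "".join(lines)


sample = 'x = "a"  # say "hi"\nif x:\n\ty = \'b\' + r"c"\n'
print(add_u_prefix(sample))
```

Because only the literal's first column is touched, the quoted text
inside the comment and the tab indentation pass through unchanged,
which a blind 's/"/u"/g' would not guarantee.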
--
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
Simplicity does not precede complexity, but follows it.
- http://www.cs.yale.edu/~perlis-alan/quotes.html