[I18n-sig] Literal strings

Paul Prescod paul@prescod.net
Sat, 03 Jun 2000 14:24:34 -0500


Peter Funk wrote:
> 
> > I would like it to magically work with Unicode. Guido's proposal allows
> > it to magically work with Unicode-encoded ASCII, but not with the full
> > range of Unicode characters. I'm not entirely happy that my code will
> > crash and burn the first time someone pops in a cedilla.
> 
> A cedilla (ç) is a normal 8-bit character in ISO-Latin-1, so this may
> be a bad example.

Guido's proposal only auto-coerces 7-bit data.
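
As a rough illustration of that distinction, here is a sketch for a modern
Python 3 interpreter, where the conversion has to be spelled out rather than
happening implicitly. The helper name `coerce_7bit` is hypothetical and only
emulates an ASCII-only rule; it is not the proposal's actual implementation.

```python
def coerce_7bit(raw: bytes) -> str:
    """Emulate an ASCII-only coercion from an 8-bit string to Unicode."""
    # Decoding as ASCII rejects any byte >= 0x80.
    return raw.decode("ascii")


# Pure 7-bit data converts cleanly.
plain = coerce_7bit(b"facade")

# In ISO-Latin-1 the cedilla is the single byte 0xE7, which is not 7-bit.
latin1_bytes = "fa\u00e7ade".encode("latin-1")
try:
    coerce_7bit(latin1_bytes)
    failed = False
except UnicodeDecodeError:
    failed = True  # the 8-bit character is refused
```

So the cedilla is a perfectly ordinary Latin-1 character and still falls
outside what a 7-bit-only coercion accepts.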

> We use such literals a lot and it didn't break anything.
> Even with Guido's proposal it will only break things if you coerce
> such a literal into Unicode without an explicit conversion.

My code example showed an implicit coercion.

> There already was a long discussion about interpreter pragmas
> on python-dev.  I still prefer David Scherer's brilliant idea to
> (ab)use the 'global' statement at module level, if we ever introduce
> pragmas into the 1.x series of Python.  Please review the discussion
> (April 2000) in the python-dev archives.

I wasn't so concerned about the syntax so I didn't bother to look that
up.

> > Now I could go through my code and change all of the literals to Unicode
> > literals by hand, but
> >
> >  a) that's really ugly, syntactically
> 
> As always this is simply a matter of taste.  And after a while you get
> used to it.

They say that about Perl too. :) I don't believe them.

> >  b) I feel like I'll end up switching them all back when we just make
> > literal strings "wide" by default
> 
> I don't believe that this will happen in the 1.x series.  This would break
> just too many things and the memory penalty is just too harsh for small
> systems.

We will see about the former. The latter is just not true because a
Unicode object could be internally implemented as an 8-bit string as
long as it implements the same external interface. We have often
discussed these "tagged Unicode objects" and have just not implemented
them yet.
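
A minimal sketch of what such a tagged object might look like, written in
modern Python 3. The class name `TaggedText`, its `width` attribute, and the
choice of Latin-1 and UTF-32 as the two storage forms are all assumptions made
for illustration, not a description of any actual implementation.

```python
class TaggedText:
    """Hypothetical text object: narrow storage when possible, same interface."""

    def __init__(self, text: str):
        try:
            # One byte per character if every character fits in Latin-1.
            self._data = text.encode("latin-1")
            self._codec = "latin-1"
        except UnicodeEncodeError:
            # Otherwise fall back to a fixed-width four-byte form.
            self._data = text.encode("utf-32-le")
            self._codec = "utf-32-le"

    @property
    def width(self) -> int:
        """Bytes used per character by the internal storage."""
        return 1 if self._codec == "latin-1" else 4

    def __len__(self) -> int:
        return len(self._data) // self.width

    def __getitem__(self, index: int) -> str:
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        start = index * self.width
        return self._data[start:start + self.width].decode(self._codec)

    def __str__(self) -> str:
        return self._data.decode(self._codec)


narrow = TaggedText("fa\u00e7ade")        # stored one byte per character
wide = TaggedText("\u65e5\u672c\u8a9e")   # stored four bytes per character
```

Callers see the same `len()`, indexing, and `str()` behaviour either way; only
the internal tag decides how much memory each character costs.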

> >  c) I feel like I'm being penalized for making my program
> > internationalized
> 
> As long as your i18n effort doesn't hit Asian languages (for example
> Chinese, Japanese) you can get away with narrow strings.

I work with XML so I don't know what language the input is in.

> Unicode only comes
> into play if you have to deal with several different languages at
> the same time.

Or if you are dealing with XML, or Tkinter, or WebDAV, or communicating
with Java or ...

> >  d) I have a lot of code, as we all do.
> 
> If code can be modified automatically (and what you proposed can
> be done with an only slightly more elaborate operation than a simple
> 's/"/u"/g' replacement) this is IMO no argument.

Actually, I haven't had any experience with source-to-source Python
transforms myself. Wouldn't it mess up other things like comments and
tabbing unless you go to a great deal of work?
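
One way such a transform could leave comments and tabbing alone is to use a
tokenizer only to locate the string literals and then patch the original text
in place. The sketch below assumes a modern Python 3 with the standard
`tokenize` module; the function name `prefix_string_literals` is hypothetical.

```python
import io
import tokenize


def prefix_string_literals(source: str, prefix: str = "u") -> str:
    """Add `prefix` to unprefixed string literals, leaving all else untouched."""
    # Character offset at which each physical line starts.
    offsets = [0]
    for line in io.StringIO(source).readlines():
        offsets.append(offsets[-1] + len(line))

    positions = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only touch literals that begin directly with a quote character,
        # so literals that already carry a prefix are skipped.
        if tok.type == tokenize.STRING and tok.string[0] in "'\"":
            row, col = tok.start
            positions.append(offsets[row - 1] + col)

    # Patch from the end so that earlier offsets stay valid.
    for pos in sorted(positions, reverse=True):
        source = source[:pos] + prefix + source[pos:]
    return source


sample = (
    'greeting = "hello"  # a "quoted" word in a comment\n'
    "if greeting:\n"
    "\tprint(greeting, 'world')\n"
)
converted = prefix_string_literals(sample)
```

Because only the opening quote of each real literal is touched, the quotes
inside the comment and the tab indentation in `sample` come through unchanged,
which a plain 's/"/u"/g' substitution would not guarantee.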
-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
Simplicity does not precede complexity, but follows it. 
	- http://www.cs.yale.edu/~perlis-alan/quotes.html