UTF-8 in source code (Re: [Python-Dev] Internationalization Toolkit)

M.-A. Lemburg mal@lemburg.com
Fri, 12 Nov 1999 10:27:29 +0100


Tim Peters wrote:
> 
> [MAL]
> > ...
> > The conversion goes as follows:
> > · for single characters (and this includes all \XXX sequences
> >   except \uXXXX), take the ordinal and interpret it as Unicode
> >   ordinal
> > · for \uXXXX sequences, insert the Unicode character
> >   with ordinal 0xXXXX instead
> 
> Perfect!

Thanks :-)
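
To make the two conversion rules quoted above concrete, here is a rough
sketch in present-day Python 3. It is not the proposed implementation:
the function name convert_literal is made up for illustration, and it
assumes the input is a byte string in which ordinary \XXX escapes have
already been resolved to single bytes, so only \uXXXX is left to handle.

```python
import re

# Matches a literal backslash-u followed by exactly four hex digits.
U_ESCAPE = re.compile(rb'\\u([0-9a-fA-F]{4})')

def convert_literal(body: bytes) -> str:
    """Hypothetical helper illustrating the two rules.

    Rule 1: every single character maps to the Unicode character
            with the same ordinal.
    Rule 2: every \\uXXXX sequence becomes the Unicode character
            with ordinal 0xXXXX.
    """
    out = []
    pos = 0
    for match in U_ESCAPE.finditer(body):
        # Rule 1 for everything before this escape.
        out.extend(chr(b) for b in body[pos:match.start()])
        # Rule 2 for the escape itself.
        out.append(chr(int(match.group(1).decode('ascii'), 16)))
        pos = match.end()
    # Rule 1 for the remaining tail.
    out.extend(chr(b) for b in body[pos:])
    return ''.join(out)
```

Note that rule 1 amounts to reading the plain characters as Latin-1,
since Latin-1 ordinals coincide with the first 256 Unicode ordinals.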
 
> [about "raw" Unicode strings]
> > ...
> > Not sure whether we really need to make this even more complicated...
> > The \uXXXX strings look ugly, adding a few \\\\ for e.g. REs or
> > filenames won't hurt much in the context of those \uXXXX monsters :-)
> 
> Alas, this won't stand over the long term.  Eventually people will write
> Python using nothing but Unicode strings -- "regular strings" will
> eventually become a backward compatibility headache <0.7 wink>.  IOW,
> Unicode regexps and Unicode docstrings and Unicode formatting ops ...
> nothing will escape.  Nor should it.
> 
> I don't think it all needs to be done at once, though -- existing languages
> usually take years to graft in gimmicks to cover all the fine points.  So,
> happy to let raw Unicode strings pass for now, as a relatively minor point,
> but without agreeing it can be ignored forever.

Agreed... note that you could also write your own codec for just this
reason and then use:

u = unicode('....\u1234...\...\...','raw-unicode-escaped')

Put that into a function called 'ur' and you have:

u = ur('...\u4545...\...\...')

which is not that far away from ur'...' w/r to cosmetics.
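
For readers trying this today, a minimal sketch of such a 'ur' helper
in Python 3 might look as follows. Assumptions: the unicode() builtin
used above does not exist in Python 3, so the sketch goes through
bytes instead, and it relies on the raw_unicode_escape codec that
Python ships rather than on a hand-written one. The sample pattern is
invented for illustration.

```python
def ur(text):
    """Interpret only \\uXXXX escapes, leaving other backslashes alone.

    Assumes `text` contains only characters up to U+00FF; anything
    beyond that would make the latin-1 encoding step fail.
    """
    return text.encode('latin-1').decode('raw_unicode_escape')

# A raw literal keeps every backslash, so the regex parts survive
# untouched while the \u4545 escape is turned into one character.
pattern = ur(r'\u4545\d+\.txt')
```

Here pattern starts with the single character U+4545, followed by the
unchanged text \d+\.txt, so no backslash doubling is needed for the
regular-expression part.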

> > ...
> > BTW, if you want to type in UTF-8 strings and have them converted
> > to Unicode, you can use the standard:
> >
> > u = unicode('...string with UTF-8 encoded characters...','utf-8')
> 
> That's what I figured, and thanks for the confirmation.
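
As a small illustration of the UTF-8 route in present-day Python 3
(where the conversion is spelled bytes.decode rather than unicode(),
an assumption about the reader's environment and not part of the
proposal above), with sample data invented for the example:

```python
# UTF-8 encoded bytes for "café": the é takes two bytes, 0xC3 0xA9.
utf8_bytes = b'caf\xc3\xa9'

# Decode the byte string into a Unicode string.
u = utf8_bytes.decode('utf-8')

# Four characters result from five bytes.
char_count = len(u)
byte_count = len(utf8_bytes)
```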

-- 
Marc-Andre Lemburg
______________________________________________________________________
Y2000:                                                    49 days left
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/