[Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons)

Guido van Rossum guido@python.org
Wed, 05 Apr 2000 10:16:15 -0400


> Sigh.  In our company we use 'german' as our master language so 
> we have string literals containing iso-8859-1 umlauts all over the place.  
> Okay as long as we don't mix them with Unicode objects, this doesn't 
> hurt anybody.
> 
> What I would love to see, would be a well defined way to tell the
> interpreter to use 'latin-1' as default encoding instead of 'UTF-8'
> when dealing with string literals from our modules.

It would be better if this were supported for u"..." literals, so that
it is taken care of completely at the source code level.  The running
program shouldn't have to worry about what encoding its source code
was in!

For 8-bit literals, this would mean that if you had source code using
Latin-1, the literals would be translated from Latin-1 to UTF-8 by the
code generator.  This would mean that len('ç') would return 2.  I'm
not sure this is a great idea -- but then I'm not sure that using
Latin-1 in source code is a great idea either.
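
To make the length change concrete, here is a minimal sketch in present-day
Python 3, where a bytes object stands in for the 8-bit string discussed
above.  The variable names are illustrative only, and the explicit
decode/encode step is an assumption about what such a code generator would
do, not a description of any actual implementation.

```python
# Sketch only: Python 3 bytes stand in for the 8-bit string type.
text = "\u00e7"  # LATIN SMALL LETTER C WITH CEDILLA

# How the literal is stored in a Latin-1 encoded source file: one byte.
latin1_bytes = text.encode("latin-1")

# What a code generator translating Latin-1 to UTF-8 would produce: two bytes.
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")

print(len(latin1_bytes))  # 1
print(len(utf8_bytes))    # 2
```

The same character therefore has a different len() depending on which
encoding the 8-bit literal ends up in.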

> The tokenizer in Python 1.6 already contains smart logic to get the
> size of TABs right (pasting from tokenizer.c):
> 
>         /* Skip comment, while looking for tab-setting magic */
>         if (c == '#') {
>                 static char *tabforms[] = {
>                         "tab-width:",           /* Emacs */
>                         ":tabstop=",            /* vim, full form */
>                         ":ts=",                 /* vim, abbreviated form */
>                         "set tabsize=",         /* will vi never die? */
>                 /* more templates can be added here to support other editors */
>                 };
> ..
> 
> It wouldn't be too hard to add something there to recognize
> other "pragma" comments, for example:
> 	#content-transfer-encoding: iso-8859-1
> But what to do with it?  Maybe add a default encoding to every string
> object?  Is this bloat?  Just an idea.
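
For illustration, recognizing such a comment could look roughly like the
sketch below.  It is written in Python rather than in tokenizer.c, and the
function name find_source_encoding, the regular expression, the UTF-8
fallback, and the "first two lines only" limit are all hypothetical choices
made for this example; nothing here reflects an agreed design.

```python
import re

# Hypothetical pattern for a "#content-transfer-encoding: <name>" comment.
PRAGMA_RE = re.compile(
    r"^\s*#\s*content-transfer-encoding:\s*([-\w.]+)", re.IGNORECASE
)

def find_source_encoding(source_lines, default="utf-8", max_lines=2):
    """Return the encoding named by a pragma comment near the top of a file.

    Only the first max_lines lines are inspected; if none of them carries
    the pragma, the default is returned.
    """
    for line in source_lines[:max_lines]:
        match = PRAGMA_RE.match(line)
        if match:
            return match.group(1).lower()
    return default

lines = ["#content-transfer-encoding: iso-8859-1", "greeting = 'Gr\u00fc\u00dfe'"]
print(find_source_encoding(lines))  # iso-8859-1
```

A scanner like this only reports the declared encoding; what the compiler
should then do with the literals is the open question.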

Before we go any further we should design pragmas.  The current
approach is inefficient and only designed to accommodate
editor-specific magical commands.

I say it's a Python 1.7 issue.

--Guido van Rossum (home page: http://www.python.org/~guido/)