PEP-0263 and default encoding

Martin v. Löwis martin at v.loewis.de
Thu Oct 2 14:31:49 EDT 2003


bokr at oz.net (Bengt Richter) writes:

> How about letting someone in Klaus' situation be explicit in another
> way? E.g.,

>     python -e iso-8859-1 the_unmarked_source.py

What would the exact meaning of this command line option be?

> Hm, I guess to be consistent you would have to have some way to pass
> -e info into any text-file-opening context, e.g., import, execfile,
> file, open, etc.

Ah, so it should probably apply only to the file passed to Python on
the command line - some people might think it would apply to all
files, though.

> In such case, you'd want a default. Maybe it could come from site.py
> with override by python -e, again with override by actual individual
> file-embedded encoding info.

This shows the problem of this approach: Now it becomes hidden in
site.py, and, as soon as you move the code to a different machine, the
problems come back.
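
The proposed precedence, and the portability problem with it, can be sketched roughly as follows. All names here are hypothetical: the -e option is only a proposal in this thread, and site.py has no such setting.

```python
def resolve_source_encoding(file_declared=None, cli_override=None, site_default=None):
    """Hypothetical resolution order from the proposal above.

    A declaration embedded in the file wins over a command-line
    override, which in turn wins over a site-wide default.
    """
    for candidate in (file_declared, cli_override, site_default):
        if candidate is not None:
            return candidate
    return None  # nothing specified anywhere


# The same unmarked file, on two machines with different site defaults,
# is interpreted differently: the setting does not travel with the code.
on_machine_a = resolve_source_encoding(site_default="iso-8859-1")
on_machine_b = resolve_source_encoding(site_default="koi8-r")

# A file carrying its own declaration is unaffected by either machine.
declared = resolve_source_encoding(file_declared="utf-8", site_default="koi8-r")
```

Only the file-embedded declaration gives the same result everywhere.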

> >1. You have a problem with existing code, and you are annoyed
> >   by the warning. Just silence the warning in site.py.
> That's not the same as giving a proper encoding interpretation, is it?
> (Though in comments it wouldn't matter much).

No. However, it would restore the meaning that the code has in 2.2:
For comments and byte string literals, it would be the "as-is"
encoding; for Unicode literals, the interpretation would be latin-1.
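
The difference between the two interpretations can be illustrated with a small sketch. It is written for present-day Python 3 and only simulates the behaviour on raw bytes; it assumes a source file saved as ISO-8859-1 without a declaration.

```python
# Assumption: the editor saved the character "ä" as the single
# ISO-8859-1 byte 0xE4 inside a string literal.
raw = b"\xe4"

# Byte string literal: the bytes are kept exactly as found in the file.
byte_literal = raw

# Unicode literal: each byte becomes the code point with the same
# number, which is what a latin-1 interpretation means.
unicode_literal = raw.decode("latin-1")

# If the file had really been saved as UTF-8, the same latin-1
# interpretation would turn one character into two wrong ones.
utf8_raw = "\u00e4".encode("utf-8")       # two bytes: 0xC3 0xA4
misread = utf8_raw.decode("latin-1")      # two characters, not one
```

So silencing the warning preserves the old results, but those results are correct for Unicode literals only if the file really is Latin-1.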

> >2. You are writing new code, and you are annoyed by the encoding
> >   declaration. Just save your code as UTF-8, using the UTF-8 BOM.
> YMMV with the editor you are using though, right?

Somewhat, yes. However, I expect that most editors which
specifically support Python will also support PEP-263, sooner or later.
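
As a rough illustration of the two ways a file can announce its encoding, the sketch below uses tokenize.detect_encoding from the Python 3 standard library, which did not exist at the time of this thread. The helper name declared_encoding and the sample sources are made up for the example.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding that tokenize detects for the given source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# A PEP 263 declaration in a comment on the first line.
latin1_source = b"# -*- coding: iso-8859-1 -*-\nname = 'L\xf6wis'\n"

# No declaration, but the file starts with the UTF-8 BOM.
bom_source = b"\xef\xbb\xbf" + "name = 'L\u00f6wis'\n".encode("utf-8")

# Neither a declaration nor a BOM. Python 3 falls back to UTF-8 here,
# which differs from the behaviour discussed in this thread.
plain_source = b"name = 'Loewis'\n"

latin1_result = declared_encoding(latin1_source)
bom_result = declared_encoding(bom_source)
plain_result = declared_encoding(plain_source)
```

In both marked cases the information travels inside the file itself, so it survives a move to another machine.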

> Hm2, is all internal text representation going to wind up wchar at
> some point?

It appears that string literals will continue to denote byte strings
for quite some time. There is the -U option, so you can try it yourself
to see the effects of string literals denoting Unicode objects.

Clearly, a byte string type has to stay in the language. Not as
clearly, there might be a need for byte string literals. A PEP to this
effect was just withdrawn.

Regards,
Martin



