PEP: Defining Python Source Code Encodings

Tue Jul 17 15:52:01 EDT 2001

On Tue, 17 Jul 2001, Roman Suzi wrote:
> On Tue, 17 Jul 2001, M.-A. Lemburg wrote:
<...>
> > Yes, that could be an issue (I don't think it matters much though,
> > since parsing is usually only done during byte-code compilation and
> > the results are buffered in .pyc files).
>
> No! Scripts are compiled each time they run.
> If this is implemented, developers will need to do the trick
> of making each script a module and so on. This is not a good idea.
>
> There clearly must be a way to prevent encode-decode. And it would be
> better if only an EXPLICITLY given encoding triggered the encode-decode
> mechanism.

Python currently manages .pyc and .pyo files by placing them in the
same location as the .py they are generated from; a module is only
recompiled if that location is not writeable or the source has changed...

Why not extend this to let Python manage .py-s as well, with settable
locations for source and byte-code?  A command-line option could be used
to get the old behaviour, where .pyc|o live where the source does.
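
A rough sketch of the byte-code half of that idea, using the standard
py_compile module.  The helper name compile_to and the bytecode_dir
argument are made up for illustration; this only shows byte-code being
kept away from the source, not a proposal for how the interpreter
itself would do it.

```python
import os
import py_compile


def compile_to(source_path, bytecode_dir):
    """Compile source_path, keeping the byte-code under bytecode_dir.

    Hypothetical helper: the byte-code is written to a separate directory
    instead of next to the source, and is rebuilt only when it is missing
    or older than the source.
    """
    os.makedirs(bytecode_dir, exist_ok=True)
    name = os.path.splitext(os.path.basename(source_path))[0] + ".pyc"
    target = os.path.join(bytecode_dir, name)
    stale = (not os.path.exists(target)
             or os.path.getmtime(target) < os.path.getmtime(source_path))
    if stale:
        # cfile overrides the default "beside the source" location.
        py_compile.compile(source_path, cfile=target, doraise=True)
    return target
```

Calling compile_to("script.py", "/some/cache/dir") a second time with an
unchanged script.py returns the same path without recompiling.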

<...>
> And this is right. I even think encoding information could be EXTERNAL.
> For example, directory will need to have "__encodings__.py" file where
> encodings are listed.
>
> Then Python, if started with some key, could check for such a file and
> compile modules accordingly.

...or reformat them...

> If there is no __encodings__.py, then Python proceeds as it does now,
> WITHOUT any conversions to and from.

...why restrict this paradigm to character encodings?

> This will make script writers happy (no conversion overhead), as well as
> those who want to write encoding-enabled programs (they could specify
> __encodings__.py)

Do you mean: use "import __encodings__" in a program, have an
__encodings__.py in a module's subdir (as with __init__.py), or just
have one on sys.path?

Allowing for some kind of system and per-user customization seems
prudent; a __magic__ file both feels natural (Pythonic) and could
allow for system, program, and per-user settings or tweaks...

> I think, this solves most problems.

...but it would be in addition to a `trigger' (like magic comments),
   right?  No trigger, no overhead.

<...>
> I have a feeling that PEP is solving non-problem. Or have I lost
> the thread?

Whoever said a PEP must solve a problem?

> For example, I usually write scripts where I assume "koi8-r" or
> "windows-1251" encodings. And the only problem I have is when I use
> "koi8-r" strings from modules when I need "1251" ones. (In this case I
> explicitly recode).
>
> Aren't encodings better confined in documents (in XML for example),
> than programs?

I agree, this is document meta-stuff and should be handled as far away
from the language as possible.  However, Python does need to look at
documents at some point, so some way of transparently handling
different encodings (and why not formats) is needed.

> If programs need to be written in unicode, then isn't the next step
> to allow embedding sound, graphics and video?
> (I imagine video docstring ;-)

cool...

# ## format videodocstring

> I can admit that using utf-8 in writing programs could be justified,
> but Unicode... It will bring nightmare...

...if handled badly.

- Bruce

p.s.

Q: How do you handle literate, unicoded source with video docstrings?
A: .py-s are assumed to be usable without pre-processing, and only
   the bit of Python that handles the pre-processing determines when
   a source is actually a source.py.

e.g.,  doing "python source", when `source' contains:

	# ## format noweb
	# ## format videodocstring
	# ## encoding utf-8

would result in Python running `source' through the noweb filter to
separate the code from the docs, then through the videodocstring
filter to split off the fancy docstrings, and finally doing whatever
it needs to do to handle the non-standard encoding.  The result would
be called `source.py' and be stored <somewhere>; the next time Python
gets handed `source' it can check for <somewhere>/source.py, and
eschew the conversion steps if they have already been done.
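
A minimal sketch of that scheme in Python, under loud assumptions: only
the "# ## format NAME" / "# ## encoding NAME" directive syntax comes from
the example above.  The FILTERS table, the function names and the
cache_dir argument (standing in for <somewhere>) are hypothetical, and
the two filters are do-nothing placeholders, not real noweb or
videodocstring processing.

```python
import os
import re

# Directive syntax from the example: "# ## format NAME" / "# ## encoding NAME".
DIRECTIVE = re.compile(r"^#\s*##\s*(format|encoding)\s+(\S+)\s*$")

# Hypothetical filter table; each entry maps text to text.
# These placeholders return their input unchanged.
FILTERS = {
    "noweb": lambda text: text,
    "videodocstring": lambda text: text,
}


def read_directives(raw):
    """Return (formats, encoding) found in the leading comment lines."""
    formats, encoding = [], None
    for line in raw.splitlines():
        text = line.decode("ascii", "replace").strip()
        if not text.startswith("#"):
            break  # directives are only looked for at the top of the file
        match = DIRECTIVE.match(text)
        if match is None:
            continue
        if match.group(1) == "format":
            formats.append(match.group(2))
        else:
            encoding = match.group(2)
    return formats, encoding


def preprocess(path, cache_dir):
    """Return the path Python should actually run for `path`."""
    with open(path, "rb") as f:
        raw = f.read()
    formats, encoding = read_directives(raw)
    if not formats and encoding is None:
        return path  # no trigger, no overhead: use the file as it is
    cached = os.path.join(cache_dir, os.path.basename(path) + ".py")
    if (os.path.exists(cached)
            and os.path.getmtime(cached) >= os.path.getmtime(path)):
        return cached  # conversion already done, skip the filters
    text = raw.decode(encoding or "ascii")
    for name in formats:  # apply the filters in the order they are listed
        text = FILTERS[name](text)
    os.makedirs(cache_dir, exist_ok=True)
    with open(cached, "w", encoding="utf-8") as f:
        f.write(text)
    return cached
```

With the three directives above at the top of `source', preprocess would
hand back <somewhere>/source.py; a file with no directives is returned
untouched.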