Python's 8-bit cleanness deprecated?

Bengt Richter bokr at oz.net
Tue Feb 4 11:59:15 EST 2003


On Mon, 3 Feb 2003 14:51:52 -0600, Skip Montanaro <skip at pobox.com> wrote:

>
>    >> # -*- coding: Latin-1 -*-
>
>    Paul> What is this nonsense?  The interpreter is reading comment text
>    Paul> now?  Yucch!
>
>Given that most operating systems don't have files with data forks and
>resource forks, how would you tell the lexical analyzer what the encoding of
>a particular file is?

Other files? Specify it in __init__.py files governing the associated directory
or specifically identified files? Look for matching files with a special extension,
like .pif files under Windows for old DOS executables? Or config files with
inference rules and/or info on specifically designated files or directories?
Inference rules keyed to file extensions, letting people tag specially encoded
files as they wish? Virtualize the Python file name space and have virtual mount
points for real directories, and then base encoding inferences on virtual locations?
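
To make the "inference rules" idea concrete, here is a rough sketch of what
such a lookup might look like. Nothing like this exists in Python; the rule
table, the patterns, and the infer_encoding name are all made up for
illustration, and a real scheme would have to decide where the rules live.

```python
import fnmatch

# Hypothetical rule table (not a real Python feature): first match wins.
ENCODING_RULES = [
    ("legacy/*.py", "latin-1"),    # everything under a legacy/ directory
    ("*.sjis.py", "shift_jis"),    # files tagged through their extension
    ("*.py", "ascii"),             # fallback for ordinary source files
]

def infer_encoding(path, rules=ENCODING_RULES, default="ascii"):
    """Return the encoding named by the first rule whose pattern matches path."""
    normalized = path.replace("\\", "/")  # treat Windows separators alike
    for pattern, encoding in rules:
        # fnmatchcase compares case-sensitively on every platform.
        if fnmatch.fnmatchcase(normalized, pattern):
            return encoding
    return default
```

The encoding then depends on where a file sits and what it is called, not on
anything inside it, which is the trade-off against an in-file comment.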

One thing that bothers me about passing info in comments is that it implies
a grammar for part of the source (the comments) which affects the result of
interpreting the source, but which is not (AFAIK, as of 2.2.2) documented as part
of the language grammar. Of course the #! first line similarly uses comment text,
so we are basically already living with an OS file system usage hack for carrying
non-data info associated with the data of a file. <idearrhea warning>I wonder
how long it will be before we have a portable file system with a packet structure
defaulting to an info packet followed by a data packet, where the packet is selected
by an extra seek parameter that defaults to data. Then you could have a convention
of passing the data encoding, expressed in utf-8, in the info packet</idearrhea warning>.
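
For reference, the "grammar" hiding in that comment is small. The sketch
below follows the rule as written down in PEP 263 (a comment on the first or
second line matching a "coding[:=]" pattern); declared_encoding is a made-up
helper name, and this is a simplification, not the interpreter's actual
code path.

```python
import re

# Pattern in the spirit of PEP 263: a comment containing "coding:" or
# "coding=" followed by an encoding name.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def declared_encoding(source_lines):
    """Return the encoding named in the first two lines, or None."""
    for line in source_lines[:2]:      # only lines 1 and 2 are examined
        match = CODING_RE.match(line)  # the line must be a comment line
        if match:
            return match.group(1)
    return None
```

With the line quoted at the top of this thread,
declared_encoding(["# -*- coding: Latin-1 -*-\n"]) picks out "Latin-1", while
the same text inside a string literal or on line 3 is ignored.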

Regards,
Bengt Richter



