[Python-Dev] Re: PEP: Defining Unicode Literal Encodings (revision 1.1)

Tue Jul 17 23:57:40 EDT 2001

On Tue, 17 Jul 2001 09:33:43 -0700, Paul Prescod <paulp at ActiveState.com> wrote:

>David Eppstein wrote:
>> 
>>...
>> 
>> I tend to think putting anything at all in the text of the file is a hack
>> and not a clean design -- this is meta-information rather than content,
>> right?  
>
>What is the practical problem with mixing "meta-information" and
>"content"?
>
>> ... But maybe as long as you're going to do that you should go all the
>> way and have some kind of "meta" directive that allows MIME directives
>> beyond just charset=...
>
>What other ones make sense?

You could put the meta-information in another place, e.g.,
1. A file with the same name and extension .pym for python meta-info
    1.1 File could be a general preprocessing script or just data
2. A string in a directory with the file name (sans extension) as key.
    2.1 String could be exec-able code, or flags for special processing
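A minimal sketch of options 1 and 2, assuming the meta-info is plain `key=value` data rather than a preprocessing script or exec-able string (1.1 and 2.1 are not covered). The `.pym` format, `parse_meta`, `find_meta`, and `META_REGISTRY` are hypothetical names for illustration only.

```python
import os

# Hypothetical lookup table for option 2: base name (sans extension) -> meta string.
META_REGISTRY = {
    "spam": "charset=utf-8",
}


def parse_meta(text):
    """Parse 'key=value' items separated by ';' or newlines (an assumed format)."""
    meta = {}
    for item in text.replace("\n", ";").split(";"):
        item = item.strip()
        if not item or item.startswith("#"):
            continue  # skip blanks and comment lines
        key, _, value = item.partition("=")
        meta[key.strip().lower()] = value.strip()
    return meta


def find_meta(py_path, registry=META_REGISTRY):
    """Return meta-info for py_path: a sidecar .pym file wins, else the registry."""
    base, _ = os.path.splitext(py_path)
    sidecar = base + ".pym"  # option 1: same name, .pym extension
    if os.path.exists(sidecar):
        with open(sidecar, encoding="ascii") as f:
            return parse_meta(f.read())
    name = os.path.basename(base)  # option 2: key is the bare module name
    if name in registry:
        return parse_meta(registry[name])
    return {}
```

Letting the sidecar file win over the table means a per-file override never needs the shared table to be edited.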

You could flag files that have associated meta-info or scripts by
changing their extensions to .pyx for python with special effex ;-)

You could also have spam.pyx be empty, simply to trigger spam.pym,
which might see if its work was cached and otherwise torque spam.py
(usually, but could be anything) around any which way, including
substituting international keyword names or whatever, or maybe
doing loading and environment setup.
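One way the "see if its work was cached, otherwise torque spam.py" step could look, as a sketch under stated assumptions: the cache key is a SHA-256 of the transform's name plus the source bytes, and the example transform swaps in localized keyword names via the standard `tokenize` module. `LOCAL_KEYWORDS`, `translate_keywords`, and `cached_transform` are hypothetical; no real Python accepts `wenn` or `drucke`.

```python
import hashlib
import io
import os
import tokenize

# Hypothetical localized keyword names (not part of any real Python).
LOCAL_KEYWORDS = {"wenn": "if", "sonst": "else", "drucke": "print"}


def translate_keywords(raw):
    """Rewrite localized NAME tokens to their English equivalents."""
    out = []
    for tok in tokenize.tokenize(io.BytesIO(raw).readline):
        if tok.type == tokenize.NAME and tok.string in LOCAL_KEYWORDS:
            tok = tok._replace(string=LOCAL_KEYWORDS[tok.string])
        out.append(tok)
    encoding = out[0].string  # the first token is always ENCODING
    # With an ENCODING token present, untokenize returns bytes.
    return tokenize.untokenize(out).decode(encoding)


def cached_transform(src_path, transform, cache_dir):
    """Return transform(source), reusing a cached result if the input is unchanged."""
    with open(src_path, "rb") as f:
        raw = f.read()
    # Key on the transform's name plus the exact source bytes, so editing
    # spam.py (or switching transforms) produces a new cache entry.
    h = hashlib.sha256()
    h.update(transform.__name__.encode("utf-8") + b"\0")
    h.update(raw)
    cache_path = os.path.join(cache_dir, h.hexdigest() + ".py")
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            return f.read()
    result = transform(raw)
    os.makedirs(cache_dir, exist_ok=True)
    with open(cache_path, "w", encoding="utf-8") as f:
        f.write(result)
    return result
```

Hashing content rather than comparing timestamps keeps the cache valid across copies and renames, at the cost of reading the whole source each time.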

spam.pyx could also control decryption, decompression, uudecoding,
or MIME-defined stuff, or whatever.
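A minimal sketch of meta-info driving decompression and decoding, assuming the keys borrow MIME's `Content-Transfer-Encoding` and HTTP's `Content-Encoding` names and that only `base64` and `gzip` are supported. Decryption and uudecoding are left out; `decode_payload` and `PASS_THROUGH` are hypothetical names.

```python
import base64
import gzip

# Header names borrowed from MIME (Content-Transfer-Encoding) and HTTP
# (Content-Encoding); using them for source files is an assumption.
PASS_THROUGH = {"identity", "7bit", "8bit", "binary"}


def decode_payload(raw, meta):
    """Peel off transfer encoding, then compression, then decode the charset."""
    data = raw
    cte = meta.get("content-transfer-encoding", "identity").lower()
    if cte == "base64":
        data = base64.b64decode(data)
    elif cte not in PASS_THROUGH:
        raise ValueError("unsupported transfer encoding: %r" % cte)

    ce = meta.get("content-encoding", "identity").lower()
    if ce == "gzip":
        data = gzip.decompress(data)
    elif ce != "identity":
        raise ValueError("unsupported content encoding: %r" % ce)

    # Finally turn bytes into text using the declared charset (ASCII default).
    return data.decode(meta.get("charset", "ascii"))
```

The layers are undone in the reverse of the order they would have been applied when packing, and the charset step comes last because it is the only one that yields text.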

Old spam.py etc would be treated as now.

The downside of triggering automatic general-purpose processing is
that it could open a door to exploits if not well protected.

At least keeping auxiliary information separate would get around the
directive-before-docstring problem.
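To make the directive-before-docstring problem concrete: Python only treats a string literal as the module docstring if it is the first statement, so any directive written as a statement ahead of it displaces the docstring. No `directive` statement exists in released Python, so this sketch assumes an ordinary assignment to a hypothetical `__encoding__` name as a stand-in, and uses `ast.get_docstring` to compare that layout with one where the encoding lives elsewhere.

```python
import ast

# An in-file directive written as a statement (an assignment stands in for
# the hypothetical directive) pushes the docstring out of first position.
WITH_DIRECTIVE = '''\
__encoding__ = "utf-8"
"""Module docstring."""
'''

# With the encoding kept in a separate file, the source starts with the docstring.
SIDE_FILE = '''\
"""Module docstring."""
'''


def module_docstring(source):
    """Return the module docstring Python would see, or None."""
    return ast.get_docstring(ast.parse(source))


print(module_docstring(WITH_DIRECTIVE))  # None
print(module_docstring(SIDE_FILE))       # Module docstring.
```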

--
BTW, I really don't like using file names and extensions as part of
the attribute encoding. I would much rather see a file system with
a standard n-sector data prefix that could be invariant irrespective
of container file name. There it could have an invariant utf-8 name
*for the content* independent of symlinks or renaming of files-as-containers,
as the data moved from system to system. File content type would also be
encoded in this prefix block, and the unix file command wouldn't have
so much guessing to do any more. There would also be room for an
assigned GUID to work like a UPC bar code identifying producer software
as desired. That would enable easy version-compatibility checks, automatic
searches for conversion software for import/export, etc. The absolute time of
first creation, and original-instance vs. copy generation, might also be encoded.
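A rough sketch of what such a prefix block might hold, assuming n = 1 sector of 512 bytes and a fixed little-endian layout packed with the standard `struct` module. The field sizes, the `PFX1` magic, and the names `pack_prefix`/`unpack_prefix` are all made up for illustration; this is not an existing on-disk format.

```python
import struct
import time
import uuid

SECTOR = 512  # assume n = 1 sector for the prefix block
# Hypothetical little-endian layout: magic, GUID, creation time (signed
# seconds since the epoch), copy generation (0 = original), UTF-8 content
# name, ASCII content type. The user-modifiable meta string fills the rest.
HEADER = struct.Struct("<4s16sqI128s64s")
META_SIZE = SECTOR - HEADER.size
MAGIC = b"PFX1"


def _cstr(b):
    """Strip the NUL padding that struct's 's' format adds."""
    return b.rstrip(b"\0")


def pack_prefix(name, content_type, meta="", guid=None, created=None, generation=0):
    name_b = name.encode("utf-8")
    type_b = content_type.encode("ascii")
    meta_b = meta.encode("utf-8")
    if len(name_b) > 128 or len(type_b) > 64 or len(meta_b) > META_SIZE:
        raise ValueError("field too long for prefix block")
    guid = guid if guid is not None else uuid.uuid4()
    created = int(time.time()) if created is None else created
    head = HEADER.pack(MAGIC, guid.bytes, created, generation, name_b, type_b)
    return head + meta_b.ljust(META_SIZE, b"\0")


def unpack_prefix(block):
    if len(block) != SECTOR or block[:4] != MAGIC:
        raise ValueError("not a prefix block")
    _, guid, created, generation, name_b, type_b = HEADER.unpack_from(block)
    return {
        "guid": uuid.UUID(bytes=guid),
        "created": created,
        "generation": generation,
        "name": _cstr(name_b).decode("utf-8"),
        "content_type": _cstr(type_b).decode("ascii"),
        "meta": _cstr(block[HEADER.size:]).decode("utf-8"),
    }
```

Fixed offsets keep the block readable without parsing, which is what lets a tool like `file` check the magic and content type cheaply.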

A user-modifiable string in this prefix block could carry the kind of
meta-info talked about above, so it could be associated with the file
content without relying on platform-specific (container) file conventions
such as extensions etc. You could hack access to this prefix block by
using absolute negative seek offsets as indices to its parts. (This is an idea, not
a detailed design ;-)
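Ordinary file objects refuse to seek before offset 0, so the negative-seek idea needs a thin wrapper. A minimal sketch, assuming the prefix block occupies the first `prefix_size` bytes of the raw file (512 here, an arbitrary choice) and that logical offset 0 is the first byte of content; `PrefixedFile` is a hypothetical name.

```python
import io


class PrefixedFile:
    """View a binary file whose first prefix_size bytes are a metadata block.

    Logical offset 0 is the first byte of content; negative offsets reach
    back into the prefix block (offset -prefix_size is its first byte).
    """

    def __init__(self, raw, prefix_size=512):
        self.raw = raw
        self.prefix_size = prefix_size
        self.raw.seek(prefix_size)  # start positioned at the content

    def seek(self, offset):
        if offset < -self.prefix_size:
            raise ValueError("offset before start of prefix block")
        self.raw.seek(self.prefix_size + offset)
        return offset

    def tell(self):
        return self.raw.tell() - self.prefix_size

    def read(self, n=-1):
        return self.raw.read(n)
```

Code that only ever seeks to non-negative offsets sees plain content, so existing readers would keep working unchanged.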

So yet another possible mechanism or two for
Defining Unicode Literal Encodings (OT ;-)