crossplatform py2exe - would it be useful?

Bengt Richter bokr at oz.net
Fri Aug 8 14:27:06 EDT 2003


On Fri, 08 Aug 2003 17:31:35 +0200, Thomas Heller <theller at python.net> wrote:

>Alex Martelli <aleax at aleax.it> writes:
>
>> Oren Tirosh wrote:
>>    ...
>>>> > Sounds like a good idea to me, if a sensible name is chosen for the
>>>> > "main module" (I propose 'main':-).
>>>> 
>>>> My choice would have been __main__ :-) Is it really the correct way to
>>>> 'import __main__' instead of 'running' it?
>>> 
>>> You can't import __main__ - you'll get the one already in sys.modules.
>>
>> If there's a __main__ in the zip, we could remove (even assuming it's
>> already there) the yet-empty sys.modules['__main__'].
>>
>>> The code for the main script needs to be executed in __main__'s dict.
>
>Then we name the boot module __boot__ and import this from the zip.
>This could then execute the script in the __main__ module's namespace.
>
>> So, I reiterate an idea I've already expressed: key on the executable
>> file's name.  If it's at least six characters long and the first six
>> characters are (case-insensitive) 'p', 'y', 't', 'h', 'o', 'n' in this
>> order, forget the whole thing and proceed like now; in other words,
>> use whatever new tricks we insert *if and only if* the executable file's
>> name does NOT start with (case-insensitive) 'python'.
>>
>>> problems on some obscure environments. A possible alternative would be
>>> to have a configuration area inside the executable that can be modified
>>> by an external program (e.g. py2exe). The program would search for a
>>> signature string and modify the section after it. The configuration
>>> area can be as simple as a string that overrides the command line
>>> arguments.
>>
>> I suspect "obscure environments" may make it hard for py2exe to find
>> the needed signature and get at the 'configuration area' (depending on
>> how executable files are stored when seen as stream of bytes).  Still,
>> such an area would also be useful for other purposes, as you mention
>> (e.g., supplying the -O switch "at compile time", and the like).  So,
>> perhaps, we could simply test the executable's name FIRST, and if the
>> name starts with "python" just do nothing, otherwise look at the
>> configuration area (string) and so on.
>
>Sounds much like the way py2exe already works now.  It locates the
>appended zip-file by searching the exefile from the end, then finds the
>beginning of the zipfile, and looks for a magic number there, which is
>used to verify that the next n bytes before this position is a C
>structure containing the required flags.
>
>I don't like the idea to scan the executable for a magic signature
>without further hints where this should be.
>
>>  On any "obscure environment"
>> where the set of tricks doesn't work, one would simply have to avoid
>> renaming or copying the python interpreter to weird names, and otherwise
>> would be just about as well or badly off as today.
>
>From reading the McMillan installer sources some time ago, I have the
>impression that on some obscure platforms it's not possible to append
>the structure and the zipfile to the executable, and on other obscure
>platforms (or maybe runtime environments, maybe a cgi executable started
>from apache) it may be difficult to find the pathname of the exefile.
>
>But, all in all, it sounds like a plan. Although I have the impression
>that it may be difficult to convince the python-dev crowd to include
>this in 2.3.1. (Is anyone of them reading this thread?)
>
PMJI, since I haven't read all the prior thread, but if the point is
just to have an executable that self-unpacks and starts, IWT the safest
(for platform independence -- not security, see my conclusion later below)
approach would be to have a tool that makes a light-weight self-unpacking
"exe" wrapper for any/each platform.

I think it would be relatively trivial on win32. I think you could do it all
in Python once you have a little C boilerplate exe template compiled and a
few special locations and segments defined (i.e., .exe's have provision
for simple embedded resource entities).

In the auto-executing python use, I guess the resources would be the interpreter
exe file, zip file(s), and some simple startup-control/config file.

For easiest cross platform use, I think I would use the win32 resource space
as a single binary resource string of bytes, and define a simple sequential
platform-independent packing *within* that, so that we don't get involved with
complex structure in the exe template or the corresponding unix or mac etc
solutions.

I have previously used a binary packing that pre- and post-fixed 4-byte lengths for
each internal packet, with zero length being legal and 0xffffffxx being reserved
for special marks. The length *pre*-fix was optional (absence signalled by special mark)
so it was easy to write the whole packed segment sequentially without first knowing the sizes.
(BTW I used this to append segments stack-like to a file, so I could pop/top the file segment
with one physical read for small segments: just read the last 4k of a file and look back
in memory to the beginning per the last 4 bytes.) (BTW2, ever wonder why a stack file is
not used to communicate between programs like pipe files?)
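
A minimal sketch of the stack-file idea, in Python. Assumptions not in the
description above: little-endian lengths, only the length *post*-fix is written,
the reserved 0xffffffxx marks are ignored, and the names push_segment,
top_segment and pop_segment are made up for illustration.

```python
import os
import struct

# 4-byte unsigned length; little-endian is an assumption of this sketch.
LEN = struct.Struct("<I")

def push_segment(path, payload):
    """Append payload followed by its 4-byte length (post-fix only)."""
    with open(path, "ab") as f:
        f.write(payload)
        f.write(LEN.pack(len(payload)))

def top_segment(path, tail=4096):
    """Return the last segment; one read suffices if it fits in the tail."""
    size = os.path.getsize(path)
    if size < LEN.size:
        raise ValueError("stack file is empty")
    with open(path, "rb") as f:
        f.seek(max(0, size - tail))
        buf = f.read()
        # The last 4 bytes say how far to look back for the segment start.
        (n,) = LEN.unpack(buf[-LEN.size:])
        need = n + LEN.size
        if need > size:
            raise ValueError("corrupt trailing length")
        if need > len(buf):
            # Large segment: a second read is needed.
            f.seek(size - need)
            buf = f.read()
    return buf[len(buf) - need:len(buf) - LEN.size]

def pop_segment(path):
    """Return the last segment and truncate it off the file."""
    payload = top_segment(path)
    new_size = os.path.getsize(path) - len(payload) - LEN.size
    with open(path, "r+b") as f:
        f.truncate(new_size)
    return payload
```

Segments come back in reverse order of the pushes, and a zero-length
segment costs just its 4 length bytes.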

Anyway, even though you might not always need 2-way walking, I think it's not much space to allow
for it, or better, specify it in the overall header. It can make the file effectively into a doubly
linked list of segments with pointers in the form of relative offsets.
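
To illustrate the two-way walking, here is a rough sketch that packs packets
with both a leading and a trailing 4-byte length and walks them in either
direction. Little-endian byte order and the function names are assumptions of
the sketch, and the optional-prefix special marks are left out.

```python
import struct

# 4-byte unsigned length; little-endian is an assumption of this sketch.
LEN = struct.Struct("<I")

def pack_packets(payloads):
    """Pack each payload as: length, payload bytes, length."""
    out = bytearray()
    for p in payloads:
        out += LEN.pack(len(p)) + p + LEN.pack(len(p))
    return bytes(out)

def walk_forward(buf):
    """Yield payloads first to last, following the length pre-fixes."""
    pos = 0
    while pos < len(buf):
        (n,) = LEN.unpack_from(buf, pos)
        start = pos + LEN.size
        yield buf[start:start + n]
        pos = start + n + LEN.size  # skip payload and its post-fix

def walk_backward(buf):
    """Yield payloads last to first, following the length post-fixes."""
    pos = len(buf)
    while pos > 0:
        end = pos - LEN.size
        (n,) = LEN.unpack_from(buf, end)
        yield buf[end - n:end]
        pos = end - n - LEN.size  # skip payload and its pre-fix
```

The two lengths act as relative forward and backward offsets, at a cost of
8 bytes per packet.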

With this model of a multi-segment container of packets, we just need a header for the whole
thing. The first or any packet can be used by application convention for whatever, but there needs
to be a header for the whole thing before you can read packets. I.e., one that says what the
byte ordering and total length are, at a minimum.

For that purpose, I've been thinking a unicode utf-8 representation of an rfc2822 header might be cool.

The unpacking wrapper exe could understand a simple command line switch to dump that, and we'd
have a file-name-independent way of identifying content to any desired detail. And the output could
easily be piped to a unicode app that could display chinese or japanese or russian or greek parts
of the header, though I think default should be English unless otherwise marked.

This could easily be used to specify what python interpreter was contained, and what apps or data.
Note that this kind of wrapper could be used just to install simple data files as well.

The only thing I don't like about this, and I REALLY REALLY don't like it, is having people be used to
executing stuff they might not check the sizes and signatures of. It is really going down the
slippery path of MS-ian uber-convenience, and ignoring security IMO.

I was getting fairly enthused about the exe, but I think it's really not a good idea. Why not
just let the wrapper part I described be a very lightweight standalone installer/launcher/header-viewer
and keep the packed info resource as a separate file? Note that this separate file would now
start with a utf-8 rfc2822 header!! (and if we put Ctrl-Z and Ctrl-D at the end of the header, many
editors would be able to open the file and just see the header).
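
As a rough illustration of such a file, the sketch below writes a utf-8 header
of "Name: value" lines, a blank line, then Ctrl-Z and Ctrl-D, then the binary
payload, and reads it back. The field names in the usage note, the CRLF line
ends, and the wrap/unwrap names are assumptions; a simplified hand-rolled
parser stands in for a full rfc2822 one.

```python
END_MARK = b"\x1a\x04"  # Ctrl-Z followed by Ctrl-D, closing the header

def wrap(fields, payload):
    """Build header lines (utf-8), a blank line, the end mark, then payload."""
    lines = ["%s: %s" % (name, value) for name, value in fields]
    header = ("\r\n".join(lines) + "\r\n\r\n").encode("utf-8")
    return header + END_MARK + payload

def unwrap(blob):
    """Split a wrapped blob into ([(name, value), ...], payload)."""
    head, sep, payload = blob.partition(END_MARK)
    if not sep:
        raise ValueError("header end mark not found")
    fields = []
    for line in head.decode("utf-8").splitlines():
        if not line:
            break  # blank line ends the header
        if line[0] in " \t" and fields:
            # Continuation line: fold it into the previous field.
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            fields.append((name.strip(), value.strip()))
    return fields, payload
```

For example, wrap([("Byte-Order", "little"), ("Title", "Πρόγραμμα")], data)
gives a blob whose first lines are readable in a plain text editor, and
unwrap() returns the same fields and the untouched payload.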

I have been thinking this kind of header would do a heck of a lot more than magic numbers, but
for magic compatibility, maybe some utf-8 character could be found that would make a clean magic
prefix and identifier for utf-8 rfc2822. The header could also specify whether what follows is
a single packet without length-links or has them pre- or post-fixed or both.

A prototype (un)wrapper for utf-8-rfc-2822-headed packet files as outlined above would be short
work in Python, and UFHwrap.py (UniversalFileHeader wrap/unwrap) could be part of the batteries.

A standalone C version of the unwrapper (you wouldn't need a wrapper, since you could unwrap a
python version, though I don't think it would be very hard) would also be short work, especially
after a python prototype was working. Since the UFH file (extension hint .ufh ;-) is so simple
and platform-independent in structure, it should be easy to write ufhunwrap.c for most platforms,
and let people download a 50k exe they can trust for repeated use, rather than worry each time
that some big exe is going to zap them unless they fuss about signatures etc.

Is this PEP-able?

Regards,
Bengt Richter



