[py-dev] thoughts on embedding files

Tue Jan 25 19:43:36 CET 2005

(cross-posting this to both the Schevo and Py mailing lists, to get the 
input of both sets of people)

Extending evo
=============

The next task I am embarking upon for the Schevo project is to add two 
new actions to evo, a recently added tool that serves as the central 
place to create, run, and generally work with Schevo applications:

* "evo py2exe", which will perform all necessary steps to turn your 
application into a Windows executable using Py2exe.

* "evo innosetup", which will perform the py2exe action, then wrap the 
executable inside a Windows installer using the free InnoSetup program.

(see 
http://lists.orbtech.com/pipermail/schevo-devel/2005-January/000060.html 
for more information)

These actions will be built to satisfy the requirements of an 
application that Patrick and I are working on professionally, but will 
be useful for packaging any Schevo application for deployment on Windows 
platforms.

In the future, expect to see "evo cxfreeze", and perhaps one day "evo 
deb" and "evo rpm".

Embedding files
---------------

One of the requirements of turning a Schevo application into a Py2exe 
application is that of file embedding.

If you have looked at Schevo prior to the latest merge, you might have 
noticed that each application had its own "build.py" step.  This step 
did the following things:

* Embedded the contents of the schema directory in a single Python module.

* Determined the icons that the application used, and the icon 
collections that are made available by the application, and embedded 
those icons in a single Python module.

* For some apps, embedded arbitrary files that the user interface would 
use, such as additional graphics other than icons.
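
As a rough illustration of the kind of embedding step described above, 
here is a minimal sketch in present-day Python. It is not the actual 
build.py from Schevo; the function name build_embedded_module, the FILES 
dictionary, and the base64 encoding are all assumptions made for the 
example.

```python
import base64
from pathlib import Path


def build_embedded_module(source_dir, module_path):
    """Write a Python module that holds every file under source_dir.

    The generated module defines FILES, a dict mapping relative POSIX
    paths to base64-encoded file contents. This layout is hypothetical,
    chosen only to illustrate the idea.
    """
    source_dir = Path(source_dir)
    lines = ["# Generated file; do not edit.", "FILES = {"]
    for path in sorted(source_dir.rglob("*")):
        if path.is_file():
            name = path.relative_to(source_dir).as_posix()
            data = base64.b64encode(path.read_bytes()).decode("ascii")
            # !a escapes any non-ASCII characters in the key.
            lines.append(f"    {name!a}: {data!a},")
    lines.append("}")
    Path(module_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Because the output is an ordinary module, a freezing tool that collects 
imported modules into its library archive would pick it up like any 
other source file.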

The reason all of this was necessary is that we wanted to make sure that 
the entirety of the schema, and all of the icons an app would use, 
would be embedded within a Py2exe application's library ZIP file, and 
not just copied into plain files that could be easily modified by a user 
of the application.

The easy way to do this during development, of course, was to always 
build the generated Python modules, and to always use those modules. 
This worked in the short term, but it made application startup during 
development time-consuming, and it added a lot of boilerplate to every 
new application.

What we want to do instead is to make sure that when developing a Schevo 
application, the developer does NOT worry about how or when to embed 
files in this manner.  The developer should just be able to create the 
application, efficiently run it in a development state using normal 
files, and then run "evo py2exe" to create an executable.

The "evo py2exe" action should embody all of the logic needed to 
determine what needs to be embedded and where, and the final application 
that is generated should have the necessary logic to know that it should 
read file-like objects from the embedded Python modules rather than real 
files on disk.
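
The runtime half of that logic could look something like the sketch 
below. It assumes the embedded data is a dict of relative path to base64 
text, as a generated module might provide; open_resource and its 
parameters are hypothetical names, not part of evo or Schevo.

```python
import base64
import io
from pathlib import Path


def open_resource(name, embedded_files=None, root="."):
    """Return a binary file-like object for the named resource.

    embedded_files: dict of relative path -> base64 text, or None when
    running from a development checkout (hypothetical convention).
    root: directory that holds the real files during development.
    """
    if embedded_files is not None and name in embedded_files:
        # Packaged application: serve the bytes from memory.
        return io.BytesIO(base64.b64decode(embedded_files[name]))
    # Development mode, or resource not embedded: read the real file.
    return open(Path(root) / name, "rb")
```

Calling code receives a file-like object either way, so it does not need 
to know whether it is running from source or from a packaged executable.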

So, it becomes necessary to create an API that provides this 
transparency, so that application developers need make only minimal 
changes to their way of thinking but may still take advantage of files 
embedded within Python modules.

The py.path API
---------------

Early on in the Schevo project after it was split from Pypersyst, we 
made the decision to use py.test from the "py" library to run our unit 
tests.

We were happy with this decision, but this meant that the user would 
need to keep up-to-date with the py library, which is under heavy 
development.  So we kept our usage of py within Schevo optional, and 
minimal.

Now that we are including py within Schevo as a dependency, we can 
control which revision of it we use.  So if some major API shift occurs 
in the main py distribution, we can just stick with a "known good" 
revision for Schevo.

Because of this, we are moving from tapping the water lightly with our 
toes to walking into it knee deep.  :)

The py.path API is the first major step toward using py for more than 
just unit testing.  py.path provides a very well-designed object model 
for dealing with paths of all sorts: local, remote, and virtual. 
I've started using it in Schevo whenever I refactor a piece of code that 
works with filesystem paths.

So, what I am thinking is that a py.path.embed package could be created 
that would embody the aspects of embedding files within a Python module, 
along with providing a single API that could be used to access those 
files both when they are still in the local filesystem, and when they 
have been embedded into Python modules.
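
To make the "single API" idea concrete, here is a small sketch of two 
interchangeable resource classes. It is purely illustrative and does not 
use or describe the py library itself: the class names and the 
join/check/read methods are assumptions chosen for the example, and the 
embedded store is again assumed to be a dict of relative path to base64 
text.

```python
import base64
from pathlib import Path


class LocalResource:
    """Resource backed by a real file (development mode)."""

    def __init__(self, root, *parts):
        self._path = Path(root).joinpath(*parts)

    def join(self, *parts):
        return LocalResource(self._path, *parts)

    def check(self):
        # True if the resource exists as a regular file.
        return self._path.is_file()

    def read(self):
        return self._path.read_bytes()


class EmbeddedResource:
    """Resource backed by a dict of relative path -> base64 text."""

    def __init__(self, files, *parts):
        self._files = files
        self._name = "/".join(parts)

    def join(self, *parts):
        prefix = [self._name] if self._name else []
        return EmbeddedResource(self._files, *prefix, *parts)

    def check(self):
        # True if the resource is present in the embedded store.
        return self._name in self._files

    def read(self):
        return base64.b64decode(self._files[self._name])
```

With either class as the root object, application code could write 
root.join("icons", "app.png").read() and stay unaware of where the bytes 
actually come from.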

I will be writing more as I discover more about py.path's innards, and 
determine whether this would be a good choice or not for embedding files.

Any feedback from the Py and Schevo teams is most appreciated :)