[Pythonmac-SIG] py2app questions..

Ronald Oussoren ronaldoussoren at mac.com
Thu Apr 11 10:47:36 CEST 2013


On 10 Apr, 2013, at 20:19, Chris Barker - NOAA Federal <chris.barker at noaa.gov> wrote:

> First: Ronald, thanks for keeping py2app up to date!
> 
> The good news: I just ran a recent py2app on an older app of mine, and
> it all worked out of the box.
> 
> However, the resulting bundle is HUGE. A lot of this is inevitable,
> I'm using a universal build, and some big packages, but there seems to
> be some room for improvement:
> 
> The app in question is using:
> 
> wxPython
> numpy
> a tiny bit of scipy
> matplotlib
> 
> So it's going to be big!
> 
> Numpy, scipy and matplotlib all have a lot of intertwining modules,
> many of which get imported whether you need them or not, so probably
> the best one can do is include them all (more on that later)
> 
> However, I'm getting a couple packages that I have no idea why:
> 
> OpenGL

This I don't know.

> email

This will get better with py2app 0.8. In 0.7 the entire email package
gets included when some module has a dependency on a module from the
email package because email uses __import__ to provide compatibility aliases. 
In 0.8 I've taught modulegraph about those aliases and therefore only include
the bits of the email package that are actually used.

> zmq

I've seen other reports that zmq gets pulled in without being used
(in the context of py2app build problems, some app wouldn't be packaged
correctly until pyzmq got removed). I haven't had time yet to see 
why zmq confuses py2app.

> 
> Hand-removing them from the bundle doesn't break anything.

Py2app detects dependencies by looking for import statements in the
module bytecode. That doesn't mean the import is actually used (for example
because it is optionally imported on some other platform).
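
A minimal sketch of this kind of bytecode scan, using only the standard
library's dis module. This is an illustration, not the modulegraph
implementation; the helper name imported_names and the sample source are
made up for the example. The sample is compiled but never executed, yet
every import statement is reported, including the one guarded by a platform
check and the one inside a function that is never called.

```python
import dis

# Sample module source. It is only compiled, never run, so nothing is imported.
SOURCE = """
import os
import sys

if sys.platform == "win32":
    import winreg          # would only execute on Windows

def rarely_called():
    import sqlite3         # would only execute if this function is called
"""


def imported_names(code):
    """Collect module names from IMPORT_NAME opcodes, recursing into nested code."""
    names = set()
    for instr in dis.get_instructions(code):
        if instr.opname == "IMPORT_NAME":
            names.add(instr.argval)
    for const in code.co_consts:
        if hasattr(const, "co_code"):  # nested function or class body
            names |= imported_names(const)
    return names


found = imported_names(compile(SOURCE, "<example>", "exec"))
print(sorted(found))
```

A scanner like this cannot tell which of those imports will ever run, which
is why conditional or platform-specific imports end up in the bundle.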

The --xref and --graph options for py2app can be used to emit a module
dependency graph that might show what's going on here.
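
As a hedged illustration, the same two options can be set from setup.py
through the py2app options dictionary instead of the command line. The
script name "MyApp.py" and the variable names are placeholders.

```python
# Fragment of a setup.py (sketch). "MyApp.py" is a placeholder script name.
PY2APP_OPTIONS = {
    "graph": True,  # same effect as passing --graph on the command line
    "xref": True,   # same effect as passing --xref on the command line
}

SETUP_KWARGS = dict(
    app=["MyApp.py"],
    options={"py2app": PY2APP_OPTIONS},
    setup_requires=["py2app"],
)

# In a real setup.py these keyword arguments would be handed to setuptools:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```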

> 
> I'm wondering if their recipes are getting triggered by accident
> somehow -- how can I tell how/why a particular package got included?
> 
> I'm also getting a number of *.so files that I don't understand --
> some of the larger ones are:
> 
> _bsddb.so
> _sqlite3.so
> _imaging.so  (is that PIL?)
> 
> there are a LOT more, but it's hard to know what they are from just the
> names, and most are pretty small anyway  (except the wx ones that I
> need and expect to be big...)
> 
> I understand that it's a goal of py2app to work "out of the box"
> without needing to hand-tweak a lot to get it to work. This is a
> worthy goal, and the recipes approach works great. However, it would
> be nice if there was an alternate approach that made it easier to
> build a more optimized package. One idea:
> 
> Py2app (and all the other stand-alone builders I've looked at) figure
> out what to include from source code analysis. However, as it's pretty
> common for packages to "import" stuff that may not be used in a
> particular app, you can get a lot of stuff you don't need. For
> instance, I'm pretty sure that PIL imports Tkinter. These imports are
> often wrapped in an if clause or inside a function that may never get
> called, but source code analysis isn't going to find all of those, so
> the sane thing to do is include it all -- resulting in bloated
> packages.

That's basically correct. The recipe system does make it possible to tweak
the dependency graph, and that's (as an example) used in a recipe for
pydoc to remove the import dependency from pydoc on Tkinter as you don't
want to pull in Tkinter unless it is used by other parts of the app.
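
The effect of such a graph tweak can be sketched with a toy dependency
graph. This is plain Python with made-up edges, not py2app's recipe API:
once the pydoc -> Tkinter edge is removed, Tkinter and everything that is
only reachable through it drop out of the set of modules to bundle.

```python
def reachable(graph, root):
    """Return every module reachable from root by following import edges."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return seen


# Toy import graph (illustrative edges only).
graph = {
    "__main__": {"pydoc"},
    "pydoc": {"Tkinter", "inspect"},
    "Tkinter": {"_tkinter"},
}

before = reachable(graph, "__main__")

# What a recipe does conceptually: drop one edge, keep both nodes defined.
graph["pydoc"].discard("Tkinter")
after = reachable(graph, "__main__")

print(sorted(before - after))
```

If some other module in the app still imported Tkinter, it would stay
reachable through that edge and would still be bundled.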
 
> 
> I propose an alternative -- analysis of the app at run-time: The user
> would run the app (maybe a test suite), then we'd take a look at
> sys.modules and see everything that was actually loaded. This would
> miss anything that wasn't exercised by the code you ran, but for most
> cases, a test suite would bring in everything (comprehensive test
> suites are hard to come by, but all it would need to do is test enough
> to import every package it might use -- that's not a heavy lift).
> 
> This seems so easy -- am I missing something???

Writing support code for this in py2app is easy enough, but that does
require having a comprehensive enough test suite for apps and writing
those is much harder (especially for GUI apps, which is where py2app is
used the most).
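
A minimal sketch of the run-time approach, assuming the app can be
exercised from a single callable. The names modules_loaded_by and
exercise_app are hypothetical, and the two imports stand in for a real
test run. The second import is built dynamically, which is the kind of
dependency a static scan cannot see.

```python
import importlib
import sys


def modules_loaded_by(run):
    """Run a callable and return the names newly added to sys.modules."""
    before = set(sys.modules)
    run()
    return sorted(set(sys.modules) - before)


def exercise_app():
    # Stand-in for running the app or its test suite.
    import json  # ordinary import
    importlib.import_module("color" + "sys")  # dynamic import, name built at run time


loaded = modules_loaded_by(exercise_app)
print(loaded)
```

Modules that were already imported before the snapshot do not show up in
the difference, so a real tool would have to take the snapshot in a fresh
interpreter, and anything the run does not exercise is missed.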

> 
> To support this, py2app would need a way to bypass the source code
> analysis, and instead load a data file with the list of modules that
> need to be included. Actually, as it sometimes takes a while to scan
> the code, it would be nice if py2app could optionally dump the results
> of the source code analysis, and be able to re-load it later, rather
> than needing to re-run.
> 
> 1) Am I missing something as to why the run-time analysis wouldn't work ?
> 
> 2) how hard would it be to patch py2app to load a module list, rather
> than scan the source?

Fairly easy: py2app uses modulegraph to build a module dependency graph
and then extracts the list of modules and extensions from that. 
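
To illustrate the "build a graph, then extract a module list" step, here
is a sketch that uses the standard library's modulefinder as a stand-in
for modulegraph (the two are related but not the same API). It scans a
throwaway script, pulls out the module names, and round-trips them through
JSON the way a dumped module list might be saved and reloaded. The script
content and variable names are made up for the example.

```python
import json
import os
import tempfile
from modulefinder import ModuleFinder

# A throwaway one-line "app" to scan.
SCRIPT = "import colorsys\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "app.py")
    with open(path, "w") as fh:
        fh.write(SCRIPT)
    finder = ModuleFinder()
    finder.run_script(path)  # static scan; the script itself is not executed

# Extract the module list from the scan result.
module_names = sorted(finder.modules)

# Dump the list and load it back, as a cached analysis result would be.
dumped = json.dumps(module_names)
reloaded = json.loads(dumped)
print(reloaded)
```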

But I'm not sure if that would be useful functionality. IMHO it would
be much nicer to have a tool for inspecting and managing the module
dependency graph (that is, use the --graph option of py2app to dump
the dependency graph, then have a (GUI) tool for inspecting that and
generating py2app configuration options that tweak the graph).
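
As an example of the kind of configuration such a tool might generate,
py2app's excludes option lists modules to leave out of the bundle. The
sketch below uses the two unexplained packages from this thread; whether
excluding them is safe depends on the app, and the variable names are
placeholders.

```python
# Sketch of generated py2app options; the list would come from inspecting the graph.
UNWANTED = ["zmq", "OpenGL"]

PY2APP_OPTIONS = {
    "excludes": UNWANTED,  # modules to leave out of the bundle
}

# In setup.py this dictionary would be passed as:
#     setup(..., options={"py2app": PY2APP_OPTIONS})
print(PY2APP_OPTIONS)
```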

I'd also like to see an improved and enhanced set of recipes that 
automate this for commonly used packages (for some definition of
commonly used, adding recipes for popular bioinformatics packages 
would be fine, even if bioinformatics packages won't be commonly
used in the larger Python community).

My near term goals for py2app are:

* get the next 0.7 release out

* write some documentation; the current documentation is barely
  worth that description and offers little to no help on actually
  using py2app beyond the basics.

* refactor the py2app code base to be less dependent on 
  distutils, mostly to make it easier to create unit tests but
  also to prepare for a future where distutils will be used
  less and less to package/build software.


Ronald


